I think he's referring to the hyperthreading technology itself, for which you probably can't set the HT bit unless you actually support it. Still, even though Phenoms don't have HT, they _will_ perform at closer to peak performance if you overcommit (in terms of threads). I've don some testing, and you need about 18 threads to truly saturate an X6, which is about the same number of threads that you need to saturate a dual Xeon (8 physical cores, 16 with HT).
I think you're a bit confused. This subthread is about Bulldozer and how its unusual design (where each pair of "cores" are not truly independent cores because they share a common floating point unit, instruction cache, decoder, and a couple other blocks) interacts with the Windows scheduler. Due to the superficial similarity to hyperthreading, some people maintain that if AMD had only been smart enough to make pairs of Bulldozer cores declare themselves to be one hyperthreaded core, it would have magically made Bulldozer much faster in Windows. This isn't really true, but fans looking for a reason to believe don't ever notice they're simultaneously claiming AMD was smart enough to design a great CPU and dumb enough to accidentally sabotage it in a really trivial, easy-to-fix way.
You are indeed right, I totally missed that part. And I don't know anything about the Windows scheduler, so I have no idea if advertising the core pairs would perform better if they looked like a single core with HT.
Also, your test was probably somewhat bogus. You can easily saturate a Phenom II X6 with six threads. I'd guess you ran a program where individual threads cannot individually saturate 1 CPU core. That is, they frequently go to sleep or wait on each other quite a lot. That's the only way you can continue to get significant scaling after N threads (where N equals the number of hardware threads available). Not all programs behave that way, so it's not real useful to report that one particular program happens to "scale" all the way up to 3 threads per core. (And if it does behave that way, there's no reason to believe it would behave any differently on Intel CPUs.)
The thing it, it does behave different on Intel CPUs. Tested on both a dual-Xeon (2x4 cores, doubled by HT) and on an i7 (4 cores, doubled by HT), and in both cases peak performance was achieved with a number of CPU threads matching (or very close to) the HT-advertised performance (more specifically, 18 threads in the Xeon case and 10 threads in the i7 case.
Of course this is just a very specific application, and I'm sure that the effects I'm seeing are influenced by bottlenecks from other subsystem (memory throughput, most likely); still, I find the difference between Intel and AMD CPUs is quite peculiar.