I think he's referring to the hyperthreading technology itself, for which you probably can't set the HT bit unless you actually support it. Still, even though Phenoms don't have HT, they _will_ perform at closer to peak performance if you overcommit (in terms of threads). I've done some testing, and you need about 18 threads to truly saturate an X6, which is about the same number of threads that you need to saturate a dual Xeon (8 physical cores, 16 with HT).
I think you're a bit confused. This subthread is about Bulldozer and how its unusual design (where each pair of "cores" is not a pair of truly independent cores, because they share a common floating point unit, instruction cache, decoder, and a couple other blocks) interacts with the Windows scheduler. Due to the superficial similarity to hyperthreading, some people maintain that if AMD had only been smart enough to make pairs of Bulldozer cores declare themselves to be one hyperthreaded core, it would have magically made Bulldozer much faster in Windows. This isn't really true, but fans looking for a reason to believe don't ever notice they're simultaneously claiming AMD was smart enough to design a great CPU and dumb enough to accidentally sabotage it in a really trivial, easy-to-fix way.
You are indeed right, I totally missed that part. And I don't know anything about the Windows scheduler, so I have no idea whether the core pairs would perform better if they were advertised as a single core with HT.
Also, your test was probably somewhat bogus. You can easily saturate a Phenom II X6 with six threads. I'd guess you ran a program where individual threads cannot individually saturate 1 CPU core. That is, they frequently go to sleep or wait on each other quite a lot. That's the only way you can continue to get significant scaling after N threads (where N equals the number of hardware threads available). Not all programs behave that way, so it's not real useful to report that one particular program happens to "scale" all the way up to 3 threads per core. (And if it does behave that way, there's no reason to believe it would behave any differently on Intel CPUs.)
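To put a rough number on that argument: if each thread is only runnable some fraction of the time, you need roughly N divided by that fraction to keep N hardware threads busy. Here is a back-of-the-envelope sketch in Python. The duty-cycle model and both function names are my own simplification, not a measurement or a description of any real scheduler.

```python
import math


def threads_to_saturate(hw_threads: int, busy_fraction: float) -> int:
    """Rough count of software threads needed to keep hw_threads busy.

    Assumption: every thread is runnable only busy_fraction of the time
    and spends the rest sleeping or waiting on other threads.
    """
    if not 0.0 < busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be in (0, 1]")
    return math.ceil(hw_threads / busy_fraction)


def implied_busy_fraction(hw_threads: int, threads_at_peak: int) -> float:
    """Invert the model: duty cycle implied by an observed peak thread count."""
    if hw_threads <= 0 or threads_at_peak <= 0:
        raise ValueError("thread counts must be positive")
    return min(1.0, hw_threads / threads_at_peak)


if __name__ == "__main__":
    # Threads that never block: six of them saturate a six-core part.
    print(threads_to_saturate(6, 1.0))
    # Peak at 18 threads on 6 cores suggests each thread is runnable
    # only about a third of the time under this model.
    print(implied_busy_fraction(6, 18))
```

Under this model, a peak at 3 threads per core says more about how often the program's threads block than about the CPU.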
The thing is, it does behave differently on Intel CPUs. I tested on both a dual Xeon (2x4 cores, doubled by HT) and an i7 (4 cores, doubled by HT), and in both cases peak performance was achieved with a number of CPU threads matching (or very close to) the HT-advertised core count (more specifically, 18 threads in the Xeon case and 10 threads in the i7 case).
Of course this is just a very specific application, and I'm sure that the effects I'm seeing are influenced by bottlenecks from other subsystems (memory throughput, most likely); still, I find the difference between Intel and AMD CPUs quite peculiar.
This is a quote from January 26th, 2012, by Tomi Ahonen:
“Luckily I didn’t have to do the math for this, the nice people at All About Symbian had tracked the numbers (read through the comments) and calculated the limits, finding N9 sales to be between the level of 1.5 million and 2.0 million units in Q4. Wow! Nokia specifically excluded all of its richest and biggest traditional markets where it tried to sell the Lumia, and these countries achieved – lets call it the average, 1.75 million unit sales of the N9 in Q4. So the one N9 outsold both Lumia handsets by almost exactly 3 to 1.” [1]
And the amazing thing is that the N9 sold so incredibly well despite not being marketed as much as the Lumia. I still come across people looking to buy an N9, and having to get it from Switzerland because it's not sold in Italy.
Unfortunately nVidia cards are a bit better (support for PhysX, which AMD doesn't have)
Unless you really need PhysX (which is a niche feature), my opinion is that AMD video cards are better. The 7770 and 7870 have excellent price/performance ratios and no major weaknesses. In particular, thermals and power consumption are better than on corresponding nVidia cards.
You're right about AMD's uncompetitiveness against Intel in the CPU market, though.
AMD video cards are significantly better than NVIDIA ones when it comes to raw computation power, performance/watt, and performance/price, especially now that the 7xxx series has overcome the only weakness of the old series, the VLIW instruction set and architecture. Where AMD sucks big time is in software support. NVIDIA has pushed CUDA immensely, to the point that people now think that GPGPU = CUDA, and it has pushed just as hard to create a software environment around CUDA, including tons of external libraries that depend on it. AMD has lost a lot of ground with their CTM -> CAL -> OpenCL transitions, which have effectively prevented their technology from gaining any significant traction, and they are just now starting to regain some visibility. Their APU offering is probably the last chance they get to make a significant breakthrough. Let's hope they don't blow it.
Or use Windows or possibly Gnome...or do OpenCL or OpenGL programming...or-
The list goes on. The fact that people are still selling craptacular integrated video chipsets in this day and age saddens me greatly. Guys, it's 2012...pony up for a dedicated video card with dedicated video ram. Quit trying to save a buck or two on a component you really don't want to be cheap on.
Well, I think you can do OpenCL on Intel HD3xxx/4xxx chips these days.
AFAIK, Intel HD3xxx is not OpenCL capable, and Intel HD4xxx is officially supported by Intel on Windows only (no Linux drivers). This is in sharp contrast with AMD, which has much better OpenCL support for everything they ship (CPUs, GPUs and APUs).
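If you'd rather check than take anyone's word for it, you can ask the OpenCL runtime which devices it actually exposes on your box. A minimal sketch in Python, assuming the third-party pyopencl binding and a vendor OpenCL driver are installed (the helper name list_opencl_devices is made up for this example); it just returns an empty list if either is missing.

```python
def list_opencl_devices():
    """Return (platform name, device name, device type) tuples.

    Returns an empty list when pyopencl is not installed or when no
    OpenCL platform is available on this machine.
    """
    try:
        import pyopencl as cl  # third-party binding, assumed installed
    except ImportError:
        return []

    found = []
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                # device.type is a bit field; to_string makes it readable.
                kind = cl.device_type.to_string(device.type)
                found.append((platform.name, device.name, kind))
    except cl.Error:
        # Typically raised when no vendor driver (ICD) is registered.
        return found
    return found


if __name__ == "__main__":
    devices = list_opencl_devices()
    if not devices:
        print("No OpenCL devices visible (or pyopencl not installed).")
    for platform_name, device_name, kind in devices:
        print(f"{platform_name}: {device_name} [{kind}]")
```

An integrated GPU that doesn't show up here has no OpenCL driver registered with the runtime, whatever the hardware itself could in principle do.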
*Looks around* AAAAAAAnd, how does this AVX-256 compare to OpenCL transcoding of video?
That's a stupid question. OpenCL by itself does nothing whatsoever to improve video transcoding. OpenCL is an API, so the performance of an OpenCL kernel for video transcoding highly depends on which hardware you're running it on. On Intel CPUs supporting AVX-256, OpenCL kernels will be compiled to use those instructions (if Intel keeps updating its OpenCL SDK), on GPUs and APUs it will use whatever the respective platforms use. What OpenCL does is make it easier to exploit AVX-256, just as it makes it easier to exploit SSE and multiple cores.
Well, the Linux market share isn't yet growing
Actually, it is. Slowly but surely. Unsurprisingly, one of the biggest obstacles to Linux adoption for younger people is exactly gaming. I know quite a few people whose systems dual-boot Linux and Windows specifically for this: they use Linux most of the time, and then switch to Windows to play.
However I live 30 miles from my house. Man, that must be annoying.
:-)
That's divorce. The wife got the house. He got the restraining order.
And it's the kind of restraining order that forces you to be within a certain distance too http://xkcd.com/415/
I've noticed several design suggestions in your code.