Not everyone runs workloads that are poorly vectorized and parallelized, you insensitive clod.
Very few consumers actually run workloads that are properly vectorized and parallelized. Explicit vectorization requires manually writing multiple codepaths keyed to the level of vector support in the hardware. Autovectorization is substantially more flexible but requires a decent compiler to pick it up; ICC is by far the best at this, and GCC still struggles after many years of development. Furthermore, the proliferation of virtual-machine-based languages means that consumer application developers have largely absolved themselves of responsibility for writing code that is vectorized, much less code that's suitable for autovectorization by a JIT that actually does so. Heck, SIMD support in JavaScript is only just beginning to materialize.
Hyperthreading (Intel's implementation of SMT) is what gives Intel's i7 series microprocessors a huge advantage
The P4 had hyperthreading too. If it really were such a huge advantage, one would think the P4 would have been a bit more competitive than it was...
Netburst had a ton of issues that crippled performance across the board, and its HT design was also rather immature. The implementations of HT in later releases of the Itanium series and in Nehalem were vastly improved.
Disabling one of the CMT frontends...
...assuming the workload is not keeping all the frontends busy most of the time.
There are only a handful of common consumer applications that keep 6 or even 8 frontends busy at all times. AMD's FX series microarchitectures tend to keep up with Intel's Core microarchitectures in such applications, yet fall behind in the ones consumers spend most of their time running. JavaScript, the language that powers the web for some strange reason, is inherently single-threaded.
...only reduces competition for resources that are shared, which on AMD FX series microprocessors includes some of the cache and floating point hardware.
Not with AVX-intensive workloads; there, a single thread can keep the whole shared FPU busy with AVX instructions.
That's correct. The vector units in both AMD's FX series and Intel's Core series microprocessors are shared between two frontends. Although on Intel's architecture the vector add and multiply EUs sit on separate ports and can accept issues from separate threads in the same cycle (albeit in lieu of two scalar arithmetic instructions), I'm not sure whether AMD's architecture works the same way (although I think the instruction latency is longer). What I was getting at is that under AMD's CMT design, one core cannot issue instructions to the ALUs of the module's paired core, whereas SMT allows this by virtue of having a completely shared backend with a unified reservation station. If one of the frontends on an AMD FX series microprocessor is disabled, its two ALUs are disabled along with it, and the result is a typical 4-way SMP with 2 ALUs per logical processor. If Hyperthreading is disabled, the result is a 4-way SMP with 4 ALUs per logical processor, as every ALU can still be issued instructions from the unified reservation station. SMT allows a flexibility that simply doesn't exist under CMT.
CMT is inherently less efficient than SMT. It's also a simpler design that's easier for a smaller company to implement.
{citation needed} on both accounts.
There are piles upon piles of benchmarks out there demonstrating this. Intel's architecture excels in instruction throughput, transistor budget, and power efficiency.
Look at the price of AMD's microprocessors on any online retailer's website. Intel's i7-3930K still sells for around $600 and its successor goes for around $630, while AMD's flagship FX-9590 fell from $1000 to $600 to $300 in a matter of weeks because it just can't keep up where it counts.