I recall purchasing (actually it was my parents doing the purchasing) a dedicated 80387 chip back in the day, mainly so our computer could run Falcon 3.0 in High-Fidelity mode. Heh.
From the OS point of view, module vs. core vs. HT doesn't matter: the OS sees each Bulldozer core or HT thread as a "single" core. On some of our HPC machines ($20K), we turn HT off because those extra "cores" confuse the benchmarking/load-balancing software, since half the cores "aren't real". The HT siblings also share their core's cache, so jobs effectively run with less cache or more misses (see numastat). Turbo Boost makes it even harder. You have to shop around for the CPUs with the most non-HT cores that can maintain the highest multipliers under full load. Multi-threaded isn't always better if it means slowing each core down by 200-600MHz. So many other things, like L1 cache, memory bandwidth, ALUs, etc., have a bigger impact in the real world than the FPU.
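If you want to check how many of the "cores" the OS reports are actually HT siblings, here is a rough Python sketch. It assumes a Linux box that exposes `thread_siblings_list` under `/sys/devices/system/cpu`. The helper names (`parse_cpu_list`, `physical_core_groups`, `read_sibling_lists`) are made up for illustration, and how module-based chips show up there may vary by kernel.

```python
import glob
import os
import re


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,4" or "0-3,8-11" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def physical_core_groups(sibling_lists):
    """Group logical CPUs that share one physical core.

    sibling_lists maps a logical CPU id to the raw text of its
    thread_siblings_list file. Siblings report the same list, so
    de-duplicating the parsed lists leaves one entry per core.
    """
    return sorted({tuple(parse_cpu_list(t)) for t in sibling_lists.values()})


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read thread_siblings_list for each logical CPU (assumes Linux sysfs)."""
    result = {}
    pattern = os.path.join(root, "cpu[0-9]*", "topology", "thread_siblings_list")
    for path in glob.glob(pattern):
        cpu_id = int(re.search(r"cpu(\d+)", path[len(root):]).group(1))
        with open(path) as fh:
            result[cpu_id] = fh.read()
    return result


if __name__ == "__main__":
    groups = physical_core_groups(read_sibling_lists())
    logical = sum(len(g) for g in groups)
    print(f"{logical} logical CPUs on {len(groups)} physical cores")
```

If the two numbers match, HT is off (or not present); if logical is double the physical count, half of what the scheduler sees are sibling threads sharing a core.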
If you depend on the FPU a lot, you probably know enough about computer architecture to also know what kind of CPU resources you need for your workloads. The vast masses don't really need the FPU, and it's the easiest thing to share because of its size and because most of today's computing rarely needs it.