
Comment Re:Just one question... why? (Score 3) 119

A lot of people are assuming that multiple processors can be put on the same die for "equal or less cost". This simply isn't true.

Sharing the cache is hard

Cache occupies the vast majority of the die area in a modern processor, so, as others have pointed out, it seems obvious that multiple processors should share a cache. However, this is harder than it sounds. The problem is that every load/store unit in every processor must now compete for the same cache bandwidth.

Thus, for a 2-way chip that shares only the cache, the cache bandwidth available to each processor (and that is the best possible case, before you ever miss to main memory) is cut in half.
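To put rough numbers on that (illustrative assumptions only, not figures from the article), suppose the shared cache can service two accesses per cycle and each core's load/store units can generate two accesses per cycle:

    // Back-of-envelope sketch of per-core cache bandwidth on a shared cache.
    // Every number here is an illustrative assumption, not a measurement.
    public class SharedCachePorts {
        public static void main(String[] args) {
            int cachePortsPerCycle = 2;   // accesses the shared cache can service per cycle (assumed)
            int accessesPerCore = 2;      // accesses each core's load/store units can issue per cycle (assumed)

            for (int cores = 1; cores <= 4; cores++) {
                int demand = cores * accessesPerCore;
                // Each core's fair share of the cache bandwidth, capped by what it can actually use.
                double share = Math.min(accessesPerCore, (double) cachePortsPerCycle / cores);
                System.out.printf("%d core(s): demand %d accesses/cycle, each core gets ~%.1f/cycle%n",
                                  cores, demand, share);
            }
        }
    }

Going from one core to two drops each core's share from two accesses per cycle to one, which is exactly the halving described above.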

We can work around this by using various tricks up to and including multiported caches---but most of these tricks increase latency (lowering maximum clock speed) or require much more circuitry in the caches (we were sharing the cache because it was so big, remember?).

It makes much more sense to share the circuitry that feeds into the cache.

Those are the superscalar execution units! Thus, SMT.

Utilization

Instead of keeping half the execution units busy, we attempt to keep them all busy. Extrapolating very roughly from Figure 2, we can expect a single thread to issue about half as many instructions as we have issue slots (actually fewer if we have a lot of execution units). The basic idea is that each thread we add cuts the number of empty issue slots roughly in half. Further, instructions from separate threads never need to be checked against each other for resource overlaps, and that checking circuitry is the main source of complexity in a modern processor.
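As a sanity check on that claim, here is a tiny model; the issue width and single-thread utilization below are made-up assumptions, not data from the article's Figure 2:

    // Toy model of "each new thread halves the empty issue slots".
    // issueWidth and emptyFraction are assumptions for illustration only.
    public class IssueSlots {
        public static void main(String[] args) {
            int issueWidth = 8;           // issue slots available per cycle (assumed)
            double emptyFraction = 0.5;   // fraction of slots a single thread leaves empty (assumed)

            for (int threads = 1; threads <= 4; threads++) {
                // Assume each additional thread fills half of whatever slots are still empty.
                double empty = issueWidth * Math.pow(emptyFraction, threads);
                System.out.printf("%d thread(s): ~%.1f of %d slots empty per cycle%n",
                                  threads, empty, issueWidth);
            }
        }
    }

With these numbers, four threads leave only about half a slot empty per cycle, which is why SMT can pay off even when a single thread can't come close to filling the machine.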

What's happening now has been predicted for a long time. The extra resources (a bigger register set, TLB, extra fetch units) required for multithreading are now cheaper than the extra resources you'd need (mostly pipeline overlap logic) to get a similar increase in single-threaded performance.

SMT easier than SMP?

Moving thread parallelism into the processor is actually easier for the compiler and the programmer: the weak memory models that come with cache coherence across multiple chips aren't an issue when all the threads share exactly the same memory subsystem.

To get an idea of how hard it really is to understand weak memory models, consider Java (which at least tries to explain the problem to programmers; in every other language you're on your own). Plenty of code in the JDK and elsewhere uses an idiom, double-checked locking, that is wrong on weakly ordered architectures. What does this mean? Your "portable" Java code will break mysteriously when you move it to a fast SMP, or you will have to run it in a special "cripple mode" which is extremely slow.
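For anyone who hasn't run into it, the broken idiom looks roughly like this (a minimal sketch; the class names are made up, but the pattern is the standard one). The danger is that on a weakly ordered machine another thread can see the reference to the new object before it sees the writes performed by its constructor:

    // Classic double-checked locking: tries to avoid taking a lock on every call,
    // but is broken on weakly ordered architectures (and under the original Java
    // memory model) because the store to `instance` can become visible to another
    // thread before the constructor's writes do.
    class Helper {
        int value;
        Helper() { value = 42; }          // a racing reader may still observe value == 0
    }

    class LazyHolder {
        private static Helper instance;   // not volatile: that omission is the bug

        static Helper getInstance() {
            if (instance == null) {              // first check, no lock
                synchronized (LazyHolder.class) {
                    if (instance == null) {      // second check, under the lock
                        instance = new Helper(); // may be reordered with Helper's initialization
                    }
                }
            }
            return instance;
        }
    }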

From a programmer's perspective, SMT (as opposed to SMP) architectures will be a godsend.
