Intel - Market Doesn't Need Eight Cores
PeterK writes "TG Daily has posted an interesting interview with Intel's top mobility executive David Perlmutter. While he sideswipes AMD very carefully ('I am not underestimating the competition, but..'), he shares some details about the successor of Core, which goes by the name 'Nehalem.' Especially interesting are his remarks about power consumption, which he believes will 'dramatically' decrease in the coming years, and about the number of cores in processors: two are enough for now, four will be mainstream in three years, and eight is something the desktop market does not need." From the article: "Core scales and it will be scaling to the level we expect it to. That also applies to the upcoming generations - they all will come with the right scaling factors. But, of course, I would be lying if I said that it scales from here to eternity. In general, I believe that we will be able to do very well against what AMD will be able to do. I want everybody to go from a frequency world to a number-of-cores-world. But especially in the client space, we have to be very careful with overloading the market with a number of cores and see what is useful."
640K cores ought to be enough for anybody... (Score:3, Informative)
On the other hand, to be fair, the scaling issues start getting odd. I'd expect that we're going to have to move from a multi-core to a multi-"computer" model, where each set of, say, 4 cores works the way it does now, but each set of 4 gets its own memory and any other relevant pieces. (You can still share the video and audio, though at least initially there will presumably be a privileged core set that gets designated as the owner.)
Still, as my post title says, this does strike me as rather a 640KB-style pronouncement. (The original quote may be apocryphal, but the sentiment it describes has always been with us.)
Re:The desktop market is the largest market. (Score:3, Informative)
If they double the number of cores, I can only take advantage of that if I have a problem that can be parallelized and then if I work very very hard to multi-thread my project.
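To make the "work very very hard" part concrete, here is a minimal sketch in Go of parallelizing even a trivially parallel problem, summing a slice. The function name and chunking scheme are my own illustration, not anything from the parent post; the point is how much partitioning, synchronization, and merging boilerplate a simple loop acquires once you spread it across cores.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSum splits the slice into one chunk per core and sums the
// chunks concurrently. Even this embarrassingly parallel problem needs
// explicit partitioning, synchronization, and result merging.
func parallelSum(xs []int) int {
	n := runtime.NumCPU()
	if n > len(xs) {
		n = 1
	}
	partial := make([]int, n) // one slot per worker, no sharing
	var wg sync.WaitGroup
	chunk := (len(xs) + n - 1) / n
	for i := 0; i < n; i++ {
		lo := i * chunk
		hi := lo + chunk
		if hi > len(xs) {
			hi = len(xs)
		}
		wg.Add(1)
		go func(i, lo, hi int) {
			defer wg.Done()
			for _, v := range xs[lo:hi] {
				partial[i] += v
			}
		}(i, lo, hi)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	xs := make([]int, 1000)
	for i := range xs {
		xs[i] = i + 1
	}
	fmt.Println(parallelSum(xs)) // 500500
}
```

And that is the easy case: a problem with no data dependencies between chunks. Most real workloads are not this tidy, which is exactly why doubling the core count doesn't automatically double anyone's performance.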
Re:Question. (Score:5, Informative)
So which bottleneck are you referring to? Intel's new Core 2 Duo chips share the L2 cache, and as far as I can tell from the reviews I have read, this setup works very well. Both cores can share data very quickly, or, when executing a single sequential program, one core can use all of the L2 cache (which in the Extreme Edition version is up to 4MB!). Or are you referring to main memory? It is possible for both cores to need to access main memory at the same time, but modern prefetching and aggressive speculation techniques reduce how often that occurs and the timing penalties when it does. And of course, the larger the L2 cache, the more memory can be stored on the chip at once, reducing how often main memory has to be accessed. According to Intel's own internal testing, they had a very hard time using all of the bandwidth the current front side bus and memory offer, which means main memory shouldn't be a bottleneck.
So what is the bottleneck you are referring to?
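The prefetching point is easy to demonstrate: hardware prefetchers hide main-memory latency for sequential access, but a strided access pattern defeats them and exposes the memory system as the bottleneck. A minimal Go sketch (the function names and the 4096-element stride are my own illustration; exact timings depend entirely on the machine):

```go
package main

import (
	"fmt"
	"time"
)

const (
	n      = 1 << 24 // 16M int32s (64MB), far larger than any L2 cache
	stride = 4096    // jump far enough to defeat the prefetcher
)

// sumSequential walks the slice in order: the prefetcher streams
// memory in ahead of the loop, so DRAM latency is largely hidden.
func sumSequential(xs []int32) (int64, time.Duration) {
	start := time.Now()
	var s int64
	for _, v := range xs {
		s += int64(v)
	}
	return s, time.Since(start)
}

// sumStrided touches every element exactly once, but in a scattered
// order the prefetcher can't predict, so every access risks a miss.
func sumStrided(xs []int32) (int64, time.Duration) {
	start := time.Now()
	var s int64
	for off := 0; off < stride; off++ {
		for i := off; i < len(xs); i += stride {
			s += int64(xs[i])
		}
	}
	return s, time.Since(start)
}

func main() {
	xs := make([]int32, n)
	for i := range xs {
		xs[i] = int32(i % 7)
	}
	s1, t1 := sumSequential(xs)
	s2, t2 := sumStrided(xs)
	// Same sum both ways; the strided walk is typically several
	// times slower on common hardware.
	fmt.Println(s1 == s2, t1, t2)
}
```

Both loops do the same arithmetic over the same data; only the access pattern differs. When the strided version is dramatically slower, that gap is the memory bottleneck the grandparent is asking about.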
Re:We've heard that before. (Score:3, Informative)
No, because 64-bit doesn't have the same kind of diminishing returns that increasing the number of cores does. We don't need eight cores, at least in the short to medium term, because taking advantage of them would require fundamentally rewriting all our software to be more parallel (unlike 64-bit support, which only requires fixing code that assumes 4-byte pointers).
Re:640K cores ought to be enough for anybody... (Score:4, Informative)
That's called NUMA [wikipedia.org].
Do they not need it? Really? (Score:2, Informative)
Re:People Will Always "Need" More (Score:2, Informative)
Why the hell do people still bring this up? Gates never said this.
Do a Google search for Gates and 640K and be enlightened. Wired did an article about this bogus attribution and Wikipedia has an entry about it under Bill Gates.
Yes and no : depends on the brand (Score:5, Informative)
For Intel, that's exactly the case:
In the current Intel architecture, memory is interfaced through the Northbridge.
In multi-core and multi-processor systems, all chips communicate with the Northbridge and get their memory access from there.
So more cores and processors means the same pipe must be shared by more of them, and therefore memory bandwidth per core is lower.
Intel must modify their motherboard designs: invent a quad-channel memory bus, push newer and faster memory types (that's what happened with DDR2! They needed the higher data rates, even though those come at the cost of latency), etc...
But the further they pursue this direction, the more latency they add to the system, which will eventually put them in a dead end. (Somewhat like how the deeper pipeline of their quest for gigahertz put them in the dead end of the burning-hot, power-hungry P4.)
For AMD, it's not quite the same:
In the architecture AMD introduced with the AMD64 series, memory is directly interfaced with a memory controller that is on-die with the chip.
The multiple processors and the rest of the motherboard communicate using standardized HyperTransport.
The rest of the motherboard doesn't even know what's happening up there with the memory.
And with the advent of HyperTransport slots (HTX), the motherboard doesn't even really need to know.
Riser cards carrying both memory and CPU (à la Slot 1) are possible (and highly anticipated, because they would allow a much wider range of specialized accelerators to be plugged in than the current AM2 socket does).
The most widely publicized advantage of this structure is lower latency.
But it also makes it easier to scale up memory bandwidth: just add another on-die memory controller and voilà, you have dual channel. That was the difference between the first generations of entry-level AMD64 parts (Athlon 64 for the 7## socket: one controller, single channel; Athlon FX for the 9## socket: two controllers, dual channel).
By the time 8-core processors come out, and if CPU riser boards with a standard HTX connector appear, nothing will prevent AMD from simply building a riser board designed for 8-core chips with 4 memory controllers (and quad-channel speed). Just change the riser board and the memory speed will scale; the motherboard doesn't need to be redesigned. In fact, the same motherboard could be kept.
And this won't come at the price of latency or anything else: the memory controller is ON the CPU die and doesn't have to be shared with anything.
In fact, that's already partially happening:
In multi-processor systems, instead of all processors sharing the same pipe through the Northbridge, each chip has its own controller running at full speed.
And that memory can be shared over the HT bus (albeit with some latency).
It's basically 4 memory controllers (2 per processor) working together. Achieving something quad-channel-like shouldn't be that difficult.
Especially since Intel is pushing the memory standard toward chips with higher latency: asking for more bandwidth in parallel over the HT bus won't be that much of a penalty.
So I think AMD will be faster than Intel at developing solutions that scale to higher numbers of cores, thanks to the better architecture.
Maybe it's no coincidence that AMD is working on technology to "bind together" cores and present them as a single processor to software that isn't sufficiently SMP-optimized, while at the same time Intel is telling whoever will listen that 4 cores is enough and 8 is too much. (Yeah, sure, just tell that to the database people and the Sun Niagara people. Or even to old BeOS users. This just sounds like "640K is enough for everyone.")
Re:We've heard that before. (Score:3, Informative)
No, in fact I mean CSP - see http://www.usingcsp.com/ [usingcsp.com]
Have a look at yaws: http://yaws.hyber.org/ [hyber.org] a high performance webserver written in Erlang.
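For anyone who hasn't run into CSP: the model is sequential processes that share nothing and communicate only by passing messages over channels, which sidesteps most of the multi-threading pain the thread is complaining about. A minimal sketch in Go, whose channels are directly inspired by Hoare's CSP (Erlang's mailboxes are the same idea with a different flavor; the worker here is my own toy example):

```go
package main

import "fmt"

// square is a sequential process: it reads jobs from in, writes
// results to out, and touches no shared memory at all.
func square(in <-chan int, out chan<- int) {
	for v := range in {
		out <- v * v
	}
	close(out)
}

func main() {
	in := make(chan int)
	out := make(chan int)
	go square(in, out)

	// A second process feeds the worker, then closes the channel
	// to signal that no more jobs are coming.
	go func() {
		for i := 1; i <= 5; i++ {
			in <- i
		}
		close(in)
	}()

	for r := range out {
		fmt.Println(r) // 1 4 9 16 25
	}
}
```

Because no state is shared, there are no locks and no data races by construction; scaling to more cores means wiring up more processes and channels, not auditing every shared variable. That's why CSP-style languages like Erlang handle thousands of concurrent connections (as yaws does) so comfortably.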
Re:Do all cores have to be smart? (Score:4, Informative)
It's essentially how all modern processors are built. I think the old coprocessors were the last that weren't on the same die (except the fake "coprocessors" that actually took over and completely ignored the old CPU; those were more like a CPU upgrade in drag). Modern processors have a CISC instruction set which gets translated internally into a ton of micro-ops (RISC-like), and with parallel execution units you in essence have multiple cores on one die; they're just not exposed to the user.
The limitation, compared to a cell phone with its extremely fixed feature set, is finding dedicated circuits that are meaningful for a general-purpose computer. That's essentially what the SSE[1-4] instruction sets are, along with dedicated encryption chips (on a few VIA boards, plus the new TCPA chips), dedicated video decoding circuitry (mostly found on GPUs), and maybe a few more. But on the whole, we haven't found very many tasks of that nature.
In addition, there are many drawbacks. New formats keep popping up, and your old circuitry becomes meaningless, or CPU technology speeds on and makes it redundant. The newest CPUs can only barely decode 1080p H.264/VC-1 content, but I expect that to be the hardest task any average desktop computer will face. What more is there a market for? I don't think too much.