Design Philosophy of the IBM PowerPC 970
D.J. Hodge writes "Ars Technica has a very detailed article on the PowerPC 970 up that places the CPU in relation to other desktop CPU offerings, including the G4 and the P4. I think this gets at what IBM is doing: 'If the P4 takes a narrow and deep approach to performance and the G4e takes a wide and shallow approach, the 970's approach could be characterized as wide and deep. In other words, the 970 wants to have it both ways: an extremely wide execution core and a 16-stage (integer) pipeline that, while not as deep as the P4's, is nonetheless built for speed.'"
Re:What is: 2H03? (Score:2, Informative)
Re:What is: 2H03? (Score:3, Informative)
An overview of pipelining (Score:3, Informative)
Basically, all modern processors are pipelined. This means that they work on several instructions at the same time, each at a different stage. Doing a load of wash, waiting for it to finish, putting it into the dryer, waiting for that to finish, and then folding, one load after another, would take 30 minutes * 3 steps * 3 loads = 4.5 hours. Instead, one could PIPELINE the process: do the first load, then while that's drying put the second load into the washer, and so on. The three loads then finish in 2.5 hours instead of 4.5.
This is all a processor really does. It does a FETCH, an INSTRUCTION DECODE, then an EXECUTION, then perhaps a MEMORY READ/WRITE, and then perhaps a WRITE BACK. So this 16-stage pipeline can have 16 different instructions in flight at the same time, each at a different point in its execution. The example in CAPS above is a 5-stage pipeline similar to those in MIPS processors.
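To put numbers on the laundry analogy, here is a minimal Python sketch. The function names and the short stage labels are made up for illustration; the 30-minute step time and the three loads come from the laundry example. It assumes every stage takes the same time and nothing ever stalls, which real pipelines can't promise.

```python
def sequential_minutes(loads, stages, stage_minutes):
    """Each load goes through every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads, stages, stage_minutes):
    """A new load enters as soon as the first stage is free.

    The first load needs `stages` steps; every later load finishes one
    step after the one before it.
    """
    if loads == 0:
        return 0
    return (stages + loads - 1) * stage_minutes


def pipeline_diagram(instructions, stage_names):
    """One text row per instruction, one column per clock cycle."""
    rows = []
    for i in range(instructions):
        # Instruction i enters the pipeline i cycles after instruction 0.
        cells = ["."] * i + list(stage_names)
        rows.append("I%d: " % i + " ".join(c.ljust(3) for c in cells).rstrip())
    return rows


if __name__ == "__main__":
    print(sequential_minutes(3, 3, 30) / 60, "hours without pipelining")
    print(pipelined_minutes(3, 3, 30) / 60, "hours with pipelining")
    # Hypothetical labels for the five stages named in CAPS above.
    for row in pipeline_diagram(4, ["IF", "ID", "EX", "MEM", "WB"]):
        print(row)
```

The diagram rows make the "different points of execution" idea visible: read down any column and you see which stage each instruction occupies in that cycle.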
Hope this was helpful!
Re:Question (Score:3, Informative)
While the Power4 core has two processor cores and massive caches for MP implementations, the PowerPC 970 has only one processor core, an SIMD unit and a 512K on-die L2 cache. The cache includes error correction. The PowerPC 970, as described today, has no connectors for an L3 cache.
Re:Question (Score:5, Informative)
This implies that the Power4 is not 64 bit -- which is of course wrong.
I would say that the PowerPC 970 trades the second core and fancier interconnects of the Power4 for lower power, cost, and the SIMD unit.
built for speed (Score:2, Informative)
A deep pipeline has as much to do with speed as the number of characters in the processor's name. Deep pipelines allow for higher MHz. That's all there is to it. Granted, for two processors of the same architecture, the one with higher MHz is faster. But you can't claim a CPU is built for speed solely based on its pipeline depth.
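A rough way to see this in numbers: run time depends on clock rate and on how much work gets done per clock. The sketch below is only an illustration; the instruction count, the IPC (instructions per cycle) values and the clock rates are invented, not measurements of any real chip.

```python
def run_time_seconds(instructions, ipc, clock_hz):
    """Run time = instruction count / (instructions per cycle * cycles per second)."""
    return instructions / (ipc * clock_hz)


WORK = 1_000_000_000  # hypothetical program: one billion instructions

# Hypothetical deep-pipeline chip: high clock, fewer instructions per cycle.
deep = run_time_seconds(WORK, ipc=0.8, clock_hz=2.0e9)

# Hypothetical shallower chip: lower clock, more instructions per cycle.
shallow = run_time_seconds(WORK, ipc=1.6, clock_hz=1.25e9)

# Same design at two clock rates: here the higher clock does win.
same_design_slow = run_time_seconds(WORK, ipc=0.8, clock_hz=1.5e9)

if __name__ == "__main__":
    print("deep, 2.0 GHz:     %.3f s" % deep)
    print("shallow, 1.25 GHz: %.3f s" % shallow)
    print("deep, 1.5 GHz:     %.3f s" % same_design_slow)
```

With these made-up numbers the lower-clocked chip finishes first, while within one design the higher clock is still faster.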
UUUUUUUUUH I hate explaining this shit, but my god
Re:They should make it work three ways (Score:3, Informative)
The x86 world seemed to move faster than the design for this and it fell away. It made more sense to concentrate on PPC stuff rather than try to do PPC and changing x86 stuff. Also, if it ran x86, why should anyone bother to write for PPC?
The difference is Pentiums and Athlons are intended to be x86 family upgrades, while the PPC is not. The PPC 970 is meant as an upgrade to earlier PPCs. One could as well ask why AMD doesn't make an Athlon that can run PPC code.
Understanding 64-Bit Processing (Score:4, Informative)
Re:Comparison without AMD? (Score:4, Informative)
Re:Question (Score:2, Informative)
Re:Power4 vs PowerPC 970 (Score:5, Informative)
The first generations (601, 603/604 and the (aborted?) 620) of the PowerPC line were scaled-back versions of the Power and Power2 architectures respectively. [The original Power architecture was mounted on a 3x5 daughter card with 4-5 separate chips making up the core (I'll have to go looking for my tech papers).]
Additionally, IBM has tended to work out new capabilities -- such as the move to 64-bit and dual cores -- on the larger-scale Power architecture before attempting to stuff them into the smaller PowerPC package. [Besides, IBM has to keep something to distinguish its pricier iron from the OEMs.]
Natty
Re:built for speed (Score:5, Informative)
If you're interested, take a look at the following documents (you might wanna check the urls for spaces):
http://systems.cs.colorado.edu/ISCA2002/FinalPa
http://systems.cs.colorado.edu/ISCA2002/FinalPa
http://systems.cs.colorado.edu/ISCA2002/FinalPa
Re:Question (Score:4, Informative)
Plus, the Power4 is really designed as a server/Big Iron chip - it's really 2 CPUs on 1 die - and that's just not what an iMac needs.
Re:PPC, not just for Apple any more (Score:3, Informative)
Re:Question (Score:5, Informative)
It is HUGE [com.com].
The picture at the top right shows the Power4 multichip module as used in the p690. Yes, it is the 5" square thing in the guy's hand.
There are better pictures of the MCM itself, but I couldn't find the close-up showing just the MCM in someone's hand.
The large size (along with everything it entails: it uses 125W of power, and supposedly costs about $3500 to manufacture) is one indication that IBM designed the Power4 for its big iron. Never mind that IBM does offer the Power4 (sans MCM) in some of its smaller servers.
The PowerPC 970 is the equivalent processor, tweaked for desktops and low-end servers.
Re:G5? (Score:3, Informative)
http://e-www.motorola.com/webapp/sps/site/prod_
Re:Question (Score:4, Informative)
The Power4 costs 4-5k per CPU. That is obviously too expensive for desktop systems, and the price comes from high-end server features and very large caches that offer no performance benefit to desktop apps. Only heavily threaded, multitasking apps running in parallel will see the performance improvements of a Power4; a web server running servlets and databases is the kind of workload I mean. Adobe Photoshop will show little performance difference and may even run slower on a Power4 than on a PowerPC 970, because the Power4 lacks SIMD instructions.
IBM did well with this processor, and it's leaps and bounds ahead of the G4. The main limitation of the G4 is the lack of DDR memory support. In DDR Macs the chipset has to slow memory access to the CPU down to 133MHz speeds, which creates a very serious bottleneck. This alone is bottlenecking 1GHz+ processors down to half their potential. Expect a 200-300% performance increase with these new processors.
Re:wide / transistors (Score:5, Informative)
This is a common misconception, probably stemming from some early coverage of the Power4 by Keith Diefendorff, where he for some unknown reason called the Power4's instruction bundles "VLIW-like". The problem is that the bundles are strictly in program order. There is no dependency checking or code scheduling that goes on in the building of these bundles. They're built along completely different rules than bundles in a VLIW machine.
All the out-of-order stuff happens in the back end, in the scheduling queues, just like on any other non-VLIW processor.
Re:Question (Score:2, Informative)
very deep pipelines (Score:3, Informative)
If you're interested, take a look at the following documents (you might wanna check the urls for spaces):
What makes me leery of taking these results at face value is that the performance peak is very broad (i.e. incremental benefit is low beyond a certain point), while the first paper, at least, seemed to gloss over a few important concerns (keeping clock skew and jitter very low when distributing to that many more stages, increasing overhead from the bypassing network, etc).
Still an interesting set of papers, but I'm not (yet) convinced.
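For a feel of why such a peak would be broad, here is a toy model in Python. To be clear, this is my own assumption and not the model used in those papers: cycle time is the logic delay split across the stages plus a fixed per-stage latch overhead, and stall cycles per instruction grow linearly with depth. All parameter values are invented.

```python
def time_per_instruction(depth, logic_ps=4000.0, latch_ps=100.0, hazard_rate=0.1):
    """Toy model: average picoseconds per instruction at a given pipeline depth.

    logic_ps    -- total logic delay to be split across the stages (assumed)
    latch_ps    -- fixed overhead added to every stage (assumed)
    hazard_rate -- extra stall cycles per instruction per stage (assumed)
    """
    cycle_ps = logic_ps / depth + latch_ps
    cycles_per_instruction = 1.0 + hazard_rate * depth
    return cycle_ps * cycles_per_instruction


def best_depth(max_depth=60):
    """Depth in 1..max_depth with the lowest time per instruction."""
    return min(range(1, max_depth + 1), key=time_per_instruction)


if __name__ == "__main__":
    best = best_depth()
    print("best depth in this toy model:", best)
    for depth in (10, 15, 20, 25, 30):
        slowdown = time_per_instruction(depth) / time_per_instruction(best)
        print("depth %2d: %.3fx the best time" % (depth, slowdown))
```

In this toy model, depths from 15 to 30 all land within a few percent of the optimum, which is the kind of flat peak that makes the exact "best" depth hard to pin down. It ignores clock skew, jitter and bypass overhead entirely.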
Re:All this talk... (Score:5, Informative)
Now, OpenMP is good for programming extremely high-performance shared-memory applications, like scientific computation and stuff like that. It really sounds like overkill for a desktop environment, where it's probably easier to program a multithreaded application with standard IPC mechanisms wherever communication is required. And really high-performance applications could also be programmed using MPI and a message-passing communication scheme, which is far more widely used (compare the # of people who know about OpenMP versus those who know about MPI), probably wouldn't be much less efficient, and would quite likely scale much better than a shared-memory implementation.
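The two styles can be contrasted with a small Python sketch. This is only an analogy using the standard `threading` and `queue` modules, not OpenMP or MPI themselves, and the function names are made up. The first version has workers update one shared variable; the second has workers that share nothing and send their results as messages.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Workers add into one shared total, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # the shared variable must be protected
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks):
    """Workers share no state; each sends its partial result as a message."""
    results = queue.Queue()

    def worker(chunk):
        results.put(sum(chunk))

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Collect exactly one message per worker.
    return sum(results.get() for _ in chunks)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    print(shared_memory_sum(data), message_passing_sum(data))
```

The message-passing version never touches shared state, which is roughly why that style carries over to machines that don't share memory at all.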
Re:Question (Score:1, Informative)
you would have found out about the size and other things. Quote from article:
"Each Power4 chip, with two CPUs, is packaged on a 4-inch-square ceramic multi-chip module with three other Power4 brethren. The multi-chip module, a 60-layer ceramic package, is filled with wiring to join the chips to neighbors, to 128MB modules of cache memory, and to a high-speed switch to reach the more distant CPUs."
Re:Question IT IS ONLY 40 BITS not 64. (Score:4, Informative)
And the 40 bit address bus is most likely a pin packaging limitation. They did not see a need to bring those extra 24 address lines out to the chip package. Internally, it is 64 bit. Much like the venerable MC68000 was 24 bit externally, but 32 bit internally.
But seriously, in the life-span of THIS processor implementation, do you seriously see ANY desktop manufacturer even thinking about putting that much RAM in their machines?? Heck, 1GB of RAM is not 'standard' yet. Extrapolating w/ Moore's Law, we'll be approaching 40 bits in 8 years. Apple will undoubtedly have another chip before then!
If you truly need THAT much physical storage today, you'll need to shell out for a SERIOUSLY large server. IBM's high-end p690 currently maxes out at 256GB. The virtual address space is undoubtedly much higher.
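For scale, the address arithmetic can be checked with a few lines of Python. The helper names are made up, and the sketch treats "GB" as 2^30 bytes, which is an assumption. The doubling period is left as an explicit parameter because the extrapolation is very sensitive to it: going from 1GB to 2^40 bytes takes 10 doublings, so an 8-year estimate implies a doubling roughly every 9.6 months.

```python
GIB = 2 ** 30


def addressable_bytes(address_bits):
    """Bytes reachable with the given number of address bits."""
    return 2 ** address_bits


def bits_needed(num_bytes):
    """Smallest number of address bits that can address num_bytes bytes."""
    return max(num_bytes - 1, 0).bit_length()


def years_to_reach(current_bytes, target_bits, months_per_doubling):
    """Years until memory size hits 2**target_bits at a fixed doubling period."""
    doublings = target_bits - bits_needed(current_bytes)
    return doublings * months_per_doubling / 12


if __name__ == "__main__":
    print("40 bits:", addressable_bytes(40) // GIB, "GB")
    print("24 bits (MC68000 external):", addressable_bytes(24) // 2 ** 20, "MB")
    print("bits for 256GB (p690 max):", bits_needed(256 * GIB))
    for months in (9.6, 12, 18):
        print("doubling every", months, "months:",
              years_to_reach(GIB, 40, months), "years from 1GB to 2^40")
```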
Tom