IBM

Design Philosophy of the IBM PowerPC 970

D.J. Hodge writes "Ars Technica has a very detailed article up on the PowerPC 970 that places the CPU in relation to other desktop CPU offerings, including the G4 and the P4. I think this gets at what IBM is doing: 'If the P4 takes a narrow and deep approach to performance and the G4e takes a wide and shallow approach, the 970's approach could be characterized as wide and deep. In other words, the 970 wants to have it both ways: an extremely wide execution core and a 16-stage (integer) pipeline that, while not as deep as the P4's, is nonetheless built for speed.'"
  • Re:What is: 2H03? (Score:2, Informative)

    by ajakk ( 29927 ) on Monday October 28, 2002 @05:33PM (#4551097) Homepage
    Second half of 2003.

  • Re:What is: 2H03? (Score:3, Informative)

    by FreeLinux ( 555387 ) on Monday October 28, 2002 @05:35PM (#4551112)
    Second half of 2003, which almost always slips; the real meaning is assumed to be Q403 (4th quarter 2003) or even Q104 (1st quarter 2004).

  • by PhysicsScholar ( 617526 ) on Monday October 28, 2002 @05:35PM (#4551116) Homepage Journal
    Some of you may have read "an extremely wide execution core and a 16-stage (integer) pipeline" in this article's write-up and been extremely confused. I took a few computer architecture courses back in my undergrad days, so I can refresh some of your memories, as well as teach basic processor design to those of you who never got to attend a 4-year college and study computer chips in depth.

    Basically, all modern processors are pipelined. This means they overlap the execution of multiple instructions. Doing a load of wash, waiting for it to finish, putting it into the dryer, waiting for that to finish, and then folding would take 30 minutes * 3 steps * 3 loads = 4.5 hours. But you can PIPELINE the process: start the first load, and while it's drying put the second load into the washer, and so on. This removes the forced sequencing and takes much less time.

    This is all a processor really does. It does a FETCH, an INSTRUCTION DECODE, an EXECUTION, perhaps a MEMORY READ/WRITE, and then a WRITE BACK. So a 16-stage pipeline can have up to 16 different instructions in flight at once, each at a different point in its execution. The example in CAPS above is a 5-stage pipeline similar to the one in classic MIPS processors. (The sketch just after this comment works the laundry numbers.)

    Hope this was helpful!
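
    A minimal C sketch of the laundry arithmetic above, assuming the same numbers as the analogy (three loads, three stages, 30-minute steps):

        #include <stdio.h>

        int main(void) {
            int loads = 3, stages = 3, step_minutes = 30;

            /* sequential: each load runs all three steps before the next starts */
            int sequential = loads * stages * step_minutes;      /* 270 min = 4.5 h */

            /* pipelined: fill the pipe once, then one load finishes per step */
            int pipelined = (stages + loads - 1) * step_minutes; /* 150 min = 2.5 h */

            printf("sequential: %d min, pipelined: %d min\n", sequential, pipelined);
            return 0;
        }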
  • Re:Question (Score:3, Informative)

    by Faggot ( 614416 ) <choadsNO@SPAMgay.com> on Monday October 28, 2002 @05:38PM (#4551135) Homepage
    The PowerPC 970's design is adapted from IBM's successful Power4 server processor. Physically smaller, the PowerPC 970 sacrifices some execution units -- including the Power4's second processor core -- for 64-bit compatibility and the SIMD unit.

    While the Power4 has two processor cores and massive caches for MP implementations, the PowerPC 970 has a single core, a SIMD unit, and 512K of on-die L2 cache. The cache includes error correction. The PowerPC 970, as described today, has no connections for an L3 cache.

  • Re:Question (Score:5, Informative)

    by mfago ( 514801 ) on Monday October 28, 2002 @05:47PM (#4551222)
    the PowerPC 970 sacrifices some execution units -- including the Power4's second processor core -- for 64-bit compatibility and the SIMD unit.

    This implies that the Power4 is not 64 bit -- which is of course wrong.

    I would say that the PowerPC 970 trades the second core and fancier interconnects of the Power4 for lower power, cost, and the SIMD unit.
  • built for speed (Score:2, Informative)

    by selderrr ( 523988 ) on Monday October 28, 2002 @05:48PM (#4551226) Journal
    sigh

    A deep pipeline has as much to do with speed as the number of characters in the processor's name. Deep pipelines allow for higher MHz. That's all there is to it. Granted, of two processors of the same architecture, the one with the higher MHz is faster. But you can't claim a CPU is built for speed based solely on its pipeline depth. (The toy model right after this comment shows the MHz relationship.)

    UUUUUUUUUH I hate explaining this shit, but my god the /. editors have had such brainfarts lately I just had to say something.
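
    To make the MHz point concrete, a toy model with invented numbers: splitting a fixed amount of logic across more stages shortens the cycle time, and that shorter cycle is the entire "deep pipelines allow higher MHz" effect; no extra work per instruction gets done.

        #include <stdio.h>

        int main(void) {
            double logic_ns = 16.0;  /* total combinational logic delay (invented) */
            double latch_ns = 0.10;  /* per-stage latch overhead (invented) */

            for (int n = 4; n <= 32; n *= 2) {
                double cycle = logic_ns / n + latch_ns;  /* ns per cycle */
                printf("%2d stages: %.2f ns cycle = %.2f GHz\n", n, cycle, 1.0 / cycle);
            }
            return 0;
        }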
  • by Paul Neubauer ( 86753 ) on Monday October 28, 2002 @05:56PM (#4551286)
    One of the early planned PPC chips had exactly that idea in mind: it pretty much added an x86 core, plus logic to figure out which instruction set it was being fed and where to send it. The idea was that there would then be no barrier to adopting PPC - the chip could run x86 code to replace existing systems and run PPC code as well, serving as a transition vehicle.

    The x86 world moved faster than this design did, and it fell away. It made more sense to concentrate on PPC than to try to track a changing x86 at the same time. Besides, if it ran x86, why would anyone bother to write for PPC?

    The difference is that the Pentium and Athlon are intended as x86-family upgrades, while the PPC is not. The PPC 970 is meant as an upgrade to earlier PPCs. One could just as well ask why AMD doesn't make an Athlon that runs PPC code.
  • by flux4 ( 157463 ) on Monday October 28, 2002 @05:59PM (#4551303) Homepage
    David Every has a great article [igeek.com] about 64-bit processors (referencing the IBM 970, though not by name) over at iGeek.com. It includes an interesting look back at Motorola vs. Intel over the years, and notes that 64-bit addressing can reference about 18.4 quintillion (18,446,744,073,709,551,616) bytes of memory, i.e. 16 exabytes. That should do us for a good while. (The snippet below checks the arithmetic.)
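
    A quick check of that arithmetic; nothing here comes from the article, it's just the powers of two:

        #include <stdio.h>
        #include <stdint.h>

        int main(void) {
            /* the largest 64-bit address: 2^64 - 1 */
            uint64_t max_addr = UINT64_MAX; /* 18,446,744,073,709,551,615 */
            printf("2^64 - 1 = %llu\n", (unsigned long long)max_addr);

            /* 2^64 bytes = 2^4 * 2^60 bytes = 16 exbibytes */
            printf("2^64 bytes = %d EiB\n", 1 << (64 - 60));
            return 0;
        }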
  • by Coz ( 178857 ) on Monday October 28, 2002 @06:12PM (#4551403) Homepage Journal
    The article points out that a comparison with the AMD chip would be appropriate, but it's not "out there" right now as a basis for comparison. Hannibal says he'll probably use the 970 as a reference when he gets hold of the Opteron and does his down-in-the-registers review of it.
  • Re:Question (Score:2, Informative)

    by WatertonMan ( 550706 ) on Monday October 28, 2002 @06:13PM (#4551413)
    As for why Apple wouldn't go for the Power4: it is simply too expensive. Furthermore, Apple wants something like the AltiVec unit in the current G4s. The Power4 is optimized for non-desktop uses and is overkill for what Apple needs. The 970 is a nice balance between Apple's needs and the Power core. And by moving to IBM, Apple gets a far more reliable supplier than Motorola, whose G5 has been missing in action for some time now.
  • by NattyDread ( 192484 ) on Monday October 28, 2002 @06:17PM (#4551433)
    The other responses to your question have pretty much hit it dead-on. I just wanted to add that the PowerPC has always been the little brother of the Power architecture used originally in the RS/6000, and now in almost everything IBM makes - AS/400, E9000, etc.

    The first generations of the PowerPC line (the 601, 603/604, and the ?aborted? 620) were scaled-back versions of the Power and Power2 architectures respectively. The original Power architecture was spread across a 3x5 daughter card, with 4-5 separate chips making up the core (I'll have to go looking for my tech papers), so the migration of everything onto one die for the PowerPC was amazing.

    Additionally, IBM has tended to work out new capabilities - such as the move to 64-bit and dual cores - on the larger-scale Power architecture before attempting to stuff them into the smaller PowerPC package. (Besides, IBM has to keep something to distinguish its pricier iron from the OEMs. ;)

    Natty

  • Re:built for speed (Score:5, Informative)

    by fobef ( 541536 ) on Monday October 28, 2002 @06:18PM (#4551446) Homepage
    Actually, there is quite a lot of research on the optimal pipeline depth for a processor. For an x86 manufactured with a modern process, the conclusion seems to be that a pipeline of between 40 and 50 stages would be optimal, given that enough resources could be devoted to its design. (A toy version of the tradeoff follows this comment.)

    If you're interested, take a look at the following papers:

    http://systems.cs.colorado.edu/ISCA2002/FinalPapers/Deep%20Pipes.pdf

    http://systems.cs.colorado.edu/ISCA2002/FinalPapers/hartsteina_optimum_pipeline_color.pdf

    http://systems.cs.colorado.edu/ISCA2002/FinalPapers/hrishikeshm_optimal_revised.ps
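
    A toy version of the tradeoff those papers model. Every constant below is invented, but the shape is the point: shorter cycles from deeper pipes fight longer misprediction flushes, and with these particular made-up numbers the minimum happens to land in the 40-50 stage range.

        #include <stdio.h>

        int main(void) {
            double logic_ns = 16.0;         /* total logic depth (invented) */
            double latch_ns = 0.15;         /* per-stage latch overhead (invented) */
            double flushes_per_insn = 0.05; /* branch mispredicts per insn (invented) */

            double best = 1e9;
            int best_n = 0;
            for (int n = 5; n <= 80; n++) {
                double cycle = logic_ns / n + latch_ns;
                /* a misprediction flushes roughly the whole pipe: n cycles lost */
                double time_per_insn = cycle * (1.0 + flushes_per_insn * n);
                if (time_per_insn < best) { best = time_per_insn; best_n = n; }
            }
            printf("minimum %.3f ns/insn at %d stages\n", best, best_n);
            return 0;
        }

    The curve near the minimum is quite flat, which is the "very broad peak" objection raised further down the thread.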
  • Re:Question (Score:4, Informative)

    by Coz ( 178857 ) on Monday October 28, 2002 @06:19PM (#4551453) Homepage Journal
    Another reason (in addition to the excellent ones other folks have listed) - cost. Power4 chips are over-engineered compared with "consumer" CPUs like the G4, P4, and 970. Hannibal's article mentions that at the same clock speed, some instructions execute faster on the 970 simply because of the thicker oxide layers used in the Power4's transistor gates. It's a different emphasis: high reliability at high expense versus "lesser" (still acceptable to 80% of the world) reliability at an acceptable mass-production cost per chip.


    Plus, the Power4 is really designed as a server/big-iron chip - it's two CPUs on one die - and that's just not what an iMac needs.

  • by Drakonian ( 518722 ) on Monday October 28, 2002 @06:25PM (#4551493) Homepage
    PowerPC hasn't been just for Apple for quite some time, as evidenced by the 3930 hits for "Embedded PowerPC" on Google or the Embedded PowerPC Resources and Information [go-ecs.com] page.
  • Re:Question (Score:5, Informative)

    by mfago ( 514801 ) on Monday October 28, 2002 @06:46PM (#4551694)
    The reason that Apple won't use the Power4:

    It is HUGE [com.com].

    The picture at the top right shows the Power4 multichip module as used in the p690. Yes, it's the 5"-square thing in the guy's hand.

    There are better pictures of the MCM itself, but I couldn't find the close-up showing just the MCM in someone's hand.

    The large size (along with everything it entails: it draws 125W of power and supposedly costs about $3500 to manufacture) is one indication that IBM designed the Power4 for its big iron. Never mind that IBM does offer the Power4 (sans MCM) in some of its smaller servers.

    The PowerPC 970 is the equivalent processor, tweaked for the desktop and low-end servers.
  • Re:G5? (Score:3, Informative)

    by Sycraft-fu ( 314770 ) on Monday October 28, 2002 @06:52PM (#4551737)
    Go look on their site; they have specs for the first of their G5 chips:

    http://e-www.motorola.com/webapp/sps/site/prod_summary.jsp?code=MPC8540&nodeId=01M98655
  • Re:Question (Score:4, Informative)

    by Billly Gates ( 198444 ) on Monday October 28, 2002 @07:12PM (#4551875) Journal
    As is evident in the article, the transistors and logic gates in the Power4 are bigger and a lot more expensive to produce, to increase reliability. This is not needed in a desktop, and it slows down performance since the gates can't switch as fast. I find the reliability claim hard to believe, since any FreeBSD or Linux box (2.2 and earlier kernels) can run for months or years without a single reboot.

    The Power4 costs $4-5k per CPU - obviously too expensive for desktop systems, because of high-end server features and very large on-chip caches that offer no performance benefit to desktop apps. Only heavily threaded multitasking apps running in parallel will see the performance improvements of a Power4; a web server running servlets and databases is the kind of heavily threaded multitasking application I mean. Adobe Photoshop will show little performance difference, and may even run slower on a Power4 than on a PowerPC 970 due to the lack of SIMD instructions.

    IBM did well with this processor, and it's leaps and bounds ahead of the G4. The main limitation of the G4 is the lack of DDR memory support: in DDR Macs the chipset has to slow memory access to the CPU down to 133MHz, which creates a very serious bottleneck. This alone throttles +1GHz processors to half their potential. Expect a 200-300% performance increase with these new processors.

  • by Hannibal_Ars ( 227413 ) on Monday October 28, 2002 @07:58PM (#4552178) Homepage
    It operates very much like the Itanium, w.r.t. group bundling/dispatch of IOPs. Very much like the Itanium's 3-instruction EPIC bundles, but the Itanium requires the compiler to best pack the templates, whereas the 970 builds each bundle based on dependencies. Funny how they both punt with nops.


    This is a common misconception, probably stemming from some early coverage of the Power4 by Keith Diefendorff, where for some unknown reason he called the Power4's instruction bundles "VLIW-like". The problem is that the bundles are strictly in program order. There is no dependency checking or code scheduling going on in the building of these bundles. They're built along completely different rules than bundles in a VLIW machine.

    All the out-of-order stuff happens in the back end, in the scheduling queues, just like on any other non-VLIW processor.
  • Re:Question (Score:2, Informative)

    by Zueski ( 72292 ) on Monday October 28, 2002 @08:22PM (#4552320)
    Actually, it's based on the Book E standard, same as the 601, 603, 604, G3, G4, etc. So, assuming all the execution units are present (i.e., AltiVec), it should run the same code. Book E is a 64-bit architecture that can be implemented as a 32-bit version.
  • very deep pipelines (Score:3, Informative)

    by Christopher Thomas ( 11717 ) on Monday October 28, 2002 @08:31PM (#4552371)
    Actually, there is quite a lot of research on the optimal pipeline depth for a processor. For an x86 manufactured with a modern process, the conclusion seems to be that a pipeline of between 40 and 50 stages would be optimal, given that enough resources could be devoted to its design. If you're interested, take a look at the papers linked in the parent.


    What makes me leery of taking these results at face value is that the performance peak is very broad (i.e. the incremental benefit is low beyond a certain point), while the first paper, at least, seemed to gloss over a few important concerns (keeping clock skew and jitter very low when distributing to that many more stages, the increasing overhead of the bypass network, etc.).

    Still an interesting set of papers, but I'm not (yet) convinced.
  • Re:All this talk... (Score:5, Informative)

    by Roadmaster ( 96317 ) on Monday October 28, 2002 @08:56PM (#4552503) Homepage Journal
    " OpenMP is a specification for a set of compiler directives, library routines, and environment variables that can be used to specify shared memory parallelism in Fortran and C/C++ programs." All that would have to be added to gcc are the "compiler directives", as the "library routines" and "environment variables" aren't directly a part of the compiler.

    Now, OpenMP is good for programming extremely high-performance shared-memory applications, like scientific computation. For a desktop environment it sounds like overkill: there it's probably easier to write a multithreaded application with standard IPC mechanisms where communication is required. And really high-performance applications could also be written using MPI and a message-passing scheme, which is far more widely used (compare the number of people who know OpenMP with those who know MPI), probably wouldn't be much less efficient, and would quite likely scale much better than a shared-memory implementation. (A tiny OpenMP loop follows this comment.)
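
    For flavor, a minimal OpenMP loop in C; this is standard OpenMP usage (compile with something like gcc -fopenmp), not anything specific to the 970:

        #include <stdio.h>
        #include <omp.h>

        #define N 1000000

        static double a[N];

        int main(void) {
            double sum = 0.0;

            for (int i = 0; i < N; i++)
                a[i] = i * 0.5;

            /* the "compiler directive" part of the spec: split the loop across
               threads and combine the per-thread partial sums */
            #pragma omp parallel for reduction(+:sum)
            for (int i = 0; i < N; i++)
                sum += a[i];

            /* omp_get_max_threads() is one of the spec's library routines */
            printf("threads: %d, sum = %f\n", omp_get_max_threads(), sum);
            return 0;
        }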
  • Re:Question (Score:1, Informative)

    by Anonymous Coward on Monday October 28, 2002 @10:13PM (#4552877)
    If you had read the article you pointed out, you would have found out about the size, among other things. Quoting from the article:

    "Each Power4 chip, with two CPUs, is packaged on a 4-inch-square ceramic multi-chip module with three other Power4 brethren. The multi-chip module, a 60-layer ceramic package, is filled with wiring to join the chips to neighbors, to 128MB modules of cache memory, and to a high-speed switch to reach the more distant CPUs."
  • by Anonymous Coward on Monday October 28, 2002 @10:23PM (#4552925)
    It's a 40-bit address bus, not a 40-bit data bus. BIG difference there.

    And the 40-bit address bus is most likely a pin-packaging limitation: they did not see a need to bring those extra 24 address lines out to the chip package. Internally, it is 64-bit, much like the venerable MC68000 was 24-bit externally but 32-bit internally.

    But seriously, in the life span of THIS processor implementation, do you seriously see ANY desktop manufacturer even thinking about putting that much RAM in their machines? Heck, 1GB of RAM isn't 'standard' yet. Extrapolating with Moore's Law, we'll be approaching 40 bits in 8 years. Apple will undoubtedly have another chip before then! (The snippet after this comment does the arithmetic.)

    If you truly need THAT much physical storage today, you'll need to shell out for a SERIOUSLY large server: IBM's high-end p690 currently maxes out at 256GB. The virtual address space is undoubtedly much higher.

    Tom
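
    The powers of two behind Tom's point, plain arithmetic only:

        #include <stdio.h>
        #include <stdint.h>

        int main(void) {
            /* 40 address pins reach 2^40 bytes of physical memory */
            uint64_t physical = (uint64_t)1 << 40;
            printf("2^40 bytes = %llu bytes = 1 TiB\n", (unsigned long long)physical);

            /* the full 64-bit virtual space is 2^24 times larger */
            printf("2^64 / 2^40 = %llu\n", (unsigned long long)1 << 24);
            return 0;
        }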
