
Intel to Increase Stages in Prescott

Alizarin Erythrosin writes "Further contributing to the MHz Myth, The Register and ZDNet are reporting that the new P4 core, codenamed Prescott, will have a longer pipeline than Northwood. No official numbers have been released, but The Reg is saying an Intel spokesman said that 30 stages seems to be a reasonable estimate. As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls. 'And just as the PIII proved faster than the early P4s in some applications, it's likely that Northwood will similarly prove faster than Prescott, which has clearly been designed for speeds of the order of 4GHz.'"
  • by Breakfast Pants ( 323698 ) on Thursday January 22, 2004 @09:26PM (#8061891) Journal
    With all these pipelines you'd think intel was Bush and Prescott was Afghanistan.
  • by ObviousGuy ( 578567 ) <> on Thursday January 22, 2004 @09:26PM (#8061893) Homepage Journal
    Northwood was really unsatisfying. I found that for the money, it was too short with too few stages. While gameplay was fine, the lack of stages simply made the cost not worth it for me.

    2 stars.
  • by odeee ( 741339 ) on Thursday January 22, 2004 @09:28PM (#8061907)
    It's not the size of your pipeline that counts... it's how you use it.
    • I hear Prescott packs quite a punch.
    • Re:Size of pipeline (Score:5, Interesting)

      by Hoser McMoose ( 202552 ) on Friday January 23, 2004 @03:13AM (#8063871)
      Ironically enough, that's quite accurate for processors!

      A 6-stage pipeline with terrible branch prediction and all sorts of holes in it isn't going to do any good at all, while a 30 stage pipeline with great branch prediction (and the P4 does have great branch prediction) and few bubbles or holes (improved SMT, aka hyperthreading, is supposed to help here) will do wonders.

      Of course, the real question is not how long the total pipeline is, but how big the branch mispredict penalty is. It should be noted that the "Northwood" P4 has a 28-stage pipeline, but only a 20-stage mispredict penalty. If the "Prescott" has a 30-stage pipeline with a 22-stage mispredict penalty, it isn't exactly a huge change.
  • by ghostis ( 165022 ) on Thursday January 22, 2004 @09:29PM (#8061919) Homepage
    I work at an engineering firm. The deep pipelines in the current P4 perform so poorly with general number crunching (e.g. matlab) we have almost completely switched to Athlons and are seriously considering Opteron.

    • If you care about performance at all, why on earth are you using matlab?
      • by Goonie ( 8651 ) * <robert,merkel&benambra,org> on Thursday January 22, 2004 @10:01PM (#8062195) Homepage
        In some situations, this kind of number-crunching is done with a custom program that is only run a few times. In such situations hacking something together in Matlab is quicker to get up and running than a full-blown C++ or, god forbid, FORTRAN program.

        Programmer time is much more expensive than faster machines.

      • by EulerX07 ( 314098 ) on Thursday January 22, 2004 @10:04PM (#8062213)
        Matlab can hardly be beat for speed when you need to produce custom software to crunch huge matrices full of numbers. You can have a GUI designed and working quickly, put together some code that can grab data from any text format, and run mathematical formulas on those data. Then you can do any operations you want on the matrices, which sit in memory and are easily accessible. Want to throw your data into a chart? A few minutes of coding and you've got the perfect chart on there.

        Back in my days of internship at the Canadian Space Agency, I'd program multiple custom apps to pre-process the data before it was fed to the mainframes of a contractor for finite element analysis. Matlab is the tool to use for anybody involved in scientific projects. Yes, your code in C will run much faster, but it'll take significantly longer to get it up and running.

        If you run a lot of loops and it's really bogging the performance down, you can program just those sections of code in C and compile with matlab libraries to be able to use it in Matlab like the native commands. I did one piece of code that took a finite element file and created the 3d model in matlab. Took 20 minutes to run the code in matlab, 3.45 seconds once I had compiled the tough part of the code in C.

        In the end it's all about using the right tool, and for engineering math, Matlab is excellent.
        • Each to their own I suppose. I admit I don't have much experience with Matlab (I'm planning on keeping it that way). As a college project, we were told to use matlab for a computer vision task. I tried everything to optimise it, followed all the guidelines on vectorising code and not using loops, and eventually found that the only way to do it was to write the critical code in C, as you suggest (this improved the speed by a factor of 100). In the end, there was almost no advantage from having used matlab an
          • by Latent Heat ( 558884 ) on Friday January 23, 2004 @12:33AM (#8063018)
            Matlab is to the academic-scientific-engineering world what Visual Basic is to the accounting-business-data processing world.

            Your EE or ME or ChemE full professor, as a grad student, could have written a FORTRAN program to compute some stuff and write output to a numeric text file, or perhaps draw some plots using a subroutine library. You are probably thinking that anyone who can't sling together C programs using vi to draw graphics straight to X is a luser, but I am talking about pretty technically savvy people who don't have time to spend on this stuff, and who employ armies of engineering majors from foreign lands who are not up on this stuff either.

            My own take is that if a particular numerical calculation can be easily programmed by some package, it must not be on the cutting edge of research, because someone has already done it. Besides, if your software package is really deep, most of the effort goes into the architecture, the data flows, and the graphics, and the RAD bit only simplifies a tiny part of what you are spending your time on. A high-powered scientific data visualization is really a video game, and how many video games are implemented in Matlab?

            But what Perl is to text processing, Python is to collections, and VB is to slinging together a GUI, Matlab is to numerics (what used to be FORTRAN libraries) -- it may not have the best algorithms, but it has a lot of algorithms -- it has a semi-decent scripting language, and it has some facility with producing plots from your computations and other data.

            Now that's the thing -- if you are doing matrix operations or using some canned function (most likely C under the hood), Matlab is as fast as fast can be. The minute you start looping in Matlab, it is interpreted and the speeds are in the Python range.

            Before you knock it completely, it has very good integration with Java modules -- more seamless than with C modules. While Java may be pokey for its GUI, for tight numeric loops the JIT is almost as fast as C -- no joke, a person should consider writing numeric extensions to Matlab in Java of all things, especially on Windows where they tweaked up Java 1.4.2_03. And how many scripting languages (OK, Jython) have this level of Java integration?

            But as a scripting language, Matlab has its shortcomings. It started out as a matrix calculator and has had features grafted on in a hodge-podge Visual Basic 6.0 kind of way. In terms of its data type restrictions and fubar scoping rules and brain-dead object extensions, I don't think, as they say, it scales very well.

            My other peeve is that it is proprietary, and while MathWorks is not Microsoft, I worry that engineering schools, emphasizing use of "commercial packages students will use in the real world when they graduate" (as opposed to professors dinking around with their homebrew software for use in instruction), are becoming trade schools shilling for the big software houses. I don't have a lot of experience with it, but in place of Matlab we should be using stuff like Python and the Python NumPy extension -- an Open Source alternative, comparable performance, C extensions for speed, but much more Turing complete, consistent, and scalable.

            And where is Matlab 6.5 using Java internally? Try doing a Files Open to start editing a Matlab script (M-file) with the Matlab editor window. One potato, two potato, three potato, and the window comes up. Now what language has that kind of GUI lag, I wonder what it could be?

            • The reason Java GUIs are pokey for the most part is that people have been SPOILED by OOP. If you create a new window every time, then yes, it'll be slow, because Java has to basically learn how to make the window in the given OS, lay it out, and populate it, all before it can display it (as opposed to VB/.NET, which apply very sneaky, often exasperating hints on how to make windows).

              Really, the New window should be made once, the optimizations saved in the assembly cache, and the same window used to subseq
            • Unfortunately, Matlab is still a category killer for certain kinds of pipelining. But the various open-source data analysis languages are coming on strong. Perl Data Language, Numeric Python, Octave, R -- they're all worth a look, though at least the first three fit the IDL niche a little better than the MatLab one. I'm not as familiar with R as I probably should be.

              Unfortunately, all of 'em (including MatLab) suck if you're working with chunks of data that are bigger than your cache, because you end

        • That's a nice plug for Matlab. Since plugs are not being modded off-topic today :-) let me say that I know several people who use GNU Octave instead of Matlab. It does most of the same things, and it's free software. Some use it just for home use, and some work at small companies that couldn't afford Matlab. You can write code that works on both, so one guy uses Matlab at work and can run the same stuff on Octave at home.

    • by LehiNephi ( 695428 ) on Thursday January 22, 2004 @09:37PM (#8062010) Journal
      I see this as a huge opportunity for AMD. They rate their processors based on how many times faster than a 1GHz Duron they run. Thus, an Athlon XP 3000+ runs three times as fast.

      However, Intel rates their chips by clockspeed, and with the less-efficient pipeline, a 3 GHz P4 is not three times as fast as a 1GHz P3.

      Thus, as chips get faster, AMD's chips will get better performance, not only cycle-for-cycle, but even rating-for-rating!
      • by timeOday ( 582209 ) on Thursday January 22, 2004 @10:39PM (#8062416)
        No, surely AMD will simply change their metric to match whatever Intel is putting out. IMHO there's no way AMD will label something 4000 when it's faster than a PV 4400. That defeats the *whole point* of not using the real clock speed in the first place.
      • However, Intel rates their chips by clockspeed, and with the less-efficient pipeline, a 3 GHz P4 is not three times as fast as a 1GHz P3

        I don't have hard data on this, but doesn't the impact of the pipeline depend on how the software it runs is compiled? If the object code is compiled to reduce branches, the longer pipeline should drastically speed up processing. That would theoretically make a 3GHz P4 MORE than three times as fast as a 1GHz P3.
    • by TubeSteak ( 669689 ) on Thursday January 22, 2004 @09:42PM (#8062051) Journal
      My understanding was that AMD has 3 FPUs to Intel's 2. Oh, and AMD has 3 AGUs (address generation units) compared to Intel's 2+2 (two of them also do other things). Anyways, most users, at the GHz speeds this proc is coming in at, will never notice the difference. For the people who care, they'll figure out what the proc can and cannot do... then use it accordingly. Unless you guys really want to run Windows, why not compare the Opteron to a Dually Mac? After all, the PowerPC is really good at number crunching.

      How come your computer takes seconds to multiply two 400 digit #s, but ages to factor them?

      • by tomstdenis ( 446163 ) <> on Thursday January 22, 2004 @10:02PM (#8062203) Homepage
        More specifically the Athlon has three ALU/IEU pipeline pairs, 1 FADD, 1 FMUL and 1 FLOAD pipeline [e.g. you can't do 3 FP muls at once].

        The decoder can send up to three instructions into the pipeline per cycle. Actually, that's only for DirectPath instructions [e.g. simple ALU/FP]. Vector instructions stall all three decoders.

        The ALU scheduler is fairly strong, but it does have several weaknesses. From the manual, I can't see that it can resolve dependencies from other pipelines. For instance,

        ADD EAX,EBX [DIE ]
        ADD EBX,EAX [D IE ]
        ADD ECX,EBX [D IE] - critical path
        INC ESI [ DIE ]

        D == decode, I == issue, E == execute [p. 227 of the Athlon optimization manual].

        So the fourth instruction will always start on the second cycle despite the fact that ALU1/2 are blocked.

        Similarly the Athlon memory ports are a bit weak. There are read/write buffers but you still can only issue two reads or one write per cycle which is annoying.

        However, the strength of the Athlon ALU over the P4 ALU is that for the most part it can keep all three pipelines busy even if they are blocked at some stage [e.g. it can decode/issue even if blocked]. It doesn't say in the documentation, but I could swear the Athlon can cross-pipe things too, because sometimes I can mess with the order of ops [e.g. create a dependency] and it executes in the same time regardless.

        Anyways, yeah it's all about the 3 ALUs and a decent scheduler. Something the P4 does not have.

    • by vlad_petric ( 94134 ) on Thursday January 22, 2004 @10:15PM (#8062283) Homepage
      Matlab is mostly loops. Loops generate branches with high predictability, and as a consequence deep pipelining won't incur much performance loss. Furthermore, there's a lot of parallelism in those loops, and the out-of-order execution engine is quite good at exploiting it (i.e. hiding the long latency of FP ops by overlapping them).

      It's much more likely the size of the L2 cache is affecting you (i.e. your working set does not fit into P4's L2 cache but it does in Barton's).

      If you don't believe me, try the demo version of Intel's VTune performance analyzer on Matlab running one of your programs.

      How well your caches perform is probably the most important thing for a processor today, as the speed of the main memory is a couple of orders of magnitude under the speed of the processor. It takes a couple of hundred cycles to service an L2 miss, while a long FP operation takes at most 20 cycles.

      • Re:Do you? (Score:3, Insightful)

        by gr8_phk ( 621180 )
        Thanks for the techno-babble. This guy's company obviously looked at real-world performance. Their understanding of the cause may or may not be correct, but their conclusion (switch to AMD) is correct for them, because they compared using the application that matters to them.

    • The deep pipelines in the P4 perform poorly, period. Even when running simple desktop apps on a Windows machine, I notice my P4-2.5GHz w/1GB RAM at work often jerks around or lags, while my Athlon 1900XP+ w/256MB RAM at home works like lightning. Obviously processor is not the whole story, but I think that under typical, multi-tasking usage, the deep pipelines are even more painful than benchmarks suggest.

      Disclaimer: I am not an EE, so I could very well be full of shit.
  • by Selecter ( 677480 ) on Thursday January 22, 2004 @09:30PM (#8061924)
    I guess Intel's short-term game plan is to keep the MHz game going yet again until they can get something going on the 64-bit front worth having.

    I suspect AMD and even Apple are going to shrink Intel's bragging rights in that same time frame unless Intel gets their act together. From AMD's recent earnings report it sure seems somebody is buying Athlon 64's.

    Intel blew it when they made the decision to let 32 bits ride for another 2 to 3 years. They look like old fuddy-duddies now. It's AMD, and Apple via IBM, that have the cool shit.

    • by dpilot ( 134227 ) on Thursday January 22, 2004 @10:18PM (#8062309) Homepage Journal
      Intel has backed themselves into a bit of a corner, in the process of repeating history. With Itanium, they've proven that they're more concerned with their own strategies than they are with delivering solutions to their customers. But they've sunk so much money and image into Itanium that they can't back out, yet. No doubt there's someone inside the company, probably a wild duck, working on the right time to jump ship and how to spin it.

      In the meantime, Intel has the one-two bait and switch with P4-Celeron and the true P4. If they didn't have a TON of money and market clout, they'd be in big doo-doo right about now. As it is, AMD is the one in big doo-doo, not because they have the lesser product, but because of Intel's clout.

      Listen to any computer commercial, and they pretty much all have those 5 co-advertising tones at the end. That's monopoly power, that's market clout. (If I were in charge, the antitrust penalty would ratchet up every time those tones sounded.)

      Maybe Intel blew it, but they'll survive.
      • by Jerf ( 17166 ) on Friday January 23, 2004 @01:00AM (#8063169) Journal
        Maybe Intel blew it, but they'll survive.

        We don't want them to die. We want them to pass through it and come out an older and wiser company, less inclined to pull shit it has learned the hard way it can't get away with, no matter how big it is.

        Compare the IBM of 2004 to the IBM of 1984.

        If Intel were to "die", the resulting market would have lost the wisdom that Intel is likely to learn over the next couple of years, barring some technical miracle.
  • So What ? (Score:4, Interesting)

    by El Cabri ( 13930 ) on Thursday January 22, 2004 @09:30PM (#8061925) Journal
    I'm kind of tired of the perpetual whining of armchair hardware designers. So the happy few, highly paid architects at Intel, with 30 years' experience in the industry and hundreds of published scientific papers, decide that the next-gen chip will have more stages, and they have to be called morons? How do you know better? Hasn't Intel produced the fastest chips on the market with each and every micro-architectural generation? Long pipelines = costly branch mispredicts, whoooaah, you're so bright, why don't YOU have the job leading the Prescott team? Branches can be predicted. Long pipelines can improve throughput. Microprocessors are all about trade-offs. Let the pros do the work and go back to playing Quake.
    • Re:So What ? (Score:2, Insightful)

      by fredmosby ( 545378 )
      I agree with the argument you are trying to make. But it would probably work better if you were less condescending.
    • Re:So What ? (Score:5, Insightful)

      by addaon ( 41825 ) <addaon+slashdot AT gmail DOT com> on Thursday January 22, 2004 @09:37PM (#8062001)
      Right, Intel always has had the fastest chip, if you ignore things like Alpha, Athlon, Opteron, Power, PowerPC, and others.

      And of course, Intel's motivations are entirely performance, or at least price/performance, not marketing.

      The fact that every other company has chosen a different design decision and has made better chips as a result is just an illusion foisted on us by those who think their own thoughts.
      • Re:So What ? (Score:4, Informative)

        by harlows_monkeys ( 106428 ) on Thursday January 22, 2004 @09:43PM (#8062064) Homepage
        Right, Intel always has had the fastest chip, if you ignore things like Alpha, Athlon, Opteron, Power, PowerPC, and others

        Intel P4 and Xeon beat 4 of the 5 you name on SPEC.

        • Re:So What ? (Score:5, Insightful)

          by adrianbaugh ( 696007 ) on Thursday January 22, 2004 @10:02PM (#8062197) Homepage Journal
          We're supposed to be impressed by Intel's latest and greatest chip beating Alphas that aren't even produced anymore?
          I'm not wishing to knock Intel, but it seems that these days whoever has the newest fabrication plant has the fastest chips. Intel brings out a new line of chips: they're faster. Then AMD brings out a new line of chips: bang! they're faster still. And so the merry dance goes on.
          Of course, this is all to the consumer's good as it means there's far more competition. But as far as the consumer is really concerned it doesn't matter so much who currently has the fastest chip as whose chip currently offers the best value while still being "fast enough". For my money that's been AMD for a while now.
    • Re:So What ? (Score:4, Insightful)

      by afidel ( 530433 ) on Thursday January 22, 2004 @09:37PM (#8062006)
      Intel's engineers didn't decide the direction of the processor. The whole direction of Intel's desktop line has been controlled by marketing concerns since the initial stages of development on the P4. The engineers got to do as they wished with the Itanium, but unfortunately they went too far the other way and completely forgot about marketing concerns like running legacy code.
    • by stevesliva ( 648202 ) on Thursday January 22, 2004 @09:45PM (#8062077) Journal
      I'm kind of tired of you armchair OS coders. So the happy few, highly paid Microsoft employees, 20 years experience in copying IBM, thousands of stock options in Redmond decide the next gen OS will have some wack FS and they have to be called morons? How do you know better? Hasn't Microsoft produced the best selling OS on the market for 15 years? Why don't YOU have the job leading the Longhorn team?

      Oh. Yeah... LINUX.

      Nevermind-- go back to writing the best OS there is.

    • Re:So What ? (Score:3, Interesting)

      by drinkypoo ( 153816 )
      Obviously branches cannot always be predicted, and Intel has traditionally (not a long tradition, OoO is relatively new, but still) been poor at it. Witness the amazing slowness of the P4 compared to the P3, clock for clock. Some of the pipeline stages in the current P4 are already there for signal propagation; I suspect more of them in this core will be so-called "Drive" stages in which the CPU is doing nothing but waiting for signal propagation.

      Intel has the fastest chips (by a fine RCH), but AMD has

    • I've not helped to design an operating system or really any part of an operating system, but I can damn well tell you that Windows ME was a shitty OS. It doesn't take any experience for me to tell this; I can determine this by simple observation.

      When the tire of my car explodes on an open road, it would not take much expertise on my part to diagnose it as a problem with my tire (they really aren't supposed to explode). And, when it happens to many other people with the same tire, it wouldn't take any e
    • Re:So What ? (Score:3, Insightful)

      by EmagGeek ( 574360 )
      A brief history of microprocessor development:

      The company I work for invented the first 16-bit microprocessor EVER, the CP1600 (ok, to be fair, it was a joint effort between us and a partner company), which was released in late 1974, when Intel was a scant 6 years old and PC meant "Pissing Clear." Intel was still a long 4 years away from introducing the 16-bit 8086.

      Nobody ever talks about the CP1600 because it was not oriented toward "personal" computers. After all, why the he
  • Pipeline stalls (Score:4, Interesting)

    by k4_pacific ( 736911 ) <[k4_pacific] [at] []> on Thursday January 22, 2004 @09:32PM (#8061949) Homepage Journal
    When the processor takes a branch it didn't expect, all the partially executed instructions in the pipeline are lost.

    They could minimize this by creating two different conditional branch instructions for each condition: one for cases where the programmer expects the branch to be taken most of the time, and one for cases where the branch is rarely taken. The pipeline behavior could then be optimized for each case. If it's a 'likely branch' instruction, the processor could start fetching instructions from the branch target. If it's an 'unlikely branch' instruction, it could prefetch the instructions that follow the branch.

    This would work well in loops where every time but the last, the processor branches back to the top.
    • and is now common. These days it usually works by maintaining a history table of past branch behavior. Generally, if you've had a lot of branches before, you're in a loop, and statistically are likely to stay in the loop.

      You can also go back and "fix" instructions to an extent (and not in all cases) while in the pipeline in case of incorrect branching. x86 sort of sucks for this though because of the variable length instructions.

      A lot of computer science is based on those kinds of statistics. You see it

  • by Lothsahn ( 221388 ) <Lothsahn@@@SPAM_ ... u_bastardsyahocm> on Thursday January 22, 2004 @09:32PM (#8061950)
    It'll most likely be slower per clock cycle.

    What this means is that it will take a higher clock speed (4GHz, for instance) to do the same amount of processing as the Northwood core. However, increasing the pipeline depth should allow Intel engineers to achieve higher clock speeds, as the longest transistor path will likely be shorter (faster switching times).

    In essence, Intel is attempting to increase the speed of their CPU's by focusing on increasing the clock speed (P4), while AMD is focusing on increasing the amount of calculations per clock cycle (Hammer).

    Of course, there are a lot of more complex tradeoffs that factor in (e.g. branch prediction). I highly recommend reading a computer architecture book if you're at all interested. It's really fascinating stuff.
    • by edrugtrader ( 442064 ) on Thursday January 22, 2004 @10:01PM (#8062193) Homepage
      I highly recommend reading a computer architecture book if you're at all interested. It's really fascinating stuff.

      dude, i don't even read the articles.
    • It'll most likely be slower per clock cycle.

      Yes, I agree. My guess is that they're trying to achieve higher absolute performance. What surprises me is that this is still considered a P4 core, since adding pipeline stages (even 1 stage) is a very non-trivial task.

      This'll also kill the benefits of reduced power consumption of 90 nm technology (increase in area from the additional pipeline registers, increase in frequency), which is important in server design. An argument about the benefits of having a t []

  • by lambadomy ( 160559 ) <.lambadomy. .at.> on Thursday January 22, 2004 @09:33PM (#8061964)
    Assume for a second that Intel's P4 design was really meant to boost GHz numbers easily (to guarantee victory in the GHz war if not the performance war). If so, is the Prescott design now due to Intel having to keep up with themselves? Obviously they could design a chip that is "faster" but runs at a lower clock speed than the P4s, but they've pushed the GHz number so much that now they're kind of hamstrung in their design options.
  • by uarch ( 637449 ) on Thursday January 22, 2004 @09:35PM (#8061986)
    Re-read the Register article. It's not the Intel guy who said 30 stages; it's the Register who is guessing. They're assuming that since it went from 10 to 20 before, it'll go from 20 to 30 now. It's not likely to end up being more than a few extra stages.
  • by StarCat76 ( 644079 ) <niceguyneil@ g m a> on Thursday January 22, 2004 @09:35PM (#8061987) Homepage Journal
    Although the Prescott core will have a longer pipeline, it will probably end up performing a bit better clock-for-clock against Northwood. This is due to a couple of reasons. Firstly, Prescott has 1 MB of on-die L2 cache. That's a good bit, and one could see how the P4 was helped by the 2 MB L3 cache in the P4 "EE". Secondly, the new P4 will have improved hyperthreading. It will also have somewhat improved branch prediction, and implements PNI (Prescott New Instructions), which will require a recompile to help things out. All in all, I see Prescott as being just as fast or faster per clock as Northwood, mostly due to the doubled L2 cache.
  • by johnthorensen ( 539527 ) on Thursday January 22, 2004 @09:36PM (#8061996)
    So, since Prescott has approximately a 30 stage pipeline, I guess Intel has decided to continue to ignore the low-power consumption market, leaving it open to people like VIA and Transmeta. This is really disappointing to a lot of folks in the embedded markets, who would really like to see Intel ship something with significant horsepower that doesn't require a heatsink with the mass of a black hole to keep running.

    Word has it that VIA is readying a new x86 processor for their line that supposedly has P3-class FPU performance while maintaining the same levels of power consumption as its predecessors. It is expected that this processor may actually have a big win in front of it for DirecTV boxes. With the extra CPU horsepower, it should be exciting to see what nifty features come out of this, especially considering most set-top CPUs generally just act as "traffic cops" for the data moving between ASICs. If they're really making the move to this class of processor, perhaps they've got more in mind.

  • compilers (Score:4, Informative)

    by Mieckowski ( 741243 ) <> on Thursday January 22, 2004 @09:37PM (#8062005)
    I suppose that this makes having a good compiler a little more important. Compiling the same program for a G4 on a compiler other than GCC gave me a 100% speed boost. I don't know if branch mis-prediction came into play, but it had a conditional in its inner loop (it displayed the mandelbrot set).
  • by Anonymous Coward on Thursday January 22, 2004 @09:37PM (#8062007)
    It sounds like Intel has totally given up on efficiency, and has the Marketing department doing processor requirements now... (has to clock to xGHZ!)

    I've been working with dual Opterons for a few months now, and have been very impressed by their speed, heat dissipation, and bang for the buck.

    A large data transformation job (really a scrape of a mainframe report for data) on the order of 1.1GB processed much faster on an IBM E325 dual Opteron 2.0GHz running 32-bit Windows (ack) than on my dual 2.4GHz Xeon (w/HT) running Windows (double ack)....

    Yeah- it's not a benchmark, but it is real world performance.

  • by metlin ( 258108 ) on Thursday January 22, 2004 @09:38PM (#8062013) Journal
    I had found an interesting article exposing the innards of the 775 pin Prescott -- see it here []

    (Credit: Got it off The Register from this article [])
  • Myth? (Score:5, Funny)

    by The Bungi ( 221687 ) <> on Thursday January 22, 2004 @09:38PM (#8062014) Homepage
    Alizarin Erythrosin writes "Further contributing to the MHz Myth ...

    Let me guess - 'Alizarin Erythrosin' is Cupertinus Elvish for 'Mac User', right?

  • ummm... (Score:3, Funny)

    by circletimessquare ( 444983 ) <> on Thursday January 22, 2004 @09:41PM (#8062036) Homepage Journal
    As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls.

    no, i didn't know that
  • by TitusC3v5 ( 608284 ) on Thursday January 22, 2004 @09:49PM (#8062106) Homepage
    ...since my next computer is going to house a G5.

    Personally I'm tired of trying to keep up with the GHz war between AMD and Intel. With our current technology, the only areas really pushing processing speeds are gaming and video/image applications (that I'm aware of). My grandmother doesn't need a P5 4GHz to check her email, and neither do I if I simply want to write a paper.
  • by Wesley Felter ( 138342 ) <> on Thursday January 22, 2004 @09:58PM (#8062161) Homepage
    In case anyone wants some hard facts:

    A. Hartstein and Thomas R. Puzak (IBM): The Optimum Pipeline Depth for a Microprocessor [], ISCA 2002.

    M.S. Hrishikesh, Norman P. Jouppi, Keith I. Farkas, Doug Burger, Stephen W. Keckler, Premkishore Shivakumar (UT Austin, Compaq): The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays [], ISCA 2002.

    Eric Sprangle , Doug Carmean (Intel): Increasing Processor Performance by Implementing Deeper Pipelines [], ISCA 2002.

    A. Hartstein and Thomas R. Puzak (IBM): Optimum Power/Performance Pipeline Depth [], MICRO 2003.

    What all these papers have in common is that they find that increasing the pipeline depth past 20 stages increases performance.
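The tradeoff those papers study can be sketched with a first-order toy model (every constant below is invented for illustration and comes from none of the papers): deeper pipelines shorten the cycle time, but a misprediction flushes more stages, so time per instruction bottoms out at some intermediate depth.

```python
def time_per_instruction(depth, t_logic=20.0, t_latch=1.0,
                         cpi_base=1.0, branch_freq=0.2, mispredict_rate=0.08):
    """Toy model: splitting a fixed logic delay across more stages shrinks
    the cycle, while the misprediction penalty (one full flush) grows
    linearly with depth. All constants here are made up."""
    cycle = t_logic / depth + t_latch                       # ns per cycle
    cpi = cpi_base + branch_freq * mispredict_rate * depth  # flush cost
    return cycle * cpi

best_depth = min(range(1, 60), key=time_per_instruction)
```

With these invented constants the minimum lands in the mid-30s; raise the misprediction rate or the per-stage latch overhead and it slides down toward the shallower optima the IBM and UT Austin papers report.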
    • A. Hartstein and Thomas R. Puzak (IBM): The Optimum Pipeline Depth for a Microprocessor [], ISCA 2002.

      Let me guess...42?

    • by -tji ( 139690 ) on Friday January 23, 2004 @12:08AM (#8062893) Journal

      > What all these papers have in common is that they find that increasing the pipeline depth past 20 stages increases performance.

      Is that a typo, or am I misinterpreting the papers you linked above?

      In all but the Intel paper, it looked to me like they were saying the optimal pipeline depth was somewhere between 6 and 20 (depending on workload).

      In the introduction of the Intel paper, it says "Focusing on single stream performance". So, basically they are focusing on artificial benchmark performance.

  • by scrote-ma-hote ( 547370 ) on Thursday January 22, 2004 @09:58PM (#8062162)
    As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls
    Yeah, um, who here actually knew that? I'm struggling to believe it's anywhere near half of us. I'm sure a poll would clear this up.
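For those who didn't: the slowdown comes from guessing branch outcomes and flushing the pipe when the guess is wrong. A sketch of a single 2-bit saturating-counter predictor (a textbook scheme, far simpler than the P4's actual predictor) shows why some branches are nearly free and others aren't:

```python
def predict_accuracy(outcomes):
    """Fraction of branches a lone 2-bit saturating counter gets right.
    States 0-3; predict 'taken' when the counter is 2 or 3."""
    state, hits = 2, 0
    for taken in outcomes:
        if (state >= 2) == taken:
            hits += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return hits / len(outcomes)

biased = [True] * 95 + [False] * 5               # loop-exit style: easy
alternating = [i % 2 == 0 for i in range(100)]   # worst case for this scheme

for depth in (20, 30):
    # average flush cycles per branch on the hard pattern
    wasted = (1 - predict_accuracy(alternating)) * depth
    print(f"{depth}-stage pipe: ~{wasted:.0f} wasted cycles per branch")
```

Same predictor, same code: the only thing the extra ten stages change is how much work each wrong guess throws away.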
  • by mosb1000 ( 710161 ) <> on Thursday January 22, 2004 @10:04PM (#8062216)
    Gosh, I'm feeling really left behind; my G4 400 only has 4 stages in its pipeline. At least it's built on a .22 micron process as opposed to the Pentium's measly .13 micron process. Yes, that was a joke.
  • by Zebra_X ( 13249 ) on Thursday January 22, 2004 @10:15PM (#8062281)
    Intel has shown no real interest in joining the 64-bit fray. Indeed, they don't have much choice. To release a 64/32-bit chip at this point would truly make an Itanic out of the Itanium. Microsoft would have more or less wasted its time producing low-volume products such as SQL Server 64 and XP 64 (different from XP 64-Bit Extended, which has yet to be released). Another consequence of such a shift in strategy: the people who invested in the Itanic platform would find themselves the proud owners of all-but-useless, but very expensive, hardware.

    Most real-world tests point to AMD chips being faster. The int and floating-point tests still belong to the P4 3.2, but the P4 is having to pass the first-place trophy to AMD when it comes to games and office productivity.

    And then there is price. For $320 you can get $700 worth of Intel performance. Mind you this is the AMD64 running in 32-bit mode.

    It would appear that all that is really needed to justify mass-market adoption is a consumer OS, and that would be Windows XP 64-Bit Extended, currently in beta. The only delay there is that the .NET Framework is not 64-bit ready. We can probably expect its release with VS.NET Whidbey, a.k.a. .NET 2.0.

    After that - we just need to see some AMD adoption in the mainstream pc builders.
  • Effective pipeline (Score:4, Interesting)

    by jmv ( 93421 ) on Thursday January 22, 2004 @10:21PM (#8062323) Homepage
    I read somewhere that on the P4, when an instruction is already in the L1 cache, the pipeline gets shortened. That's because the L1 instruction cache stores pre-decoded instructions (micro-ops). This means that when the instruction is reached again, the decoding (and branch prediction?) steps are already done, shortening the pipeline. When the instruction is not in cache, there's already a big hit anyway. With that in mind, we'll need to see whether the extra pipeline stages in Prescott will still be there when the instruction is in the L1.
  • Technical discussion (Score:5, Informative)

    by Rufus211 ( 221883 ) <rufus-slashdot@h ... minus cat> on Thursday January 22, 2004 @11:01PM (#8062544) Homepage
    For those into the technical side of this type of stuff and a heck of a lot higher S/N ratio, check out the Ace's Hardware [] forum []. There's a large thread [] going on over there talking about the rumors and what they would actually mean.
  • by mrm677 ( 456727 ) on Thursday January 22, 2004 @11:11PM (#8062610)
    "As most of us know, a longer pipeline can lead to slowdowns in the form of branch mispredictions and pipeline stalls."

    Get off your high horse. Intel architects aren't dummies. Itanium benchmarks are starting to whoop some serious ass and the P4 and Athlon have been neck-and-neck for years. I'm sure Prescott will perform very well.

    I can get into all kinds of architecture-speak as to why your simplistic notions of mispredictions and pipeline stalls might not be so terrible. Who knows? Maybe Intel will execute both paths of a branch? They've already got partial instruction replay to make squashes much less expensive. With deep speculation, a big instruction window, good bypassing capabilities, and effective non-blocking caches, pipeline stalls due to branch mispredictions are not the real issue. The bigger issue is memory latency/bandwidth, and Intel has always done well with that. A branch misprediction can be hidden easily; an L2 cache miss can't.
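On "execute both paths": compilers already do a small-scale version of this with predication, e.g. turning a short if/else into a conditional move (x86 CMOV) so there is no branch to mispredict. A toy sketch of the transformation (Python standing in for what the compiler emits):

```python
def select_branchy(cond, a, b):
    if cond:               # data-dependent branch; a deep pipeline pays
        return a           # dearly every time this is guessed wrong
    return b

def select_branchless(cond, a, b):
    """CMOV-style select: compute both values, pick one arithmetically.
    No branch, so nothing to mispredict. Assumes cond is boolean-ish."""
    c = int(bool(cond))
    return c * a + (1 - c) * b
```

The branchless form does strictly more work, which is the tradeoff: it only wins when the branch is unpredictable enough that flushes cost more than the extra arithmetic.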
  • by rice_burners_suck ( 243660 ) on Friday January 23, 2004 @01:31AM (#8063301)
    Intel today announced its new 1024-hexabit microprocessor architecture technology. Named the Quantium, Intel's new processor core boasts powerful new technologies which will enable governments to better manage the rights (or lack thereof) of their subjects.

    The Quantium has the following new features:

    • Intel (r) LightSpeed (tm) technology breaks the processing pipeline into 299,792,458 discrete steps. As there is no internal clock within the processor, all operations occur at the speed of light. Hence, one "cycle" represents the absolute cosmic measure unit of time and all operations occur in one cycle. While this will not increase the processor's performance--indeed, it will pale in comparison to that of the ancient 80286 processor of old folklore--the faster internal clock speed is expected to increase Intel's sales by 0.000001% within 180 quarters.
    • Intel (r) SingleAtom (tm) technology squeezes the entire processor into a single atom by modifying the universe at the M-theory level. Individual strings compose modified quarks and other subatomic structures, which combine to form a very heavy atom, one with approximately the same weight as 1 million protons. As the matter is extremely dense, the radioactive decay, combined with the gravity generated by itself causes the configuration of the subatomic particles to remain bonded at the subatomic level while realigning a nearly infinite number of times every second. This realignment constitutes the execution of instructions within the SingleAtom (tm) processor.
    • 893,378,665,113 new operations have been added since the previous model, bringing the new total to over 18 googleplexes of instructions. All SCO intellectual property can be programmed in a single instruction, increasing SCO revenues. Corporations will have to pay $799 per processor instruction executed, or face serious legal action.
    • RAM has been deprecated. 4 billion exabytes of internal general-use registers allow software to make more efficient data access, providing a more compelling Internet experience over a 28k modem connection.
