Intel

Intel Pentium 4 NetBurst Architecture Explained

fr0child writes "Next week is Intel's Developer Forum (IDF) and it seems they'll be releasing quite a bit of information (aka hype) about the Pentium 4. Anandtech seems to have gotten the scoop on Intel's NetBurst Architecture, basically covering the P4's internal architecture."
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • P.S. You can also get the scoop [sharkyextreme.com] over at sharky extreme.
  • Simply asked: if NVIDIA is cramming so much into their GPU, which is about to be challenged by the Intel P4, would they be able to react by bringing out a central processing unit of their own?
  • Using RAMBUS-only also cuts out the smart market :-)
  • I was always under the impression that RISC meant reduced instruction set computing and CISC complex instruction set computing.
  • Could someone explain to me how having a longer pipeline speeds things up?

    Ask and you shall receive. (Not that I claim to be a genius about these things ;)

    CPU pipelining in general is a divide-and-conquer scheme. Say you have some set of tasks, each of which you know you can do in ten minutes (take out the trash, vacuum the living room, rebuild the Linux kernel, etc.). What if you could break each of these down into smaller steps (like finding the trash, collecting it, taking the bag to the curb)? Say each of these smaller steps takes no more than three minutes.

    Now, you've got a lot of the large-scale tasks to do, so you invite a bunch of your friends over. Each person takes a specific task and does it in assembly line fashion.

    Here's the scenario:

    1. We start by finding the trash (3 minutes)
    2. someone collects it (6 minutes total)
    3. someone else bags it (9 minutes total)
    4. finally someone runs to the curb (12 minutes total).

    Oops! We're at 12 minutes, that's 2 minutes more than it takes me to do it by myself. But wait, how long does it take to get the next bag to the curb? If the searcher starts looking for more trash as soon as he gives the first batch to the collector, he'll have more trash ready to be collected. Follow this through, and you get something that looks like this (apologies for the strangely aligned ascii-art):
    -----------------------------------------
    |_search__|_collect_|___bag___|___dump__|
    -----------------------------------------
    |_block1__|__NOP____|___NOP___|___NOP___|
    |_block2__|__block1_|___NOP___|___NOP___|
    |_block3__|__block2_|__block1_|___NOP___|
    |_block4__|__block3_|__block2_|__block1_|
    |_block5__|__block4_|__block3_|__block2_|

    (and so on)

    As you can see, by the time the dumper gets around to dumping block1, we have block2 (the next bag of trash) ready for him to dump. If everything proceeds in "lock-step" (that is, no one gets ahead of the others), then after that first bag of trash, we can dump a bag every 3 minutes. This is much faster than one bag every 10 minutes, and this is the most common type of speedup associated with pipelining.

    The first part (where various people are standing idle) is called "filling the pipe." Certain pieces stand empty because we're working on the first instruction, and we just haven't gotten to them yet. In a CPU, for example, the ALU will usually stand idle until the instruction decode logic has examined the bit arrangement of the instruction word and determined whether we should be ADDing, MOVing, BRAnching, or whatever.

    Oh, and that brings up branches. Say you've got to stop bagging trash and start washing dishes or something. Everyone has to stop what they're doing (once they're finished with the tasks still waiting on them), and start work on something else. Sometimes we know this is coming and have prepared for it (known as "branch prediction"), but sometimes we guess wrong (say we thought we'd be doing laundry next). Now we not only are switching jobs, but we're switching to something unexpected. This can essentially cause us to skip a beat. This means that for a moment or two, someone (or two, or everyone) will have nothing to do. Then the pipe fills again and we're back up to speed.

    CPU designers spend lots of time trying to find new ways to keep the pipe full, because, as you can see, the more a pipe stays full, the faster a system is overall. This is why branch prediction is so important. In addition to branch prediction, there's a whole slew of techniques, like "out of order execution," that allow us to make up time in other ways.

    This is a four-stage pipeline. Imagine what's going on with Intel's 20-stage monster! If we can continually decrease the amount of time taken by the longest stage in the pipeline, then we can execute more instructions per second: we still have 1 instruction per clock (IPC), but if we can speed up the clock, we get more throughput (IPS). How can we speed up the clock? We do less in each clock cycle and spread the instruction out over more pipeline stages.

    This is a really simple overview. There's lots more where this came from, and it's all really neat stuff for those interested in hardware. For those interested, I'd recommend a book or course in hardware organization. You'll need a working knowledge of logic gates, some combinatorics, and some circuit design skills, but it's worth it!



    --
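The four-stage trash pipeline walked through above is easy to simulate. Here is a minimal sketch in Python (my toy model, assuming lock-step stages of exactly 3 minutes each and no stalls):

```python
# Toy model of the 4-stage "trash" pipeline described above.
# Assumption (mine, not from the post): every stage takes exactly
# 3 minutes and the stages run in lock-step with no stalls.

def time_to_finish(n_bags, n_stages=4, stage_time=3):
    """Total minutes to push n_bags through the pipeline."""
    # The first bag takes n_stages * stage_time to fill the pipe;
    # every bag after that completes one stage_time later.
    return n_stages * stage_time + (n_bags - 1) * stage_time

# One bag alone: 12 minutes -- slower than the 10-minute solo job.
print(time_to_finish(1))    # 12
# But 100 bags take 309 minutes, i.e. ~3.1 minutes per bag.
print(time_to_finish(100))  # 309
```

The point of the model: the first bag through is slower than doing the job solo, but steady-state throughput approaches one bag per stage time, just as the post says.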
  • yeah, OC Workbench [ocworkbench.com] has a little thing [ocworkbench.com] with a link to some high-magnification pictures [valley.ne.jp] of what can happen to your CPU if you don't treat it OK. Can you say "sliced Pentium"?

    Members of the Society for the Prevention of Cruelty to CPUs (SPCC) should not look....! It looks like it could have hurt ("megahurt", as The Register has it [theregister.co.uk]).

    Liquid cooling indeed. Why do I feel we have come full circle? I remember the liquid cooled machines at CERN - I thought we had gotten rid of those....

  • I am tired of seeing Intel put out more and more vaporware. RDRAM, IA-64, etc, etc..

    You can buy RDRAM right now if you want to. Hardly vapour.

    Engineering prototypes of the IA-64 have been around for a while, with every indication that they will ship. Doesn't look very vaporous to me.

    IA-64 has 1/5 the performance of an alpha under gcc, which is not optimised for the alpha. (likely the kind that is 3x an Athlon or more for a P3)

    Firstly, GCC is not the best compiler in the world. When comparing an Alpha to an IA-64 chip, I'd use Intel's compiler on the IA-64 and Compaq's compiler on the Alpha. Both companies have a history of writing compilers that were extremely well optimized for their platforms.

    Secondly, I don't see much support for your figures. See my next point.

    Even a 2 year old alpha can beat most P3s (1.5 -2x P3 MHz = alpha MHz in performance)

    Not really. Alpha chips are about even in everything except floating point (where the Alpha blows *everyone* out of the water - Sun, HP, IBM, Motorola, etc).

    They do this with the higher speed grades of their chips that were released _recently_. Older chips used the same design but were clocked more slowly, and don't blow away present chips.

    Check http://www.spec.org [spec.org] for reasonably accurate benchmark information. They use the fairest system for evaluation that I've seen (standard test code supplied by SPEC, compilation and system tweaking handled by the companies owning the platforms being tested).

    As far as the performance of the Alpha or an Athlon vs. the P4 goes... The P4 is still in the final debugging stages. Wait six months and look for SPEC marks.

    Personally, I'd like to see SPEC marks for the G4. Apple has been allergic to SPEC of late.
  • The author of that quote was clueless and should have said "microarchitecture". The architecture of Pentium 4 is very similar to Pentium III, but the microarchitecture is 100% different, and is a complete overhaul.
  • by SpinyNorman ( 33776 ) on Monday August 21, 2000 @04:28AM (#840768)
    1) Pipeline stalls / operand latency:

    If the compiler and/or CPU is unable to reorder instructions effectively (or if a particular piece of code is not amenable to reordering), then an instruction in the pipeline may not have its operands ready when it needs them, and will stall the pipeline waiting for them. With a longer pipeline it will take more clock ticks for the necessary operands to work their way through the pipeline to clear the stall. Intel have added a double-clock-speed arithmetic unit (ALU) to the P4 to try to mitigate operand latency.

    2) Branch mispredict penalty:

    When a modern CPU such as the P4 encounters a branch instruction, it predicts whether the branch will be taken or not (by using the execution history) in order to be able to continue feeding instructions through the pipeline. When the branch is finally evaluated near the end of the pipeline, it may turn out that the prediction was wrong, and that all the instructions following the branch (now in the pipeline) should not be executed. In this case the processor has to flush the pipeline and instead take the correct branch. This "pipeline flush" branch mispredict penalty is obviously higher the longer the pipeline is - a 20-stage pipeline means you are throwing away up to 20 instructions when a branch is mispredicted.

    P4 was designed with a long pipeline so that each pipeline step could be very simple/quick and therefore the processor could have a very high clock rate. The downside of doing this is the above two problems, which mean that the average number of instructions executed per clock cycle (IPC - aka processor efficiency) gets reduced.

    P4 at 1.4GHz may be faster than P3 at 1GHz, but because P4 will have a lower IPC than P3, it won't be as fast as a 1.4GHz P3 (if we ever see one) or 1.4GHz Athlon (which we will see).

    The one area where P4 should excel is in SSE2 optimized floating point math intensive applications, which is why Intel are now trying to reposition the P4 as an Internet/multimedia CPU rather than a general purpose one. The fallacy of this is that once you can decode your DivX in real-time, you don't need to go any faster!
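The two penalties described above can be folded into a back-of-the-envelope IPC model. A sketch follows; the branch frequency and mispredict rate are illustrative assumptions of mine, not Intel's figures:

```python
# Rough model of how pipeline flushes reduce average IPC.
# All numbers below are illustrative assumptions, not measured data.

def effective_ipc(base_ipc, branch_freq, mispredict_rate, flush_penalty):
    """Average instructions per clock once flush stalls are included."""
    # Extra cycles per instruction lost to pipeline flushes:
    stall_cpi = branch_freq * mispredict_rate * flush_penalty
    return 1.0 / (1.0 / base_ipc + stall_cpi)

# Say 20% of instructions are branches and 5% of those mispredict.
p3 = effective_ipc(1.0, 0.20, 0.05, flush_penalty=10)  # ~10-stage pipe
p4 = effective_ipc(1.0, 0.20, 0.05, flush_penalty=20)  # ~20-stage pipe
print(round(p3, 3), round(p4, 3))  # 0.909 0.833
```

Even at a 5% mispredict rate, doubling the flush penalty from 10 to 20 stages visibly eats into average IPC, which is exactly the trade-off described above: the longer pipe only wins if the clock speedup outruns the lost efficiency.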

  • You've missed the point again. Please.

    You say "it _is_ the first overhaul by Intel"
    I say "it's _not_ an overhaul".

    My introduction of the other companies into the argument was simply (blimey, I add things to simplify my argument and I start losing people) to indicate what a "complete architectural overhaul" _really_ means.

    If my statement is correct, that the change does not qualify as a "complete architectural overhaul", then your statement, and the original one, are vacuously false.

    Do people not learn logic and grammar any more?
    Reread what I posted; see what I put; unravel the rhetoric of sarcastic irony if you must; and condense what I wrote in your mind to 'Intel have not performed a "complete architectural overhaul"'. It is there.

    If you still don't get it watch the "he's got a knife" scene from Crocodile Dundee.

    FatPhil

  • When I first read the article, I thought it was about some sort of excruciating torture; then I looked closer and found that it wasn't in fact, about Nut Bursting.

    If this doesn't outperform AMD's offering, then the parodies will be flying fast.

  • What you're missing is that the P4 is going to be a single-CPU part, so there's no reason to split up the bus. Even in a dual-processor setup, each CPU isn't hitting the bus for its full capacity anywhere close to 100% of the time, unless it's running a loop or accessing memory that doesn't fit inside its cache, in which case the software design is holding it back more than the system bus anyhow.

    I don't think that many users would ever notice the difference, and Intel probably can't afford to design its next consumer-level chip around a few percent of the market.

  • Worse: cold fusion. Guess their server needs more deuterium, or fantasium, or something.

    ---- ----
  • Mainly because only part of the chip is running at that speed, not the whole thing.
  • by cybaea ( 79975 ) <allane AT cybaea DOT com> on Monday August 21, 2000 @04:48AM (#840774) Homepage Journal

    Hmm, it's there at the bottom of the page:

    Intel also informed us that the Pentium 4 would strictly be a uniprocessor part, meaning it won't even work in multiprocessor boards.

    So, yes, you are right: they don't support SMP so why would they split the bus?

    But I question your "Intel probably can't afford to design its next consumer-level chip around a few percent of the market" comment.

    First of all, if Intel can't afford it, who can?

    But more to the point: Is it really only a few percent of the market? I've just ordered a dual PIII and I selected the chip specifically because I could get SMP support. Does anybody have any statistics on single- versus multiple CPU PIII systems shipped? Is it really only "a few percent"?

  • by Mike Connell ( 81274 ) on Monday August 21, 2000 @02:22AM (#840775) Homepage
    From the CNET article:

    > The chip also comes with 144 new multimedia instructions for better graphics and sound.

    I'm weeping! I *know* that they're multimedia instructions and so on, and probably really useful, and that people aren't hand coding this stuff... but doesn't anyone else think this is ugly?

    Whatever happened to RISC?

    Mike.
  • by Anonymous Coward
    Basically, damn near no one runs a multiprocessor box. I know because I HAVE run one for the past 2 years: Intel puts SMP on a very low priority, and software designers do too. For example, there is only one MP3 encoder (gogo) that can even run with multiple threads. At a lower level, the ONLY operating system I have used that even comes close to making good use of SMP is BeOS.

    Bottom line: it would only be a major disadvantage if the world would just wake up and start making SMP more practical, but unfortunately even Linux has not helped that much.
  • SHR? It's easy: take the register, shift it right by one bit, and subtract the right-shifted version from the original...

    Oh. You meant SHift Right, not Shift-right is Halfway Recursive? Sorry.
  • Found it!!!

    I just found the URLs I wanted. These people make QDR memory [qdrsram.com] and these people make the controller [isdmag.com]. Now this is SRAM, not DRAM, but I don't think it would be too hard to make a mobo that took this - it couldn't be any more expensive than RAMBUS, could it? Plus the controller has a sustained throughput of 7.2GBps at 100MHz, WOW!!!!!

    The world really is round and aliens are not stealing your socks...

  • NetBurst - Intel
    LightPipe - SGI
    AMD has their flavor

    Who's the best?
  • I get a filter error message when I try to access that page, and I'm not running any filter, so it seems their server has b0rked already.

    hmf, slashdotting is too powerful. :)


    --
  • by pantherace ( 165052 ) on Monday August 21, 2000 @05:09AM (#840781)

    I am tired of seeing Intel put out more and more vaporware. RDRAM, IA-64, etc, etc... I don't know of any other chip maker that puts out so much vapor. AMD's chips did what they were intended to do. DEC (Compaq) Alphas haven't failed yet (supposed to be 1.5GHz+ by the end of the year).
    I am willing to bet that AMD will have a 64-bit arch out (mainstream) before Intel.
    IA-64 has 1/5 the performance of an alpha under gcc, which is not optimised for the alpha. (likely the kind that is 3x an Athlon or more for a P3)
    Even a 2 year old alpha can beat most P3s (1.5 -2x P3 MHz = alpha MHz in performance)
    Another thing: a 550 P3 is $159, a 600 Duron $99 (or $109, can't remember exactly). The Duron is not 2/3 of a P3's performance. Is Intel too greedy? In SV, I talked to an Intel CAD engineer and he said that as long as it sold at a 24 or 26% profit, Intel would make anything. I wonder what AMD's profit level is.

    btw anyone ever looked at Alpha vs Intel's touted FP performance? hint, Intel is in the dust.

  • Whose tech sharing with DEC resulted in Intel employing such Alpha features as Branch Prediction without license and a 12 count complaint filed against Intel. All very interesting stuff.

    Vote [dragonswest.com] Naked 2000
  • I have a PIII 500 on a 440BX board. I've been waiting for Intel to give up Rambus and make a high-end chipset with AGP4x and slot1. Is this a lost cause? If so, I'll probably wait until either:

    Rambus gets off their high horse and lowers the prices;

    Rambus drops their royalties;

    Something beats Rambus; or

    The Athlon-Killer arrives

  • Hmmm, mucho confusion. Is anybody else getting one of these:

    Access denied to system because of URL Filter Configuration, while attempting to retrieve the URL: http://www.anandtech.com/showdoc.html?i=1301.

    And if not then which "local filter list" are they referring to?

  • AMD's profit margin is less than Intel's. If you've seen the kind of price decreases AMD has put up, you'll know why. You can get a 1GHz Thunderbird Athlon for around $650 on pricewatch.

    I think you've confused the term "vaporware." Vaporware is stuff that doesn't show up: it is said to be coming out, but never does. Given that RDRAM is shipping, and the P4 is close to shipping, you can hardly term them "vaporware." Vaporware has a much narrower definition than most /.ers realize. The Nintendo 64 magnetic drive was vaporware (it was called the 64DD; is it just me, or does that read like a cup size?). The Matsushita M2 (another proposed console, using dual PPC 603e chips) was vaporware.
  • In a preview of the chip at the company's headquarters, technicians showed how a Pentium 4 computer can rapidly render, or draw, 3D images downloaded from the Internet. That sort of processing power could make it easy for sellers on eBay to post virtual representations of their products, for example.
    Recently, in a preview in a dorm room in Vermont, technicians showed how a Pentium II (r) computer equipped with an inexpensive 3D graphics card can rapidly render, or draw, 3D images.

    duh. is it just me, or is this just a load of crap? with the incredible tech available right now in 3D video cards, which are getting better all the time and will probably hit the ceiling pretty soon, why would any home user want 3D on their CPU? for the extra cash this feature would cost, i'd rather spend it on a kick-ass 3D card. cut the crap with all this hardware bloat and just give us a fast, reliable chip! oh, and a motherboard with a reasonably fast bus would be nice as well, but let's not get started on that one...
  • In a preview of the chip at the company's headquarters, technicians showed how a Pentium 4 computer can rapidly render, or draw, 3D images downloaded from the Internet. That sort of processing power could make it easy for sellers on eBay to post virtual representations of their products, for example.

    Sweet! I've been waiting for features like that forever! Thanks Intel and thanks CNET! You guys rock!

  • CNET:
    "The chip [...] represents the first complete architectural overhaul of the company's processor line since 1995, when the original Pentium emerged."

    Erm. I've programmed for z80, 68K, Arm, C80-MP, H8, PPC, Axp, Sparc, HP-PA and the ubiquitous x86 (all varieties).

    If the Pentium is a "complete architectural overhaul", then what the blazes does one call the Vax->Axp change, or the 68K->PPC change, or the C80->C6000 change?

    Some people live in very sheltered worlds, evidently.

    FatPhil
  • Whatever happened to RISC?

    Intel happened. It's a shame really. The x86 architecture is all a bit of a mess. I don't think the 386 designers expected it to last until dual-pipelined 1GHz chips.
  • IA-64 has 1/5 the performance of an alpha under gcc, which is not optimised for the alpha

    How do you know this as a fact if IA-64 is still vaporware? Or, as the linux gnomes usually do, are you just pulling this "fact" out of a hat?

  • Considering the P4 only does a single proc config...
  • Probably less than a few percent. Consider that almost all business machines are uni-processor, all consumer machines (except the dual G4) are uni-processor; the only SMP machines you have are (some) servers and workstation machines. The midrange-and-up (where you find the bulk of multi-processor machines) and the server market are pretty tiny (in terms of machines shipped) compared to the consumer and business markets (I think the statistic is in the hundreds of millions of consumer machines shipped per year).
  • I heard the "Burst" part refers to the little puff of vapor that you get with each press release.

    --
  • You're only thinking about the desktops.

    Server space is a completely different matter... Most machines there are SMP. However, Intel currently doesn't have to worry about competition there, because AMD hasn't yet released an SMP-capable chipset.
  • Perhaps those extremely clever chaps stumbled onto the old hack which played music on the disc drive. I'm sure that's part of it:

    SetKey Am
    SetKey G
    etc..

    Then again it could be these are for flashing subliminal messages. I'm not sure if this is possible. Strange how every time I come to work and stare at this Trinitron monitor I have the greatest urge to buy another Walkman, to go with the other 587...

    Vote [dragonswest.com] Naked 2000
  • by Anonymous Coward
    Simple, you don't quad pump. The so-called quad-pumped SRAMs I have seen have two buses: one for reads and one for writes, or if you prefer, one input and one output pin per data bit.

    This avoids bus turnaround times and there is AFAIR a single address bus which is running slower since the data is transferred mostly by bursts on one or both of the data buses. As long as you can place 2 address/command cycles in the duration of a data burst, you are limited by the bandwidth of the data busses.

    Actually this only doubles the bandwidth for some kinds of workloads but not for others: if you are performing statistics computation on a large set of data, you only read from memory and it becomes equivalent to a DDR bus. I suspect that this is what would happen to my workloads: analysis of large amounts of data to extract faint signals from noise, with a read-to-write ratio between 1000 and 1000000.

    It would be silly to quad-pump a bus in frequency: if you can afford x million of transitions per pin on the data bus, halving the number of transitions on the clock line(s) would not simplify the design nor measurably reduce the noise and radiated energy.
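For the arithmetic behind the pumping discussion, here is a minimal sketch (peak numbers only; no real bus sustains these, and the formula is the standard width x clock x transfers-per-clock approximation):

```python
# Quick arithmetic behind "quad pumping": peak bandwidth is assumed
# to be bus width x clock x transfers per clock (decimal GB/s).

def peak_bandwidth_gbps(width_bits, clock_mhz, transfers_per_clock):
    """Peak bus bandwidth in GB/s."""
    return width_bits / 8 * clock_mhz * 1e6 * transfers_per_clock / 1e9

# P4 front-side bus: 64-bit, 100 MHz, quad-pumped.
print(peak_bandwidth_gbps(64, 100, 4))  # 3.2
# The same wires single-pumped, for comparison:
print(peak_bandwidth_gbps(64, 100, 1))  # 0.8
```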
  • by be-fan ( 61476 ) on Monday August 21, 2000 @05:37AM (#840797)
    Is it just me, or is the name not necessarily just superfluous?

    1) The P4 has very long pipes.
    2) The P4 has small caches.
    3) The P4 has huge bus bandwidth.
    4) The regular FPU has been largely deprecated in favor of SSE2.

    What does all this add up to? A chip to accelerate 3D. This feature list reads largely like that of the PlayStation 2 (aside from the long pipelines). You've got the small caches, high bandwidth, and the vector pipes. My guess is that Intel, seeing NVIDIA cramming more and more into the GPU, is trying to come back and thoroughly blow them out of the water. This chip might process slower per clock for many uses, but the high clock makes up for that. On the other hand, things that are extremely regular without any branches (ahem, 3D geometry processing) will absolutely fly through this thing.
  • by eagl ( 86459 )
    "Who can?" Intel can. However, it won't be designed or priced for the normal consumer/user, just like in the past with the PPro and Xeon. The chipsets for those CPUs were designed with stability in mind, not mind-boggling Quake performance or overclocking, and the CPUs themselves were priced out of reach for most people. That is how Intel approaches multiprocessor systems, and I highly doubt that Intel will change that until/unless Microsoft supports multiple processors in its basic consumer OS.

    Like it or not, Intel is going to design for the market and the market isn't driven by YOU, it's driven by your next door neighbor who hasn't even heard of Slashdot.
  • How about a filter that tracks referrals from Slashdot and bounces them beyond a certain load level ?

    I've no idea if that's what's happening, but it's something I'd want to have on hand if I ran a site like Anand's that was regularly whacked with Slashdot's million typoing monkeys.

  • Whatever happened to RISC?

    Well it's not technically a pure CISC chip either since all instructions are internally translated to microcode. So adding 144 new instructions merely means adding more capability to the translation unit IIRC rather than changing the whole thing...

    ---
    Jon E. Erikson

  • so it'll be as much faster as when I put the 387 into my 386?

    da w00t.
  • I want to be able to decode 3 P0rn DVD's at the same time! One ALWAYS NEEDS MORE SPEED!
  • We all know that RISC processors (i.e. Alpha) are faster and better than CISC processors (i.e. x86)

    Not true, not true. The x86 processor family provides the highest SpecINT rating and has for some time now. We can argue about benchmarks all day, but Spec is the industry standard.

    Just because something is RISC doesn't automatically make it the Holy Grail. There are a lot of factors that go into processor design.

    Which brings me to another point about the P4: unless there's been some grand breakthrough in branch prediction of which I am unaware, I don't buy the trace cache as being anything more than a predecode cache (which, granted, on an x86 could be a big deal).

    --

  • I mentioned server space. However, server space is a lot smaller. Think of this: a business may buy 100 desktops for a workgroup, and only two or three servers for that workgroup. Most of the time servers are expensive, low-volume items. At home, there are no servers, and the home market is huge. So in terms of the percentage of total machines that are SMP, it is a very small percentage.
  • Speaking of The Reg, has anyone deciphered the (il)logic behind the ordering of their articles on the page? Usually it seems that they push new articles down from the top, but sometimes I see articles further down which I have not seen before. And older stories with previously clicked links occasionally reappear at the top.

    --
  • Ahem.

    P6, anyone? I would think O-O-O execution would be a pretty major overhaul.

    --

  • Um, Branch Prediction is not an "Alpha feature". The IBM 360 had branch prediction in the mid-1960's.
  • But more to the point: Is it really only a few percent of the market? I've just ordered a dual PIII and I selected the chip specifically because I could get SMP support. Does anybody have any statistics on single-versus multiple CPU PIII systems shipped? Is it really only "a few percent"?

    Probably. I mean, in the geek world / university lab environment where I live, SMP is the thing to have (most of our machines are dual P-IIIs, or else smaller machines (mostly Pentiums) built into robots), and I know a lot of people doing dual Celerons or P-II/III. But in the general case, it's not that common, mostly because you have to run a real OS to be able to actually use it.

    Though it seems like this, coupled with the AMD 770 chipset, coming out RSN, is basically handing the high-end market to AMD. I mean, Merced isn't going to cut it (assuming it's ever released, that is), and if Thunderbird and Sledgehammer can do SMP but you can't do SMP on Intel unless you use (what will be) outdated P-IIIs, few are going to choose Intel.

    I suppose their logic is that they don't want the P4 to compete with IA64 in the high end space, but it seems likely that the P4 will be _faster_ than Merced anyway. Intel is on crack.
  • Okay, I've looked up some of the statistics. First, even large server manufacturers like Dell say that servers account for only about 14% of their sales. If you look at the sales of Windows NT (which has 38% of the server market), IDC pegs it at about 2.1 million units last year (the report is from July). Do the math, and that means roughly 5.5 million server OSs shipped last year. This is a pretty good indication of the number of servers shipped last year. Now, take into account that IDC says 112.5 million PCs were shipped last year, and the fact that not all servers are SMP machines, and you can easily see that the SMP market IS just a few percent of the computing market.
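Spelling out the division in that post (the IDC and Dell figures are taken at face value; note the quotient comes out nearer 5.5 million servers than 5.1):

```python
# The poster's arithmetic, made explicit. Input figures are the
# post's quoted IDC/Dell numbers, accepted as given.
nt_units = 2.1e6    # Windows NT server licenses shipped last year
nt_share = 0.38     # NT's quoted share of the server market
servers = nt_units / nt_share
pcs = 112.5e6       # total PCs shipped last year (IDC)

print(round(servers / 1e6, 1))        # 5.5  (million servers)
print(round(100 * servers / pcs, 1))  # 4.9  (% of all machines, at most)
```

Since not every server is SMP, the SMP share is some fraction of that ~5%, which supports the "few percent" conclusion.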
  • We all know that RISC processors (i.e. Alpha) are faster and better than CISC processors (i.e. x86)

    Not true, not true. The x86 processor family provides the highest SpecINT rating and has for some time now. We can argue about benchmarks all day, but Spec is the industry standard.

    I noticed you left out SpecFP... (could it be because IA32 is absolutely *horrible* at FP operations? Not to mention its low performance in most of the other units that are common in CPUs nowadays.) Adding 144 new instructions to the CPU isn't going to help them in the long run either (see: MMX, KNI, 3DNow!, etc). Anyone and their dog can make a fast integer unit; most operations inside them run in a single clock, and it is the most well-understood unit (arguably) in the history of CPUs; it's just Intel's brute force that keeps IA32 performing so well.

    The wheel is turning but the hamster is dead.

  • The x86 IA was, is and will remain a CISC design.

    Besides, these are extensions for SIMD and perhaps even vector stuff. Even if it was a RISC design they'd have to add instructions for such radically new features.
  • What I wonder is why the marketologists refused to call it a "2GHz processor" because of this "Rapid Execution Engine"?
    ---
    Every secretary using MSWord wastes enough resources
  • To have MMIs is not anti-RISC.
    RISC is more an instruction decoding/orthogonality issue than an instruction set richness issue. Many RISCs have far richer instructions than CISCs!

    I remember one (thanks to those large foreheaded types in Texas) which had _every_ possible bitwise logical operation available.
    So x86 gives you AND, OR, and XOR, big deal.
    This had NAND, NOR, IMP, NIMP, RIMP, NRIMP, ...

    Back onto the subject:
    If you have two spare bits in your opcode, then you could use those to implement
    00 = act as 64bit words
    01 = act as 2 32bit words
    10 = act as 4 16bit words
    11 = act as 8 8bit words
    et voila! every instruction can be turned into a "MMI" instruction!

    (OK, in reality you'd only need the arithmetic ones to have this feature,
    e.g. with 3 bits:
    0xx = MMI arithmetic instructions as above
    100 = logical operations
    101 = control flow
    110 = moves
    111 = something else)

    Nothing non-RISC about this at all. (I assume all operations are (R1, R2) -> R3 type or similar).

    Of course, it's perfectly possible to throw orthogonality and symmetry out of the window and implement this as a complete dog's breakfast too! Intel would never do that, I'm sure.

    FatPhil
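FatPhil's idea of reusing ordinary instructions on packed sub-words can be sketched in software as SWAR (SIMD within a register). The masking trick below is a standard way to keep carries from crossing lanes; it is my illustration of the concept, not anything from the P4:

```python
# SWAR sketch: treat one 64-bit word as 8 independent 8-bit lanes,
# so a single "add" acts like a multimedia instruction. The mask
# stops carries from spilling between lanes.

MASK_LO7 = 0x7f7f7f7f7f7f7f7f  # low 7 bits of every 8-bit lane

def add_8x8(a, b):
    """Lane-wise 8-bit add of two 64-bit words, carries contained."""
    low = (a & MASK_LO7) + (b & MASK_LO7)  # add low 7 bits per lane
    top = (a ^ b) & ~MASK_LO7              # top bit per lane via XOR
    return (low ^ top) & 0xFFFFFFFFFFFFFFFF

# Lane 0 holds 0xFF, lane 1 holds 0x05; 0xFF + 1 wraps to 0x00
# inside its own lane instead of carrying into lane 1.
print(hex(add_8x8(0x05FF, 0x0101)))  # 0x600
```

A real design would of course encode the lane width in the opcode, as the post suggests; the point is only that nothing about packed arithmetic is inherently non-RISC.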
  • Here's your quote: "The chip [...] represents the first complete architectural overhaul of the company's processor line since 1995, when the original Pentium emerged."

    (my emphasis)

    Assuming "the company" referred to is Intel, then this statement is completely true.
    --
  • The Internet is going from a text kind of thing to something more visual
    I don't know about the rest of you, but I don't have enough bandwidth for the text-based Internet as it is. (It really sucks living at the end of a copper line; max = 26.4 kbps.)

    In addition, the Pentium 4 will contain a 20-stage pipeline. The pipeline is a processor's assembly line. While this means the Pentium 4 will have a line twice the length of the 10-stage Pentium III, the longer pipeline will create room for speeding up the chip.
    Could someone explain to me how having a longer pipeline speeds things up? This seems kinda counterintuitive to me. Guess it's like the pipelines in the 3D GPUs, but I don't see how that would work in a general-purpose CPU.

    It will contain 42 million transistors, compared with 28 million for the Pentium III.
    Even with a smaller feature size, won't this create a lot of heat, especially running at 1.4GHz? IANAExpert, but since PIIIs run at 90C, can we expect this CPU to run ultra hot as well?

    Those who will not reason, are bigots, those who cannot, are fools, and those who dare not, are slaves. --George Gordon Noel Byron (1788-1824), [Lord Byron]

  • Um, I don't know if you're serious, but stuff like this works perfectly fine on earlier CPUs too, of course. ;^) I probably shouldn't be doing this, but Cycore [cycore.com] have some neat tech for doing this. According to their download page, the plugin for their technology (Cult 3D) is available for Linux as well as the Other OS...
  • SHL by 1 can be achieved by adding a number to itself, and an adder is just a pile of XORs and ANDs. We would need a separate load-const for INC and DEC. SHR I'm not so sure about. Damn! This chip is getting complex.
  • But remember, you're talking about the x86 here...
    We all know that RISC processors (i.e. Alpha) are faster and better than CISC processors (i.e. x86), but that hasn't stopped Intel yet...

    -- Sig (120 chars) --
    Your friendly neighborhood mIRC scripter.
  • ...on a 64-bit bus...
    ...which will support dual RDRAM channels.
    Last time I checked, RDRAM only uses a 16-bit [data] bus.

    -- Sig (120 chars) --
    Your friendly neighborhood mIRC scripter.
  • I'm actually pleasantly surprised... looks as if Intel put a lot of planning and time into this. As soon as there is a P4 SDRAM option I'll look into picking up one of these suckers...
  • AMD's bus for the Athlon family is the Alpha bus - one that has been proven, and doesn't have those nasty Intel licensing problems.

    --
  • Of course I left out FP. Everyone knows x86 has a horrible FP implementation. It's not even worth comparing because it's so crippled.

    This does not invalidate what I stated. CISC does not require a braindead FP stack implementation.

    If it's just Intel's brute force keeping IA32 on top, then may I humbly suggest the other manufacturers hire away some of that brute?

    I'm not sure what other units you're referring to. Memory, perhaps? A lot of that has more to do with the PC platform than anything else -- consumer-level machines don't require huge levels of memory bandwidth and scaling.

    Keep in mind that SpecINT measures the entire processor performance running (mostly) integer applications. This includes memory and control flow.

    Could x86 run faster if it weren't burdened with a huge decoder? Probably. Branch mispredicts are a big problem.

    My point is that you can't judge a processor by its packaging (marketing). It's the guts and the bottom line that count. Speculations based on anecdotes and (in this case) dogma are next to useless.

    --

  • Another processor from Intel? Now damn it, I just gave them a bunch of cash for the PIII, just like I did the PII, and just like the Pentium and the Pro version.

    I didn't really notice a big jump in performance on the last buy, but what can I do... it is Intel.


    Why is this so difficult for people to understand? Unless you are doing something really hardcore, like lots of video work or heavy numerical analysis, you're not going to notice any performance benefit. Additionally, we've reached the point where rethinking or rewriting can pay off much, much more than incremental processor speed upgrades. For example, Borland's Object Pascal compiles 10-100x faster than gcc. If you use it, then you're getting an order of magnitude increase. Compare that to the benefit gained by going from a 400MHz Pentium II to 1GHz Pentium III (less than 3x).
  • Then what do you suggest, That Intel and AMD simply pat each other on the back and close up shop? "Yep, we made it to a gigahertz, and no one needs anything faster than that so we can convert all of those fabs into hydroponic potato farms..." ;-)

    Every new core the two of them come out with has a few neat tweaks inside, but they are minor. It's pretty hard to come up with an earth-shattering new architecture every few months.

    If you feel pressured to buy a new PC every year, get a different hobby, or get counseling.

    Since when did people accept, without question, WITHOUT QUESTION, that they need to spend $2k on a new PC every 1-2 years!? To me, that is absurd. When did the brainwashing happen? How did we all miss it without doing anything?

    Or is this just what it's like to live on an exponential curve?


    ---
    Unto the land of the dead shalt thou be sent at last.
    Surely thou shalt repent of thy cunning.
  • Hell yeah! I second the motion. All they do is dis big companies, call everything hype, but never have much to say that is informative. They are like the Enquirer of Technology. Occasionally they get info quicker than the mainstream, but usually their articles are trivial, mindless bullsheet.

    JOhn
  • The P4 is worse than I'd expected.
    • All that architectural redesign, and it does less per clock. Ouch. Was it worth that much to get bragging rights for a higher clock rate? Remember the DEC Alpha? Huge clock rates, but not much got done per clock. Same idea.
    • It doesn't support multiprocessing. That cuts out the high-end market.
    • It's RAMBUS-only. That cuts out the low-end market.
    • The "Netburst" name is really stupid.

    On the other hand, note that Merced/Itanium/IA64/whatever seems to have gone away. The Register now points out that IA32 is 2x faster than IA64. Oops. Probably just as well; Merced was hell to program. VLIW architectures require miracles from the compiler.

    Besides, even the 1GHz PIII is mostly vaporware. Try to get one. Yes, they exist, but there aren't many of them.

  • Don't forget Windows 98 to provide faster, safer and more convienient access to the internet! :)
    >First, even large server manufacturers like Dell say that servers account for only about 14% of their sales.

    Correct me if I'm wrong, but I wouldn't consider Dell a "large server mfg" for purposes of demonstrating that SMP is a small percentage of servers. Look to the real server mfgs (IBM, Sun, HP, Compaq); I'm willing to bet you will see a larger percentage of SMP systems. When thinking of servers, don't limit yourself to x86 hardware.
  • If the Pentium is a "complete architectural overhaul", then what the blazes does one call the Vax->Axp change, or the 68K->PPC change, or the C80->C6000 change?

    It seems that way, but in practice, modern x86 CPUs (Pentii, Athlons, Crusoen) are just the x86 instruction set wrapped around something which is not like an 8086....

  • Ah ha ha. That's funny. :P

    First of all, the Vax->Axp and the 68K->PPC change happened before 1995, and second of all, the VAX is Digital, and the 68K/PPC is Motorola/IBM - not Intel.

    You do live in a very sheltered world indeed, FatPhil.
  • Score 3: Funny?

    I was serious! (CV available on request if you don't believe me.)

    OK, downmod to redundant this one now :-)

    FatPhil
  • This article is just marketing bullshit, the guy who wrote it obviously has no clue what he is talking about. Take the crap he writes about "peer networking such as Napster" as another example.
  • You see, MHz counts just aren't enough any more. You also need a completely insane number of transistors (42 million!!! they're nuts!) and about as many meaningless buzzwords. This article *is* marketing BS, and the author doesn't know a thing about processors or any other topic he writes about, that's for sure.
  • You can do jumps by adding to the program counter. You can also do conditional branches with a NAND and an ADD. RISC is beautiful!
  • You're flamebait. I'm weak.

    Vax is digital. Axp is digital.
    Vax->Axp was a "complete architectural overhaul" performed by Digital.

    68K is motorola. PPC is part Motorola.
    68K->PPC was a "complete architectural overhaul" performed by Motorola in cooperation with others.

    If you aren't flamebait, then go reread my post, and see if you can work out what I may not have said clearly enough the first time. (In which case, my bad.)

    FatPhil
  • by Cylix ( 55374 )
    Another processor from Intel? Now damn it, I just gave them a bunch of cash for the PIII, just like I did the PII, and just like the Pentium and the Pro version.

    I didn't really notice a big jump in performance on the last buy, but what can I do... it is Intel.

    Of course most of us really don't need this new processor... but we are going to buy it anyway. We are going to do what Intel tells us is the right thing to do... what we are told we need to do. Where we fall short, Intel will tell the OEMs how they need to switch to this new processor, because all of the older and cheaper ones are really tough to make now. It is really tough to make the older processors because they don't make as much from them as they do from the new ones.

    Granted my views are a bit twisted, but one has to admit there is part of a truth here.

  • Are you sure that it doesn't have any nasty DEC/Compaq licensing problems, though? Or could Cyrix, IBM, MIPS, and Sun all start building processors for EV6 without even going so far as to mention it to Compaq?
  • by Pulzar ( 81031 ) on Monday August 21, 2000 @03:43AM (#840838)

    Could someone explain to me how having a longer pipeline speeds things up? This seems kinda counterintuitive to me. Guess it's like the pipelines in the 3D GPUs, but I don't see how that would work in a general-purpose CPU.

    The longer the pipeline is, the smaller each stage (of the pipeline) is. The smaller the stages are, the higher the frequency you can run them at. If you cut each of the existing stages exactly down the middle, you could run your CPU at twice the frequency, without making any other changes! (Of course, you can never cut a stage exactly in half, so you'll never reach a 2x increase.)

    Why don't we make 10,000-stage pipelines, then, you might ask :). In the ideal world, a completed instruction "comes out" of the pipeline on each clock cycle, so with 2x frequency, your CPU is twice as fast. The problem is, with a huge pipeline, you increase the chance that an instruction will "stall" along the way, and you'll get less than 1 instruction (on average) coming out on each clock cycle (the "IPC" thing the article talks about). If you add enough stalls to your pipeline, you might effectively decrease your CPU's performance.
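    The tradeoff can be put into a back-of-the-envelope model (all figures below are made up for illustration; none of them are real P4 or PIII parameters): deeper pipelines raise the clock, but every stage pays a fixed latch overhead, and a mispredicted branch flushes the whole pipe, costing roughly one cycle per stage.

```python
def instructions_per_ns(stages, total_logic_ns=100.0,
                        latch_ns=1.0, mispredict_rate=0.05):
    # Splitting a fixed amount of logic into more stages shrinks the
    # clock period, but each stage still pays a fixed latch overhead...
    period = total_logic_ns / stages + latch_ns      # ns per cycle
    # ...and a flush costs roughly one cycle per pipeline stage.
    cpi = 1.0 + mispredict_rate * stages             # avg cycles/instr
    return 1.0 / (period * cpi)                      # instructions per ns
```

    In this toy model a 20-stage pipe beats a 10-stage one despite the worse IPC, but push the depth far enough and the flush penalty wins, which is exactly the "enough stalls" point above.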

  • Sorry, I'm far from a RISC purist, but we have an issue:

    1) Functionality costs chip real estate

    2) Speed costs chip real-estate

    If we start doing the graphics and the sound on the CPU, great for a notebook, but isn't that processor going to perform worse than an equivalent lower-functionality chip? I do not object to buying a clever sound card or graphics card should I need one. If I don't, then I can live with something more basic and leave the CPU to do what it does best.

    Why compete with peripheral vendors, unless you are really going for the single-chip that does everything market?
  • However, I was talking about what a "complete architectural overhaul" is. Not what it means to solely Intel, but what it means when looking at the broader history of microprocessor architectures.

    In fact, the misunderstanding _is_ the essence of my point.

    As in "Quake 3 Arena is the first new genre of game from ID since 1998." To which I'd reply "new genre?"

    FatPhil
  • The Register [theregister.co.uk] has a nice anti-hype article [theregister.co.uk] about the P4.

    My favourite is

    There are two key words and phrases you, our readers must note. First of all, the Pentium 4 marchitecture is now to be described as Netburst, and the second phrase is that this architecture should be described as the repeated engineer execution (REE). We know what REE stands for but we prefer our version.
  • Laptop users don't quite have that option! SWAR is a tad helpful in that respect.

    I do agree with the idea that on chip 3D acceleration would not help too much on a PC with the latest 3D card.

    I don't think that streaming media is accelerated by the video card. That would be helpful to PC users. I guess Intel is waiting until broadband cable access hits almost every home and streaming media hits it big.
  • something is borked.

    Thank you for answering my question, of whether anandtech was dead, or if my employer had attempted a transparent proxy and failed.
    ----------------------------
  • I don't know, but I saw it as well, cut & pasted the address and that didn't work either...
  • by stx23 ( 14942 ) on Monday August 21, 2000 @01:58AM (#840845) Homepage Journal
    Were you actually planning on reading the article before speculating wildly?
    You must be new round these parts...
  • I always read the articles first - don't you? :)
    --
  • Well, you might buy it, but my next chip purchase is an Athlon (probably a Mustang by my next upgrade). Have fun wasting your money with Intel...
  • AMD received cross-licensing that netted them the EV6 bus; other companies would have the normal amount of trouble with licensing if they used it, though...
  • by karmma ( 105156 )
    Access denied to system because of URL Filter Configuration, while attempting to retrieve the URL: http://www.anandtech.com/showdoc.html?i=1301.

    Hmmm... I wonder if their webserver is running on a Pentium 4?

    It's very b0rked, yes. I got the same error message.


    --
  • Yep, I got the same thing ... must be a bogus message from their server. I'm using a dial up connection with no filtering. Probably their server bucking under the load.
  • ...that would be a Red Hat.
  • I'm really surprised about this Pentium 4 chip.

    Didn't Andy Grove write a book called "Only the Paranoid Survive"? I guess the paranoia is over and pride is calling the shots. To wit:

    1.) Dual RDRAM channels? RDRAM? McFly? Hello? RDRAM is dead. I don't care if it's the same price as SDRAM. Nobody in their right mind is going to commit to Rambus-based PCs. Intel and Rambus tried to force a new standard down the collective throat of the industry. Just like the old IBM and the Microchannel, it has failed. Just like the IBM of old, this attempt will come back to haunt it for a long time.

    2.) No SMP? You must be kidding. Intel's SMP architecture was the only thing separating it from AMD. Now Intel is producing chips that are incapable of SMP? Puh-leeze. Does the word scalability mean anything to these people? If the Pentium IV ships with any less than 4-way SMP it will be a critical error.

    3.) More CPU instructions? How many instructions does the IA32 architecture support now? 500? And what are these new instructions going to buy me? VR on eBay? Come on. If I want high-performance 3D I'm going to buy a video card, not a franken-CPU. Intel is going off the deep end with its bizarre and shallow marketing. Maybe they can sell these chips to the 10 people on their Web Outfitter service. A sucker is born every minute, but they aren't the consumers of high-performance CPUs.

    In the end we have Intel, formerly one of the most entrepreneurial and forward-thinking companies of our generation, being replaced by a prideful monopolist that believes the market will buy anything it produces.

    Intel has the most aggressive competition of the last 15 years to deal with, and they are doing little more than putting their heads in the sand. I predict they will lose 20-40% of their market share within three years if mistakes like this one (and the mistakes-in-waiting of the IA-64) come to pass.

    AMD and Transmeta are going to use this fiasco to pierce Intel's armor. How many OEMs will they be able to steal away? Do you think Dell and Compaq want to sell high-performance computers with no SMP support and overpriced RDRAM? They know the market will not forgive them if they try.

    Intel might be king of the hill today, but in the end Goliath is about to be cut down with his own tools.

    Where have you gone, Andy Grove? Your ship is sinking.

    --chuck
  • Replace AND, OR and NOT with NAND.
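    True; NAND is functionally complete. A quick truth-table sketch (in Python rather than gates, for brevity):

```python
def nand(a, b):
    # NAND over single bits (0 or 1).
    return 1 - (a & b)

def not_(a):
    return nand(a, a)

def and_(a, b):
    # AND is just NAND followed by NOT.
    return not_(nand(a, b))

def or_(a, b):
    # De Morgan: a OR b == NOT(NOT a AND NOT b) == NAND(NOT a, NOT b).
    return nand(not_(a), not_(b))

def xor_(a, b):
    # The classic four-NAND XOR.
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))
```

    So in principle the whole ALU's logic unit collapses to one gate type, which is exactly the parent's point.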
  • by cybaea ( 79975 )

    The article says [anandtech.com]:

    The 432-pin Pentium 4 should dissipate around 52W of heat when operating at launch speeds; this puts it below that of the 1GHz Thunderbird that is currently available.
  • by GavK ( 58709 )
    Maybe they're still working on a click-through NDA...
  • The 14% isn't SMP; it's how much of Dell's sales are from servers. I'm assuming that a large percentage of Dell's servers are SMP machines. Dell is a pretty big player in the server market, so I'm guessing that a larger percentage of their sales are from servers (and workstations); however, even for them, servers account for only 14% of sales. And I wasn't limiting myself to x86 machines. The 5.x million server operating systems include WindowsNT, Linux, and other UNIXes.
  • From the article: [anandtech.com]

    The P4's bus, unlike the Athlon's EV6, isn't a Point-to-Point bus, meaning that all CPUs must share the same 3.2GB/s of available system bandwidth. With a Point-to-Point bus, although it's more complicated to implement, each CPU in a multiprocessor environment gets its own connection to the North Bridge ...

    IANACD (I am not a chip designer), but this seems to me like a major disadvantage compared with the Athlon. Am I missing something obvious?

  • by Jon Erikson ( 198204 ) on Monday August 21, 2000 @02:11AM (#840889)

    Try this link [cnet.com] at CNET for more information.

    ---
    Jon E. Erikson
