Intel

Dual Pentium III Xeon Review 71

Sander Sassen writes: "Intel has recently released its new line of Pentium III Xeon CPUs, based on their new .18 micron process. HardwareCentral takes a look at its performance, utilizing a dual CPU configuration on an Intel i840 platform with 256 MB of Rambus memory as a testbed. This Dual Pentium III Xeon review has all the details of their findings."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    "Multiprocessing won't be available until the AMD 760 chipset"

    You're wrong: the 760 chipset only supports DDR RAM and a 266 MHz FSB (133 MHz DDR). It's the 770 chipset that's gonna support SMP...
  • by Anonymous Coward
    Ya gotta love it. Buy a $69k 8-proc Xeon now, and they'll throw in a free Palm Pilot...!
  • by Anonymous Coward
    That's why it's "good stuff".
  • I guess it takes all kinds. I'll be impressed, for instance, when Motorola manages to create a chip that runs at the same clock speed as the fastest x86 processors currently do. Call me back when the G4 is at 1GHz...

    - A.P.
    --


    "One World, one Web, one Program" - Microsoft promotional ad

  • Dual PIII Xeons and RAMBUS...

    Only one meeeeelyun dollars.

  • It's me, embobo. Please prevent the use (utilization) of the word "utilizing" for the rest of eternity. I'll give you a crucifix-shaped cookie if you do. Don't make me unleash my core competencies or my skill set upon you.

  • http://samovarawards.com [samovarawards.com]

    "Revenge of the year" award

    For the last couple of years, AMD has been fighting Chipzilla's cheapest Celerons and eating the scraps left over from Intel's big Pentium III lunch. Well, today is the day for revenge.

    The first trick was to draw Intel into a crazy race for the first 1 GHz CPU. With its complicated two-year design cycle and expensive factory-replication pattern, Chipzilla is too big to move fast. Playing by AMD's rules, it lost much of its production capacity to bad yields. Having announced a 1 GHz chip, Intel is in practice shipping 850 MHz parts at best, and only 550-800 MHz in volume. So the whole 800 MHz to 1 GHz market is under AMD's control. And now comes AMD's next play: price reductions. So the most powerful Intel chip will compete... against a $324 800 MHz AMD part? The whole $430 billion company is moving into the sub-$350 market? Very bloody...

    And what was Intel's game? All last year it was drunk-dancing with Rambus, screwing up one mobo chipset after another. Today's earnings report shows Intel's revenues falling. Maybe it's a good time to open another bottle?

  • Thanx to Hardware Central for yet another breathless description of the latest, greatest data point corroborating a trend I have been tracking since the original Pentium: Performance tracks clock speed!

    All of the extra transistors that Intel keeps packing onto the chip accomplish nothing more than to compensate for the various nonlinear elements in the system (eg, RAM and HD). The data I have collected over the past several years (back to the original Pentium) show that (as far as benchmarks are concerned) Intel's newest Pentium architecture is no more efficient (in CPI terms) than its oldest.

    In fact, Hardware Central's own benchmark shows that the new PIII Xeon is the least efficient performer in their group, despite having the highest overall performance and the snazziest new architecture. Viz:

    CPU -------- Perf rating points / aggregate MHz
    PIII Xeon -- 2.6
    PIII 500 --- 2.7
    PPro 200 --- 2.7
    Athlon ----- 2.8
    Cel 366 ---- 2.7

    They could achieve the same performance shown by this dual 666 PIII by pushing an original P60 up to 1332 MHz (assuming ideal scaling, which you essentially get with synthetic benchmarks).

    Every time I read a review wherein the writer wets himself over the "blistering speed" of Intel's newest architecture, I plot another data point on my straight-line graph, shake my head, and mutter a curse about the quality of technical education in America.
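The "points per aggregate MHz" metric in the comment above can be sketched in a few lines of Python. This is only an illustration: the ratios are copied from the commenter's table, the function names are made up for this example, and linear scaling across CPUs and clock speeds is the commenter's own assumption rather than an established fact.

```python
# Ratings per aggregate MHz, copied from the table in the comment above.
POINTS_PER_MHZ = {
    "PIII Xeon": 2.6,
    "PIII 500": 2.7,
    "PPro 200": 2.7,
    "Athlon": 2.8,
    "Cel 366": 2.7,
}

def aggregate_mhz(clock_mhz, n_cpus=1):
    """Total clock across all CPUs, assuming ideal (linear) SMP scaling."""
    return clock_mhz * n_cpus

def implied_score(points_per_mhz, clock_mhz, n_cpus=1):
    """Benchmark points implied by an efficiency figure and a clock speed."""
    return points_per_mhz * aggregate_mhz(clock_mhz, n_cpus)

# The dual 666 MHz configuration the comment refers to.
dual_xeon_mhz = aggregate_mhz(666, n_cpus=2)

# Which CPU in the table delivers the fewest points per MHz?
least_efficient = min(POINTS_PER_MHZ, key=POINTS_PER_MHZ.get)

# Ratio between the best and worst efficiency in the table.
spread = max(POINTS_PER_MHZ.values()) / min(POINTS_PER_MHZ.values())

print(dual_xeon_mhz, least_efficient, f"{spread:.3f}")
```

The best and worst entries differ by under 8 percent, which is the point the commenter is making: across these architectures the rating follows the clock almost linearly.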

  • by Azog ( 20907 )
    Oh yes! There is something to look forward to - the return of EMM386! (groan).

    But they will have to call it EMM_IA64 - the new, old way for applications and the OS to get past that 1MB^H^H 4GB barrier.

    The more things change, the more they stay the same.
    Torrey Hoffman (Azog)
  • This is the kind of processor everybody wants to have, but no one wants to pay for.

    -henrik
  • The PowerPC 750 runs two 32 k 8-way set assoc L1 caches (data and instruction), and a 64 entry, 16 set, 4-way set assoc branch target instruction cache.
    The L2 control provided by the 750 is implemented with an on-chip 2-way set assoc. tag memory.

    Of course, you can always use a 740 and provide your own L2 cache control... and I don't know exactly what Apple does, but the L1 remains the same.
  • Well, one should be able to glom some of the code from a dual-alpha implementation. The bus is of that design, and the arbitration (not really a bus, it's supposed to be a switch) is ~the same. Not an exact copy, of course, but it could be a good lead for some of the framework...
  • oh wait.... one hundred beeeeelyun dollars!
  • Actually what 'nester' is saying is true: most modern 'RISC' machines have 'complex' operations as well. It doesn't usually take more 'instructions' to perform the same operation. The x86, on the other hand, has single-byte complex instructions such as MOVSD, which does a memory-to-memory copy (there isn't a mov mem,mem) with the 1010010w opcode. The PPC, for example, also has string move instructions (I don't have my PPC reference handy to list the opcode), but they take up 32 bits instead of 8. Actually, the MOVSD takes up 2 bytes because you almost always want the REP prefix. This is one of the problems with decoding the stream: the instructions are variable-length. On the other hand, the complex instructions in most 'RISC' architectures are going to force those architectures to add translation layers to more simplified micro-RISC OOO engines for a large number of their complex instructions as their clock speeds scale.
  • The code size thing is just an obfuscation of the real issue, which is that the optimal trade-off of cache size vs. associativity vs. latency is a function of the application being used. An application like Quake or Word, which can hold the vast majority of the active working set of its data structures and primary code paths in less than 256k, doesn't need more than that, so any speed improvements you get in the cache access times pay off big. Note, on the other hand, that Intel went from a 512k two-way set associative cache to a 256k 4-way set associative one, vs. maybe a 1 meg direct mapped. In theory all three should provide roughly the same hit rate. A workstation or server, on the other hand, has an entirely different cache footprint. Intel engineers are smart; Intel marketing is not. Remember the PPro vs. PII issue? The PPro supported large memory sizes, fast caches, and >2-way SMP; the original PII did not. So until the Xeon came out, there was a small group of high-end server/workstation customers that were pissed off because they couldn't get newer processors from Intel to replace their servers.

    Motorola caches are a funny thing. They don't offer on-die cache (that I've seen); instead they offer on-die tagging. The L2 tagging is 2-way set associative and supports 512k, 1m and 2m external caches. This actually sounds more like a case of Motorola trying to squeeze all the performance out of an old CPU core while still making the chips cheap, with the idea that all the performance numbers will be listed with 2 megs of external L2 (to make up for the fact that the latency is going to suck, being an external cache) while everyone will sell cheap little 512k versions.
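To make the three cache configurations in the comment above concrete, here is a rough Python sketch that works out how many sets each one has and how an address would be split. The 32-byte line size is an assumption for illustration (the comment gives none), and `cache_geometry` is a hypothetical helper, not anything from a vendor manual.

```python
def cache_geometry(size_bytes, ways, line_bytes=32):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    line_bytes=32 is an assumed line size; ways=1 means direct mapped.
    All sizes are expected to be powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 for powers of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits

KB = 1024
configs = {
    "512k 2-way": (512 * KB, 2),
    "256k 4-way": (256 * KB, 4),
    "1 meg direct mapped": (1024 * KB, 1),
}

for name, (size, ways) in configs.items():
    sets, offset_bits, index_bits = cache_geometry(size, ways)
    print(f"{name}: {sets} sets, {offset_bits} offset bits, {index_bits} index bits")
```

The sketch only shows the geometry. Whether the three really give "roughly the same hit rate" depends on the workload, which is the commenter's point about application footprints.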
  • The only way I can see a virus trying to use it is as a method of hiding the bulk of its code.

    Reminds me of the first polymorphic pseudovirii with the code segments that appeared benign, but assembled into final executables all with different signatures to avoid detection.

    The main problem with using the chip is that only some of the potential targets will be infectable. This might be cool if it's a viral spreader, which seeds target systems via HTTP or other requests. Use one method to deliver the code to the big servers, then use the servers to deliver the target virii to get all the victims. That way the server can avoid having the code that it delivers, since it just needs to pass on the request to deliver the final package. You could even set up a multi-tier distributed approach, with host virii, server virii, and delivery virii - all of which can adapt to different OS and anti-viral protections by using different delivery methods and different trigger events suited to the target OS.

    One ring to rule them all
    Three rings to serve them all
    Five rings to infect the different OS
    Seven rings to make Bill G's day

    Just a thought ...

  • I have one word to add - Thunderbird.

    I have one word to add - Vaporware.

  • Agreed. You could just go to the chip PR websites and get this information.

    What would be news is if some superfast chip (or one modified to be fast) was produced that was cheap, like the earlier Celerons were.
  • I read somewhere before that a dual Athlon would be possible. Does anyone know if there exist mainboards that will support this?

  • It's a learned skill, that unfortunately I lack. It would have been extremely useful yesterday when I installed a program that, when run, turns your numlock on and hitting the numlock key won't turn it off. I didn't realize it was the program's fault, and was screwing around trying to fix the problem. On the laptop, the number pad is on top of the regular letters. Needless to say, my ICQ friends got responses that made no sense, and if I knew how to write like a script kiddie, I would've been able to get my point across.
  • As far as I know, Intel's x86 chips use APIC, and Athlons use OpenPIC.

    OpenPIC is by no means "new". It's been around for a while. Alphas have been using OpenPIC for quite some time.

    Bringing SMP Athlon support to OSes shouldn't be that difficult, it just needs to be done.
  • One simple reason for these processors (and why someone I know will pay for them)....Video
    If you want to deal with uncompressed PAL video footage, 256k is not enough, 512k is not enough and even 1MB is not enough unless you are willing to drop down to DV or broadcast values during the editing stage (in fact everything currently drops down from true uncompressed values, using 4:2:2 and the like instead of a full 32 bits per pixel).
    What's this mean? Well, if you want to make a TV program on your PC for a station with any standards (i.e. not the jokers who will actually put commercial video camera footage up) you _MUST_ have at least 1MB of cache per processor or suffer the consequence of each frame having to be shuttled in and out of the chip cache.
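A quick back-of-the-envelope check of the frame sizes behind this argument, as a Python sketch. The 720x576 frame size and 25 frames per second are assumptions (common PAL figures, not stated in the comment), and whether a whole frame needs to sit in cache at all is the commenter's premise, which this sketch does not settle.

```python
WIDTH, HEIGHT, FPS = 720, 576, 25   # assumed PAL geometry and frame rate

def frame_bytes(bytes_per_pixel):
    """Size of one uncompressed frame at the assumed resolution."""
    return WIDTH * HEIGHT * bytes_per_pixel

full_32bit = frame_bytes(4)   # 32 bits per pixel
yuv_422 = frame_bytes(2)      # 8-bit 4:2:2 averages 16 bits per pixel

CACHES = {
    "256k": 256 * 1024,
    "512k": 512 * 1024,
    "1MB": 1024 * 1024,
    "2MB": 2048 * 1024,
}

def fits(frame, cache):
    """True if a whole frame fits in a cache of the given size."""
    return frame <= cache

for label, size in CACHES.items():
    print(label, fits(full_32bit, size), fits(yuv_422, size))

# Sustained data rate for full 32-bit frames, in bytes per second.
print(full_32bit * FPS)
```

Under these assumptions a 4:2:2 frame fits in 1MB but not 512k, and a full 32-bit frame needs the 2MB part.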
  • One problem: I was talking about editing (and effects/transitions etc.). So unless you want to stick with buying a high-end card and only using the built-in effects/transitions, the card does little for you. The person I was referring to who would buy dual Xeons with 1MB+ uses a DPS Perception (I think he has an RT as well now), which handles the encoding, decoding and non-linear access, but it is his software that renders out large sections of his work (all non-hardware-based transitions).
  • A beowulf cluster of these babies.

    (obligatory slashdot commentary.)

  • example: x86 mov will do register load/stores and mem->mem copies, but it can't do all at once

    This still reduces x86 code size. Compare the following memory to memory copy:
    mov [eax], [ebx]
    With:
    mov ecx, [ebx]
    mov [eax], ecx

    The RISC-like (second set) code is twice as long as the CISC-like (first example) code. Granted, execution time is about the same (both processors do the same thing internally), but the code to represent that operation is smaller.

    I'm not talking about pro's and con's of CISC vs. RISC internal to the processor (RISC wins hands down), I'm just saying that usually, CISC requires less instructions to do the same thing. Although, you're right about variable length instructions helping code size and being a pain in the ass internal to the processor.

  • ...I'll just have to settle for a dual-celeron...
  • This is pseudo on-topic, but aren't we moving away from these huge monolithic boxes and into more distributed environments? I suppose that isn't good for Intel's business...

  • Alphas do.
  • Look at their test of dual PIII-1000: http://www.hardwarecentral.com/hardwarecentral/reviews/1673/10/

    It shows that the PIII-1000 is 1.5 times faster than the dual Xeon-667 on both CPU tests (CPU/FPU and multimedia), but loses on the memory speed test because of Rambu$.
  • --See now this is what I am talking about!

    --I read this post and now my head hurts so much I want get on a subway an kill people.

    --I had a miron once I rubbed some Lambda oil on it and it went away

    --OW! The pain, my eyes bleed...

  • How come the obligatory Beowulf Cluster remark isn't out yet?
  • while a dual Xeon system does get closer to the price of SGI, Sun, etc, it still is significantly cheaper. A few months ago I priced Xeon systems vs SGIs for some engineering work. The Xeons came in much cheaper, although we still went with the SGI for better performance. It is still a price/performance game, and always will be.
  • here is some information:

    Intel's web site is at www.intel.com [intel.com]

    Late.

  • Just fleecing their corporate customers?

    Yes. Typical decision maker has about 1% of the knowledge of microprocessor architecture that anyone posting on this thread has, and about 0.1% available time to think about their decision. So the way they solve the problem: big numbers = good. Note: I said typical, not every.
  • I doubt you'll be able to execute code from the scratch area.

    The only way I can see a virus trying to use it is as a method of hiding the bulk of its code. Just think of a little micro-virus that is even harder to detect than regular viruses (because it is so small it doesn't have much of a signature) and works by loading itself from the EEPROM when it is executed. This might be viable if the virus detection companies don't think to check accesses to memory the same way they check accesses to the hard drive(s). Of course it would be rather difficult to spread this virus, as everyone would need a brand new ultra-expensive processor in their computers, and the people who tend to buy these things tend to know how to avoid getting viruses.

  • You think I'm kidding...:-)
  • Maybe for highly memory intensive long processes RDRAM is worth it, but how many of us will find that worthwhile?

    At a guess, anyone that needs to use a Xeon as opposed to a regular PIII.

  • If you read the full article, it says the processor is only $50 to $100 more than its Slot 1/Socket 370 counterpart. The big difference is the management functions that are part of the processor housing, such as temp., 2 EEPROMs, and so on... Given the size of the cartridge and the metal back plate, I would imagine that it also cools better, and for a server that is good.

    Another issue is the whole Slot 2 thing. A lot of the i840 motherboards that are in production or planned are Slot 2, making this processor necessary if you don't want to use a 550MHz processor.

    "... That probably would have sounded more commanding if I wasn't wearing my yummy sushi pajamas..."
    -Buffy Summers
    Goodbye Iowa
  • Well, maybe EMS wasn't as ugly as some other hacks, but it messed up the programs. I never used it on anything less than a 386, and when I got a compiler that could do protected-mode DOS programs, I promptly forgot all about it. And then I installed Linux instead, and got a flat address space (no need to keep the structures below 64K in size - Yay!).

    And yes, XMS was hard to use, but when that's what you had... Again, flat address space rocks :)

    And I can also fill you in on the HMA stuff. The extra memory space comes from the segmented memory addressing, with a 16-bit segment and 16-bit offset. The true address is calculated by (segment << 4) + offset, which in general creates a 20-bit address, capable of addressing one meg. However, note the overlapping parts of the segment and offset - if you put the value 0xffff in the segment, and anything greater than 0x000f in the offset, you will overflow the address space on a 20-bit bus, and wrap back to low addresses. What they did was to disable the address wrapping / aliasing, and instead merrily continue up into high memory. Wrap or no wrap was selectable by the "A20 enable" thingy (don't remember exactly how it worked).
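The (segment << 4) + offset calculation and the wrap-around described above can be sketched in Python. This only models the address arithmetic; the `a20_enabled` flag and the function name are illustrative and say nothing about how the A20 gate was actually wired or switched.

```python
def linear_address(segment, offset, a20_enabled):
    """Real-mode address: (segment << 4) + offset.

    With A20 disabled the result is truncated to 20 bits, so it wraps
    back to low memory the way it did on a 20-bit address bus.
    """
    addr = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if not a20_enabled:
        addr &= 0xFFFFF
    return addr

# Last byte below 1 MB: identical either way.
print(hex(linear_address(0xFFFF, 0x000F, a20_enabled=False)))
print(hex(linear_address(0xFFFF, 0x000F, a20_enabled=True)))

# One byte further: wraps to address 0 without A20, reaches 1 MB with it.
print(hex(linear_address(0xFFFF, 0x0010, a20_enabled=False)))
print(hex(linear_address(0xFFFF, 0x0010, a20_enabled=True)))

# Size of the region above 1 MB reachable this way (the HMA).
hma_bytes = linear_address(0xFFFF, 0xFFFF, a20_enabled=True) - 0x100000 + 1
print(hma_bytes)
```

The last figure comes to 64k minus 16 bytes, which matches the jump from 1024k to roughly 1088k mentioned elsewhere in the thread.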

  • CISC instructions (fwik) rarely do more than comparable RISC ops. example: x86 mov will do register load/stores and mem->mem copies, but it can't do all at once. the reason x86 code is smaller is due to its variable opcode length. the fact that x86 ops can each do more than one job is just namespace (opcode) compression, and it necessitates microcode and complicates pipelining.
  • With 256KB RAM, this is clearly intended for the Workstation Market.

    Geez, nice workstation! :p

    My nintendo has more than that!
  • What else are they going to do? They are working in other areas: advanced architecture (Willamette, Itanium) and MHz (0.18 process). If increasing the cache also gets them a speed boost, however small, they'll do that too. You will always have a set of customers that are screaming for any amount of speed, regardless of the cost. Xeon is for them.

    In addition, I'm sure it doesn't hurt them when comparing against Sparc, Alpha, PowerPC, etc. These all have a ton of cache (8MB!! in the case of high-end Sparcs) and, as we have discussed, more cache implies more speed to most people.

  • AMD is working on the next generation Athlon chipset, known as the AMD 760. Following shortly after this chipset will be the AMD 770, which will be the same as the 760, but with SMP. What I have read [ebnews.com] puts these chipsets coming out next year. Via, Ali, etc may come up with something sooner, but it is doubtful.

    Intel uses a proprietary "standard" for their SMP implementation. This forced AMD, Cyrix, etc. to invent an open standard, OpenPIC(?). OSes have written SMP drivers for Intel's standard, but since there has never been an OpenPIC SMP motherboard, there are no drivers for OpenPIC.

    So, before SMP Athlons, you need a chipset, a motherboard, then drivers. Sounds like a long, sad road to me. I want one too...

  • once you look at xeons you get into the price range of the good stuff (IBM, Alpha, SUN etc.)

    While true, the "good stuff" doesn't run NT.

  • Warning: long technical post ahead.

    It wasn't an ugly hack at all. It was a way of sending 64k "frames" back/forth over the ISA bus to an add-on memory board. It was SLOW, but not ugly. It supported 32 MB (an arbitrary number IIRC) of RAM on a machine with a 20-bit address space (8088/8086).

    Anyway, 80286 had 24-bit addressing (16 MB) out of the box but few early motherboards supported it (although my PS/2 model 30 is expandable to 16 MB), and besides the cards were a lot cheaper than SIMMs. Plus you could of course only access the full 16 MB from the 286's broken "protected mode".

    EMS was also easy to use from Real Mode. Of course XMS could be used too, but it was a lot less clean than the EMS API (software int 0x67).

    Do not confuse EMS with EMM386. EMM386 emulated "hardware" EMS by providing a layer over top of XMS. It wasn't nearly as slow, and used the same clean API.

    However, circa DOS 5.0, people stopped using EMM386 for EMS. Since EMM386 needed to fake its own adapter space (the original adapter boards used an address like 0xE0000 for the 64k "frames"), it contained code to support UMBs (Upper Memory Blocks). You could use UMBs to allocate unused memory in the adapter space between 640k and 1024k in Real Mode.

    I would consider UMBs an ugly hack (remember MemMaker? LoadHigh? DeviceHigh? InstallHigh?). They're still in use by Win95/98/ME to hold things like the Real Mode mouse driver etc.

    HMA was the ultimate ugly hack. That upped the Real Mode address space from 1024k to 1088k on the 80286 and higher. I have no idea how it works, but you may notice that your Win9x virus scanner looks at a full 1088k of "conventional memory" during boot.
  • Intel has Physical Address Extensions for 36-bit addressing for a limit of 64GB. Win2k supports this in their Server version. I believe Linux support is done or coming soon. I also think SCO supports it. I have no idea about Solaris and various BSDs. Over at Unisys we have a monster system called ES7000 that supports 64GB. It also supports 32 processors and 96 PCI slots.
    IA-64, AKA Merced, AKA Itanium supports full 64-bit addressing for a whopping 16 EB (exabytes!). Microsoft currently claims that Win64 will only support up to 64 TB, although that may only be in Data Center Server. Anybody know what the other IA-64 projects will support?
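The address-space figures quoted above follow directly from the number of address bits. A small Python sketch, using binary units (1 KB = 1024 bytes) and a hypothetical `addressable` helper:

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def addressable(bits):
    """Bytes reachable with `bits` address bits, as a (value, unit) pair."""
    size = 2 ** bits
    unit = 0
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return size, UNITS[unit]

def bits_needed(size_bytes):
    """Address bits needed to cover a power-of-two number of bytes."""
    return size_bytes.bit_length() - 1

for bits in (20, 24, 32, 36, 64):
    value, unit = addressable(bits)
    print(f"{bits}-bit addressing: {value} {unit}")

# The 64 TB figure quoted for Win64 corresponds to this many address bits.
print(bits_needed(64 * 2 ** 40))
```

So the 36-bit PAE limit of 64GB is sixteen times the plain 32-bit limit, and 64 TB uses only 46 of the 64 available address bits.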
  • Yea, but I doubt management features will carry the load of a large web server. I'm saying that if they intend to put this in the normal Xeon market, then they need more cache, or else people are going to continue to use the older proc, or they will get whopped by AMD and its 8meg cache Athlon.
  • The early K6's supported OpenPIC, but it was dropped in the later models.

    The Athlon is an Alpha EV-6 protocol. The multiprocessing is point to point, and will be done with a protocol format, much like a switching hub.

    The first chipset to support it should be the AMD 760, which should be out later this year. By this time, the Thunderbird processor, with full speed cache should be out as well. This will make the Xeon change their price/performance model to compete.

    It's more a northbridge issue than an OS issue. In a point to point protocol, each processor needs its own northbridge chip on the motherboard, which gives each processor full bandwidth to the rest of the system. Once the multiple northbridges are present, implementing multiprocessor support in the OS should be trivial. The code should be essentially identical to supporting a dual P3, except the performance characteristics will be completely different because of the full bandwidth of the multiple northbridge chips. A multiple P3 uses a single northbridge chip and GTL bus, which gives diminishing returns on more than a couple of processors.
  • This is probably because the PIII/Aluminummine processors use a fully associative cache, instead of a 4-way or an 8-way associative cache. This basically means that the cache is very good at optimizing itself for the most commonly accessed data, but also means that the cache doesn't scale as well to large amounts. I don't know what the cache association is on the G4 (I could go look, but I'm lazy, so I'll leave that for whoever isn't) but that probably has something to do with this issue.

    I have one word to add - Thunderbird. Full speed cache in good amounts, for about 10% of the price of a Xeon. Also, faster clockspeeds (somewhat relevant in this case) - the Thunderbird 1250 should be out in the next couple of months. Multiprocessing won't be available until the AMD 760 chipset comes out later this year, but if you can wait, I think the AMD Thunderbird is going to be a good choice over the Xeon.
  • Or... you can go with real time hardware encoding, like a Matrox RT2000 or Digisuite card. The encoding is done on the video card so the processor really just has to handle the command stream. One of these type of cards is somewhere in the $1200 range, but that's still less money than a higher range Xeon.

  • Re:Two Words For Ya! (Score:3)
    by garver on 10:16 AM April 18th, 2000 EST (#36)
    (User Info)
    As always, when looking at cache, you compare bang for buck. Adding cache costs money, lots of money sometimes. Some processor architectures get more mileage out of added cache than others.

    For example, the G4 seems to love cache and screams faster and faster as you add it. Apple/Motorola have found the 1MB cache level to be their sweetspot, most bang for buck. On the other hand, the PIII is not as cache loving. Giving it another 0.75MB doesn't do it all that much good, so why waste the money? Their sweetspot seems to be 0.25MB.


    Then why dump so much more onto their high end systems if the performance increase is negligible?
    Just fleecing their corporate customers?
    I still feel that 1MB of cache would do a bit towards increasing the performance of x86 processor-based machines.

    Kintanon
  • Not impressed.

    Yippy skip, for 6K$ extra I can drop another 1.75 megs of fullspeed cache on a processor. Gee, big surprise, that increases the performance, whoulda thunkit?!
    What I want is for the X86 processor makers to catch up with Motorola and put 1m of full speed cache on their regular processors. I have a hard time finding a processor with 512K of cache, WTF is the problem here?
    This is just IBM slapping the market around to try and increase their profits without actually giving us anything new.

    Kintanon
  • Bah, whatever....
    My typo, replace all instances of IBM with INTEL.
    That'll teach me to PREVIEW the damn posts...

    Kintanon
  • Way to read the article you're commenting on.

    "The Iwill DCA-200 motherboard is a pinnacle of stability and performance, but doesn't come cheap. The same applies to the 256 MB of ECC PC800 RDRAM; it is fast, very fast even, we've never seen memory scores this high, but will set you back considerably."

    AND

    "RDRAM finally showed some of its muscle here, with the highest memory throughput we've ever seen on any memory architecture. The dual RDRAM channels on the i840 chipset really show off its benefits and low latency."

    Say it with the group "low latency"

    Oh, and about Tom - when you have cancer, do you go to a systems engineer? Then why do you go to an MD for your tech information? Pabst has had some good info over the past several years, but he's also had some very questionable conclusions, and he has been getting more, shall we say, touchy, since the video benchmark fiasco.
  • I'll agree that it isn't the most useful set of benchmarks, but I disagree with your server only comment.

    The Xeon has been marketed as a Workstation/Server chip, and has seen its way into the SGI NT workstations, etc. With 256KB RAM, this is clearly intended for the Workstation Market. The wider cache bus and the new motherboard are nice additions for the workstation market, but I think that the server market would give up some MHz for the larger caches of the "old" Xeons or get the "new" Xeon in the 512KB, 1M, or 2M (if available?) versions. I mean, 256KB of L2 cache is going to be useless in a large database server, as you'll never make a cache hit, while a larger cache is useful if most of the accesses are within a general range.

    However, I agree that this was mostly a stupid review. Testing it against obviously inferior hardware wasn't interesting. I mean, testing it against dual 800 MHz P3s or 1GHz P3s would give an understanding as to what the new cache system does. Testing it against processors from the same family at 2/3s the speed and shouting, wow, it's fast, is kinda silly.

    Alex
  • You are right, 256 MB is a little weak. My personal computer has 384 MB of RAM... The motherboard they used for the test was an Iwill DCA200 [iwillusa.com]. This board will support up to 2 GB of RAM. I think the reason that they only used 256 MB was because that much RDRAM memory runs about $1,100. Penguin Computing has an 8-way Xeon system [penguincomputing.com] that will support up to 32 GB of ECC SDRAM memory. I am sure there are other x86 based machines like this, but I don't know of any offhand.
  • .18u my rosy red arse! For those of you not in the know, the .18u measure is the smallest feature measure, or the Lambda of the chip. Every other dimension on the chip is a multiple of that number. It is the distance across the gate of a transistor from source to drain. Now, when they bake the chips, that distance shortens by a few mirons. Unfortunately, the marketing dept. got wind of this and took off with it. Now, they measure the shortest distance from source to drain right near the gate, because the further from the gate the measurement is taken, the wider the gap is. (Sort of a curve...) So in reality, those .18u chips are actually .20u or .21u. It doesn't sound like much, but when you're talking about millions and millions of transistors, that's a lot of space. (But probably still no more than the head of a pin.)

  • Yeah, no one would be able to beat you to first post.

  • --I was going to go for the quad setup but I found that two asbestos leg protectors was cost prohibitive.

    --these are the same a Celerons right?

  • Dream on it!

    They just want to give an idea of raw processor performance. What you claim (and I agree with you on the fact an Oracle benchmark would be much more significant to most of us) is a benchmark measuring the overall system performance and no longer just the CPU performance. So, it may not be possible to claim significant performance improvements from such a benchmark, since the result will not depend on sole CPU performance, but rather on the complete disk subsystem performance, memory performance, database tuning, etc.

    Bottom line: You are always on your own when the time comes to figure out performance in real life situations.

  • who gives a shit about Dhry stone and Whet stone? i want the Q3Arena benchmarks. mp3s prOn and Q3 are the only thing i use a computer for...
  • Perhaps you are looking in the wrong place. Benchmarking a processor and looking at server performance are two completely unrelated things. If you are looking at a 500GB database there are many other things besides processor performance to look at(unless you are working for one of those server companies you mention and are designing a box, but sounds to me like you are designing a system.) IMHO, hard drive speed, RAID performance, the number of drives in your system, and the amount and type of RAM you have are all somewhat more important things to look at. Bottlenecks in a database system are rarely at the processor.

    That said, there is a great site that compares the servers and databases you mention, and will likely give you the stats you are looking for. It's www.tpc.org [tpc.org].

  • by QZS4 ( 7063 ) on Tuesday April 18, 2000 @06:25AM (#1126272) Homepage

    Anyone know how much RAM you can put into one of these? They tested the system with 256 MB, which is a spit in the ocean for high-end systems nowadays (well, I might be exaggerating a bit...). I think it might be possible to use more than 4GB physical mem by some page table magic, but the per-process limit might be restricted to 4GB... Wait, maybe not - anyone remember LIM EMS? (*) Although, that is very ugly indeed.

    As I see it, this is what they have to solve, and solve it pretty quick, if they want to continue selling 32-bit processors. Today, there are lots of people running their programs on supercomputers, only because of the large memory, not because they need the processing power. It would be possible to save millions if the high-end PC class desktop systems could be fitted with, say, 24GB mem.

    But the built-in EEPROM was cool, I wonder if you can trick it into using that for booting, a la Sun's OpenBoot prom...? One can always dream.

    (*) To those who are too young to remember, EMS was an ugly hack by Lotus, Intel and Microsoft to be able to use more than 1 MB of memory on the 8086 / 80286.

  • by garver ( 30881 ) on Tuesday April 18, 2000 @05:16AM (#1126273)

    As always, when looking at cache, you compare bang for buck. Adding cache costs money, lots of money sometimes. Some processor architectures get more mileage out of added cache than others.

    For example, the G4 seems to love cache and screams faster and faster as you add it. Apple/Motorola have found the 1MB cache level to be their sweetspot, most bang for buck. On the other hand, the PIII is not as cache loving. Giving it another 0.75MB doesn't do it all that much good, so why waste the money? Their sweetspot seems to be 0.25MB.

    To compare cache amounts, without taking the processor itself into account, is almost as dumb as comparing clock rates (MHz).
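
    To make the sweet-spot idea concrete, here is a toy sketch. It is not a model of the G4, the PIII, or any real cache: it is a fully associative LRU cache fed a synthetic, uniformly random address trace, and every name and size in it is made up for illustration. The point is only that once the cache covers the working set, extra capacity buys nothing.

```python
# Toy sketch: hit rate of an LRU cache versus its capacity.
# All sizes and the random trace are hypothetical, chosen for illustration only.
import random
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Return the fraction of accesses in `trace` that hit an LRU cache of `capacity` lines."""
    cache = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(trace)


def synthetic_trace(working_set, length, seed=0):
    """Return `length` uniformly random accesses over `working_set` distinct lines."""
    rng = random.Random(seed)
    return [rng.randrange(working_set) for _ in range(length)]


trace = synthetic_trace(working_set=100, length=10_000)
for capacity in (25, 50, 100, 400):
    print(f"capacity {capacity:>3} lines: hit rate {lru_hit_rate(trace, capacity):.3f}")
```

    In this toy, the hit rate climbs until the capacity reaches the 100-line working set and then stays flat, so the last step from 100 to 400 lines is money spent for nothing. Where that knee sits on a real chip depends on the processor and the workload, which is the whole argument.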

  • by Epi-man ( 59145 ) on Tuesday April 18, 2000 @09:49AM (#1126274) Journal
    What are you talking about???

    .18u my rosy red arse! For those of you not in the know, the .18u measure is the smallest feature measure, or the Lambda of the chip. Every other dimension on the chip is a multiple of that number.

    But it doesn't have to be an integer multiple of that number, and often isn't, especially when dealing with the width of the gates for the buffer transistors, which are usually in round microns. Also, Lambda design rules often use 2*Lambda as the minimum feature size.

    It is the distance across the gate of a transistor from source to drain. Now, when they bake the chips, that distance shortens by a few microns.

    And this is where I really started to worry about you. You are saying that the distance (0.18um) shortens by a few microns?!!!! That means we have found the transistor that occupies negative space! Patent that one!

    What you are really saying here is that the transistors get smaller. So what the heck are you talking about here.....

    Unfortunately, the marketing dept. got wind of this and took off with it. Now, they measure the shortest distance from source to drain right near the gate, because the further from the gate the measurement is taken, the wider the gap is. (Sort of a curve...) So in reality, those .18u chips are actually .20u or .21u.

    What are you talking about? How can you measure the gap between the source and the drain "further from the gate?" The gate lies between the source and the drain, it is what controls the flow of carriers through the channel (or at least, is supposed to control it, but we won't get into the short channel effects right now). We are talking about self-aligned processes where the gate defines the source and drain separation by acting as an implant mask! Where do the source and drain get farther apart, making transistors that have gate lengths bigger than the given 0.18um spec? In reality, if we cleaved a PIII from the line today, I would be willing to bet you would find most transistors have a gate length on the order of 0.15um....

    It doesn't sound like much, but when you're talking about millions and millions of transistors, that's a lot of space. (But probably still no more than the head of a pin.)

    Hmmm, maybe you are talking about the polysilicon lines (local interconnects) between the transistors getting bigger as you go away from the transistor? Sadly, LI isn't my true bag, so I can't comment with authority, but I suspect you again are out in left field.

    I think I see what you tried to talk about. You mentioned the "shortening" of a distance with "baking" of the chip. You must be talking about the diffusion of dopants during the annealing processes. That would be the source and drain doping that subdiffuses under the gate oxide, shortening the separation between the source and drain, potentially leading to a punched through device, and affecting the Vt of the device. How you are going to relate this to minimum drawn gate length I have no idea. I find it odd that you didn't mention the article talking about the width vs. length of the transistor being the critical dimension, but given how odd I found the rest of your comments, I shouldn't be surprised.
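
    The subdiffusion point can be put as a back-of-the-envelope relation: the effective channel length is roughly the drawn gate length minus the lateral diffusion under each edge of the gate. A minimal sketch follows, with a hypothetical function name and illustrative numbers that are not measurements of any Intel process.

```python
# Sketch: effective channel length after source/drain dopants diffuse under the gate.
# The helper and the example figures are hypothetical, for illustration only.


def effective_channel_length_um(drawn_gate_um, lateral_diffusion_um):
    """Return drawn gate length minus lateral diffusion on both sides, in microns."""
    l_eff = drawn_gate_um - 2.0 * lateral_diffusion_um
    if l_eff <= 0:
        # Source and drain regions would meet under the gate.
        raise ValueError("no channel left: source and drain have merged")
    return l_eff


# Illustrative only: a 0.18 um drawn gate with 0.02 um of diffusion per side.
print(f"{effective_channel_length_um(0.18, 0.02):.2f} um")
```

    Note the scale: the correction is hundredths of a micron, and it makes the channel shorter, not longer. Losing "a few microns" from a 0.18um gate is exactly the case the function refuses.
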
  • by doonesbury ( 69634 ) on Tuesday April 18, 2000 @05:55AM (#1126275) Homepage

    From the article:

    "A scratch EEPROM which ships empty, and gives system manufacturers or processor resellers the option to include whatever data they wish. It can also be used to track various information about the system or the processor, including system specifications, inventory and service tracking, installation defaults, environment monitoring, and usage data. It can be write protected by the system, as well."

    Has anyone considered that this could be used to store viruses? It'd be a pain - but if manufacturers can use it to keep info about usage data, no doubt it's re-writable.

    Just a tiny thought.

  • by bluGill ( 862 ) on Tuesday April 18, 2000 @06:31AM (#1126276)

    Rumor has it that dual Athlon boards will be coming out within the next few months.

    There is a hitch though: Linux will not support them! That's right, Linux will not support SMP Athlons today. FreeBSD will not either. The good news is at least NT will not have support.

    I think you see the problem: nobody will support them, so they won't sell, so nobody will add support, so . . .

    I hope that someone overcomes these problems (and I think the board manufacturers are working on NT support before they release something). When it works, the Athlons will do much better SMP than any Intel offering. Seems that AMD, Cyrix (do they make processors anymore?) and the like got mad at Intel's SMP scheme and created a better one. The K5 and Cyrix chips supported it, but nobody made a board to support it. I don't know if Athlon uses the same older spec or a new (Alpha-compatible?) spec, but I do know the Athlons all support an SMP standard better than Intel's.

    I suspect that a Linux implementation of Athlon SMP will happen when boards are available. AFAIK AMD is not hiding the specs.

  • by KillRaven ( 19894 ) on Tuesday April 18, 2000 @05:16AM (#1126277)
    Why doesn't anybody do any worthwhile server reviews? I'm really not that interested in how well it handles compared to some Celeron in some multimedia benchmark. I want to know how it compares to Sun, SGI, IBM and Compaq Alpha hardware. I want to know how well it can serve up a couple of hundred remote X sessions or how it handles a 500 GB database that gets hit heavily 24 hours a day. This is essentially a server setup, so why do they insist on testing it like they would test some crappy games box? Wow, it's faster than a dual PIII 500, big deal. Is it faster than a dual Alpha setup?

    So while the test may have been somewhat entertaining, it is completely useless. The benchmark isn't anything I recognise as an accurate simulation of a server environment and there are no real life tests. Show me a test comparing this to a Sun box running Oracle and 500GB of data and I might be interested.

  • by be-fan ( 61476 ) on Tuesday April 18, 2000 @07:34AM (#1126278)
    Is it just me, or does Intel's new "use one die" for everything seem to have gotten them into a little trouble? I read the article, look for how exactly the new Xeon is different from a Coppermine PIII. Isn't the whole point of a Xeon the large full speed L2 cache? With the PIII having a 256K full speed cache, isn't a 256K Xeon, well, redundant? I do hope there are 2 meg integrated Xeons coming soon, because otherwise, you pay more for almost exactly the same processor.
  • Ok, PIII Xeons could be nice in a dual-processor setup, but why does Intel continue to insist on using that high latency RDRAM?

    Tom's Hardware Guide [tomshardware.com] just had an article [tomshardware.com] which convinced me to stick with SDRAM for quite some time to come. Maybe for highly memory intensive long processes RDRAM is worth it, but how many of us will find that worthwhile?

  • First, I agree with post #36, 256KB is usually enough for an x86 CPU. Here is an explanation:

    The reason Motorola (and most RISC CPUs) need more cache is because code size is larger. The whole reason for CISC (x86 type) instructions is that they take up less memory. They do more work per instruction (this is another reason why MHz is a useless comparison between RISC and CISC). Since the code size is smaller, it stands to reason that they don't get as much benefit from a larger cache.

    So, why doesn't Intel build larger caches and get just that much extra performance? Two reasons:
    - it costs more
    - large caches have more latency
    It's the never-ending battle in cache size to balance low latency with high hit rates. A bigger cache raises the hit rate but also raises the latency; a smaller one does the reverse. With full speed caches on x86, 256KB (with some set associativity) seems to be the sweetspot.

    So, why use more cache on Xeons? Applications that have many processes running and access lots of different areas of memory in a short time benefit the most from a high cache hit rate (bigger caches). This is exactly the type of application that servers run. In this case, the higher latency (slower) cache is worth having a higher hit rate.
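
    The tradeoff above is usually written as average memory access time: hit time plus miss rate times miss penalty. Here is a minimal sketch of that textbook formula. Every number in it is a made-up illustration, not a measurement of any PIII or Xeon, and the function name is hypothetical.

```python
# Sketch: average memory access time (AMAT) for a single cache level.
# All latencies and miss rates below are hypothetical, chosen only to show the tradeoff.


def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    """Return hit time plus the expected cost of going to main memory on a miss."""
    return hit_time_ns + miss_rate * miss_penalty_ns


MISS_PENALTY_NS = 100.0  # assumed cost of a trip to main memory

# Desktop-like workload: the bigger cache barely lowers the miss rate.
desktop_small = amat(hit_time_ns=5.0, miss_rate=0.10, miss_penalty_ns=MISS_PENALTY_NS)
desktop_large = amat(hit_time_ns=8.0, miss_rate=0.09, miss_penalty_ns=MISS_PENALTY_NS)

# Server-like workload: the small cache misses far more often.
server_small = amat(hit_time_ns=5.0, miss_rate=0.30, miss_penalty_ns=MISS_PENALTY_NS)
server_large = amat(hit_time_ns=8.0, miss_rate=0.10, miss_penalty_ns=MISS_PENALTY_NS)

print(f"desktop: small {desktop_small:.1f} ns, large {desktop_large:.1f} ns")
print(f"server:  small {server_small:.1f} ns, large {server_large:.1f} ns")
```

    With these invented figures the small, fast cache wins on the desktop-like load and the big, slower cache wins on the server-like load, which is the same conclusion reached above in words.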
