Intel Hyperthreading In Reality

A reader writes: "Looks like GamePC has got the first look at Intel's new Xeon processor, which has the new super-fantastico Hyperthreading technology, which tricks your OS into thinking one CPU is two CPUs, two CPUs is four. Looks neat in theory, benchies included."
This discussion has been archived. No new comments can be posted.

  • WHY? I mean, come on... If you want two processors, shouldn't you have 2 processors in the system???

    • Allows more processors in less space. I don't know about performance figures...
    • Re:One Question.... (Score:3, Informative)

      by Glonk ( 103787 )
      WHY? I mean, come on... If you want two processors, shouldn't you have 2 processors in the system???

      Maybe because SMT makes the die 5% bigger, while 2 processors is upwards of 100% bigger? This is where a thing called "cost" comes in.

      SMT essentially allows for the CPUs to be used more efficiently. A lot of the time an ALU will sit idle while the FPUs work, and with SMT both can work at the same time on different threads.
      • This is where a thing called "cost" comes in.

        The cost savings also apply to the motherboard (and the motherboard design) since you don't have two (real) cpus contending for resources. Last I checked (couple years ago), the price difference between a dual-proc mobo and the single-proc equivalent was approx $100, a significant fraction of their cost.

  • by Pharmboy ( 216950 ) on Wednesday February 20, 2002 @05:37PM (#3040151) Journal
    That would explain why they cost twice as much as they should :-)

    I am not as concerned with how it "tricks" the OS as much as I am about performance and reliability. Tell me how this actually makes the chip BETTER and I might get excited.

    • Re:That explains it (Score:5, Informative)

      by Phs2501 ( 559902 ) on Wednesday February 20, 2002 @05:47PM (#3040244)
      Basically, as I understand it, it allows closer to 100% use of your CPU at any time.

      Modern CPUs have many different execution units. Depending on the code running, not all of them may have work scheduled. Future work may depend on previous results; obviously you can't do this in parallel. The idea of "HyperThreading" is to run more than one thread of execution at a time with the multiple execution units, so more work gets done per clock cycle.

      A quick Google search turned up an article here []. At one point I read a really excellent article on single-processor multithreading (discussing a future Alpha processor) but I can't find it anymore. Hopefully AMD will do something like this as well for a future Hammer processor.

      • So basically Intel is saying that you can get up to a 30% performance boost with Hyperthreading-enhanced code, right? Therefore their design of the P4 is only 70% (probably less) efficient.

        Same can be said of the Athlon, as their CPU is not 100% either. It would be interesting to see how much of a performance boost Athlons get due to this technology. I venture to say that it would be less, since the Athlons do more work per clock than P4s.
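
A quick check of the arithmetic in the comment above, as a rough sketch rather than a measurement. It assumes throughput scales with utilization and that the core cannot exceed 100% with Hyperthreading on; under those assumptions a 30% boost bounds single-thread utilization at 1/1.3, about 77%, not 70%. The function name is made up for illustration.

```python
def max_single_thread_utilization(smt_speedup: float) -> float:
    """Upper bound on single-thread utilization implied by an SMT speedup.

    Assumption (simplification): throughput is proportional to utilization,
    and utilization with SMT enabled is at most 100%.
    """
    if smt_speedup < 1.0:
        raise ValueError("speedup must be >= 1.0")
    return 1.0 / smt_speedup


# Intel's quoted best case from the thread: up to 30% faster.
bound = max_single_thread_utilization(1.30)
print(f"single-thread utilization is at most {bound:.1%}")
```

The bound is only as good as the assumptions: any overhead from sharing the core would push the real figure around.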
      • Basically, as I understand it, it allows closer to 100% use of your CPU at any time.

        Wow, I hope AMD run with this kind of idea as well.

        Who'll need central heating in their home anymore? ;-)

      • Basically, as I understand it, it allows closer to 100% use of your CPU at any time.

        Sssoooo... Dr. Watson will use 99.5% of my CPU, rather than just 99%?
    • Re:That explains it (Score:1, Informative)

      by Anonymous Coward
      > I am not as concerned with how it "tricks" the OS as much as I am about performance and reliability. Tell me how this actually makes the chip BETTER and I might get excited.

      Read the freaking article.

      [simplified summary]

      The processor can handle thread scheduling better than the os can handle thread scheduling. Claiming to be 2 processors pushes half of the scheduling from the os to the processor. Net performance gain is expected to be around 10% when the number of active threads is at least twice the number of processors.
  • by caferace ( 442 ) on Wednesday February 20, 2002 @05:39PM (#3040169) Homepage
    So I can just buy half a processor, and get full functionality? ;)
    • Hmm, it really depends on how it is being accomplished. since 1=2, and 2=4 then it must be 1+1 and 2+2 so by that standard 3=6, and 1/2=1

      However, if it's using a method of squaring above 1 then 3 would equal 9, and 1/2 would equal 1/4.

  • by Anonymous Coward keyboard so I could in essence have four hands.
  • by syzxys ( 557810 ) on Wednesday February 20, 2002 @05:39PM (#3040179)

    Hyperthreading is a pretty cool idea, especially for those of us who would like to see SMP move more into the mainstream [].

    According to this article [], though (posted on []), the Windows 2000 scheduler doesn't know how to take advantage of hyperthreading, since it doesn't know how to take advantage of virtual processors. (I suppose Windows XP does?) Go figure. Anyway, this looks like it's probably worth checking into. I'm sure Linux will support it!

    Have you crashed Windows XP with a simple printf recently? Try it! []
    • by Blue Lozenge ( 444566 ) on Wednesday February 20, 2002 @06:09PM (#3040414) Homepage
      Here is a quote from the article:
      Since Hyperthreading is implemented on the hardware level, the motherboard sees a single hyperthread-compatible CPU as two physical CPUs. Thus, software that is written for multiple CPUs will be tricked into thinking there is a second CPU in the system, and will run the appropriate multithreaded code if available. Since Windows XP and 2000 are coded to take advantage of multiple CPU's, it too sees a hypertheaded CPU as two.

      It would seem that you don't need special OS support beyond standard SMP.

    • Disclaimer: here's what I've heard from "credible sources". You may want to verify this with a bunch of benchmarks.

      What actually happens is that W2k thinks it has 2 CPUs, when it really has say 1.3 effective CPUs (hyperthreading isn't a 2x perf speedup by any means!!)

      This sends the scheduler into fits on w2k. Additionally, it means you can't use a copy of w2k licensed for 2 cpus on a 2 cpu box if each cpu features hyperthreading, since it will look like a 4 cpu box.

      Basically, stay away from hyperthreading unless you're using xp, or some other OS that handles it right (do any other OSes handle it right?)

      • Additionally, it means you can't use a copy of w2k licensed for 2 cpus on a 2 cpu box if each cpu features hyperthreading, since it will look like a 4 cpu box.

        According to a recent post on linux-kernel, there's a BIOS-level hack to work around this: the "real" CPUs are always listed before the "virtual" CPUs. So, if you boot a copy of XP licensed for 4 CPUs on a machine with 4 hyperthreaded CPUs, it will use all four real CPUs, and ignore the hyperthreaded element. (The downside is that processor IDs aren't as obvious under Linux; you'd expect CPU#1 to be the "second half" of CPU#0, but it isn't...)
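
To make the ordering described above concrete, here is a small sketch. It assumes (per the comment, not verified against any BIOS) that the CPU table lists every package's first thread before any second thread, and that a license-limited OS simply takes the first N entries. The tuple layout and function names are hypothetical.

```python
from typing import List, Tuple

# A logical CPU is (physical_package, hardware_thread); thread 0 is the "real" one.
LogicalCpu = Tuple[int, int]


def bios_order(packages: int, threads_per_package: int = 2) -> List[LogicalCpu]:
    """List every package's first thread before any second thread."""
    return [(pkg, thr)
            for thr in range(threads_per_package)
            for pkg in range(packages)]


def cpus_used(table: List[LogicalCpu], licensed: int) -> List[LogicalCpu]:
    """An OS limited to `licensed` CPUs takes the first entries in the table."""
    return table[:licensed]


# 4 hyperthreaded packages, license for 4 CPUs: one thread on each package.
print(cpus_used(bios_order(4), licensed=4))
```

This also shows the numbering quirk mentioned for Linux: with this ordering, CPU#1 is the first thread of the second package, not the other half of CPU#0.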

  • The XEON chip started shipping back in January. The Prestonia server chip is made on the 0.13-micron process. Intel cancelled the .18 micron process last year to focus on the .13 micron process. That is amazing!!!

    A lot of people are giving reviews for the new XEON chip

    Here is a link [] to another review of the XEON chip.
  • Ouch (Score:5, Funny)

    by Andrewkov ( 140579 ) on Wednesday February 20, 2002 @05:40PM (#3040189)
    Wow, kinda sucks if your OS has a per CPU license, like NT and Win2K server!
    • Yeah, we'll have to "upgrade" to Win2K Advanced Server to get enough processor licenses for a dual processor hyperthreading box. Otherwise we'll be violating the EULA, and they might hit us or something. (*cringes*). Great, I can't wait.

      Have you crashed Windows XP with a simple printf recently? Try it! []
    • Re:Ouch (Score:2, Redundant)

      RTFA. From the article:
      Even though Windows 2000 and Windows XP only officially support two CPU's, both operating systems were able to run properly with the Hyperthreaded CPU's. This means you don't have to upgrade to a 4-processor OS like Windows 2000 server to take advantage of this technology.
    • Hyper means overexcited - how about Underexcitethreading - Makes your PC with two CPU's think it only has one! :P Oh wait... That's like windows 98...
    • It's only a virtual processor. Send them a virtual license fee!
    • Re:Ouch (Score:3, Interesting)

      by KidSock ( 150684 )
      Wow, kinda sucks if your OS has a per CPU license, like NT and Win2K server!

      You want to know something even better? The way CPU ids are managed is by bits in an integer. Every other bit represents the "virtual" CPUs. Now when the Windows kernel is selecting CPUs for scheduling purposes it enumerates them in order. This means that when a process is scheduled to run on the next available CPU there's a very good chance it will get a virtual CPU even though a real CPU is completely free. So if you have a 4 CPU machine (4 real plus 4 virtual means 8 hyperthreaded CPUs total) and you have 4 processes that can run, only 2 real CPUs will be used.

      Ok, ok, ok stop laughing. Here's the kicker. MS fixed this. But did they provide the fix to their customers? No. You have to get the Data Center version to enumerate your CPUs properly!
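
The 4-processes-on-2-packages claim above can be illustrated with a toy model. This is only a sketch of the behaviour the comment describes, not the actual Windows scheduler: it assumes processes are placed on the first free logical CPUs in enumeration order, and all names are hypothetical.

```python
def interleaved_order(packages):
    """Real and virtual CPUs alternate: pkg0/thr0, pkg0/thr1, pkg1/thr0, ..."""
    return [(pkg, thr) for pkg in range(packages) for thr in range(2)]


def physical_first_order(packages):
    """All first threads are listed before any second thread."""
    return [(pkg, thr) for thr in range(2) for pkg in range(packages)]


def packages_busy(order, runnable):
    """Place `runnable` processes on the first logical CPUs in `order` and
    return how many distinct physical packages end up doing work."""
    placed = order[:runnable]
    return len({pkg for pkg, _thr in placed})


print(packages_busy(interleaved_order(4), runnable=4))     # naive enumeration
print(packages_busy(physical_first_order(4), runnable=4))  # physical CPUs first
```

With interleaved numbering the four processes share two packages; listing physical CPUs first spreads them across all four.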
      • Let me know when I, a typical home user, can afford one of these chips in place of a "regular" one, and then we'll look at what OSs properly support the CPUs.
        In the mean time, it's the corporations that will be buying computers based on this chip, and have the money to purchase the OS to match.
  • Argh, 12 pages! (Score:5, Informative)

    by mESSDan ( 302670 ) on Wednesday February 20, 2002 @05:42PM (#3040200) Homepage
    Make sure you use the Printer Friendly view, that way you don't get 12 pages of slashdotted hell! Look here [].
  • by esses ( 223521 ) on Wednesday February 20, 2002 @05:42PM (#3040207)
    Basically what they're doing is simply taking unused processor resources and allocating them to another thread. You can now have multiple _threads_ of execution simultaneously... truly simultaneously.

    Thread X is using registers B and C
    Thread Y is able to use registers A and D.

    These threads can be executed together without a context switch... and the processor will hunt out these relationships in hardware. That's what "the big deal" is.

    Until now, when a processor "multitasks", it's simply switching from one thread of execution to the next... it allocates separate time to two different threads.... Now it can allocate the exact same timeslice to multiple threads as long as there isn't a resource dependency.

    If your program can be architected to take advantage of this (or your OS can schedule tasks like this), you'll get a huge benefit (read: if it works on SMP systems, it'll get some benefit on this as well).

    • by bjk4 ( 885 ) on Wednesday February 20, 2002 @05:49PM (#3040274) Homepage
      In hyperthreading, the logical processors do not share registers, just function units. Thus, if one logical processor needs to multiply while the other needs to add, they may share the CPU resources simultaneously.

      This was developed in response to the observation that individual function units remained idle for multiple cycles while the current process was busy doing one kind of operation.
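
The multiply-while-the-other-adds point can be sketched with a toy cycle counter. It assumes, purely for illustration, one function unit of each kind, single-cycle operations, and alternating priority between the two logical processors; real hardware is far more involved.

```python
def cycles_to_finish(thread_a, thread_b):
    """Cycles needed when two instruction streams share one unit of each kind.

    Each stream is a list of unit names ("add", "mul", ...). Per cycle, each
    stream may issue its next instruction if nobody else claimed that unit
    this cycle. Priority alternates so neither stream starves.
    """
    streams = [list(thread_a), list(thread_b)]
    pos = [0, 0]
    cycles = 0
    while pos[0] < len(streams[0]) or pos[1] < len(streams[1]):
        busy = set()
        first = cycles % 2  # alternate which stream picks first
        for t in (first, 1 - first):
            if pos[t] < len(streams[t]):
                unit = streams[t][pos[t]]
                if unit not in busy:
                    busy.add(unit)
                    pos[t] += 1
        cycles += 1
    return cycles


print(cycles_to_finish(["mul"] * 4, ["add"] * 4))  # different units: overlap
print(cycles_to_finish(["mul"] * 4, ["mul"] * 4))  # same unit: they take turns
```

Streams that want different units finish in the time of one; streams that fight over the same unit gain nothing, which matches the "big win only if the threads do different work" remarks elsewhere in the thread.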

      • Yup correct (Score:2, Insightful)

        by esses ( 223521 )
        I was trying to simplify things... I probably went a bit too far.

        The register level contentions are alleviated with out-of-order execution (more or less).

        A good example of where hyperthreading helps is the front side bus. Processors tend to spend over 80% of their time executing out of cache. Thus the front side bus is sitting idle (or performing simple snoops).

        If one thread is going to be memory intensive (video streaming for example... or texture manipulation), or even I/O intensive and thus results in a lot of transactions along the FSB... it can occur at the same time as a second thread that's FPU intensive (assuming the I/O intensive one isn't FPU intensive as well).

    • The Intel Xeon actually has *another* set of registers to cope with the second thread.

      Unfortunately, the big slowdown in computers is accessing memory and peripherals on the various buses. Looking at the details of the Xeon, it still competes (and queues) for access to memory.

      It's also worth considering that although programs tend to have a few threads to look after things like printing while you carry on writing your document, you tend to be using one or maybe two threads heavily at once and the rest are just mostly idle, waiting on hardware and interrupts.

      Intel themselves are claiming 10% speed improvement, even when compiled to take account of SMP, or 30% for specially optimised code (yeah as if that's going to be popular). Don't get fooled into thinking your PC is going 2x faster.
      • There are many issues which the article did not address at all. For example, I would have loved to know how it affected system latency. For example, if overall performance is (-1-2%), and process latency has been improved by say 5%-10%, for workstation users, this may be a worthwhile trade-off.

        Also, the article seems to push very hard for raw CPU performance. Allow me to clarify. While Intel does seem to indicate that performance boosts can be achieved, I didn't really read it to mean that total aggregate CPU performance would be gained or if it is, certainly not by much. Let me put it like this. It smells like this technology is geared to help out systems which normally run 80%-90% of their total CPU whereby HyperThreading would allow for efficient use of the difference while requiring only common SMP application support.

        Also, I didn't read that HyperThreading was geared to be directly taken advantage of by Linux or Win platforms. I suspect that there are significant OS optimizations that can be made for more intelligent scheduling and improved processor affinity. Here, I can see that processor affinity may make significant differences in overall performance. While Win's CPU affinity is only slightly better than that of Linux's current scheduler, I'm hoping that significant affinity improvements will go a long way toward addressing possible shortfalls with this technology. As such, it certainly would have been interesting to see how well Linux did with the new O(1)-scheduler in development as it has many optimizations which specifically address better CPU affinity. Plus, if the scheduler can make the distinction between virtual CPUs and their associated owner, I can see that it may make sense to allow for processor bias between physical and virtual CPUs within a scheduler. After all, if a process is to migrate, it would seemingly (best guess here) make sense to allow it to migrate to a virtual self first before it migrates to another CPU entirely. If a process is currently executing on a physical CPU, does it make sense to allow it to migrate to a virtual CPU on a physically different CPU? I'm guessing that would make for a significant performance hit. How would it perform if process migration were only allowed to occur to its own virtual CPU? I'd certainly like to know.

        By allowing the scheduler to make intelligent migration and accordingly biased decisions, I'm guessing that any OS may be able to make significant performance in-roads while using the HyperThreading technology. As such, I'm guessing that more significant performance gains can be achieved by having the OS HyperThreading aware rather than attempting to heavily optimize at the application level. With proper OS support, I'm guessing that little more than simply SMP application support will place this technology in a completely different light.
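
The sibling-first migration idea floated above could look something like this. It is a sketch of the commenter's guess, not of any real scheduler, and whether sibling-first is actually faster is exactly the open question raised. The `(package, thread)` tuples and the function name are hypothetical.

```python
from typing import Iterable, Optional, Tuple

LogicalCpu = Tuple[int, int]  # (physical_package, hardware_thread)


def pick_migration_target(current: LogicalCpu,
                          idle: Iterable[LogicalCpu]) -> Optional[LogicalCpu]:
    """Prefer an idle sibling on the same package; otherwise take any idle CPU."""
    candidates = sorted(cpu for cpu in idle if cpu != current)
    siblings = [cpu for cpu in candidates if cpu[0] == current[0]]
    if siblings:
        return siblings[0]          # stay on the same physical processor
    return candidates[0] if candidates else None


# Running on package 0, thread 0; its sibling and a remote CPU are both idle.
print(pick_migration_target((0, 0), [(1, 0), (0, 1)]))
```

A stricter policy that never leaves the package would simply return `None` instead of falling back to `candidates[0]`.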
    • Registers in modern processors get renamed. Intel gets away with having so few logical registers in their ISA (instruction-set architecture) because they have dozens of physical registers.

      All hyperthreading will do is just maintain a different program counter and re-order buffer for each thread. There are probably other minor details as well, but don't get caught up in registers from a programmer's point of view. There is magic under the hood that the programmer will never ever be aware of. At some point in your program, there may be 8 or so "EAX" registers. Later on, this same register may be renamed to an "ESP" register.
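
For anyone who hasn't seen renaming before, here is a minimal sketch of the idea in the comment above: architectural names like "EAX" are just labels mapped onto a pool of physical registers. The class and its methods are invented for illustration; a real rename stage also tracks in-flight instructions, retirement and much more.

```python
class RenameTable:
    """Maps architectural register names to physical register numbers."""

    def __init__(self, physical_count):
        self.free = list(range(physical_count))  # unused physical registers
        self.mapping = {}                        # e.g. "EAX" -> 3

    def write(self, arch_reg):
        """Allocate a fresh physical register for a new value of arch_reg."""
        phys = self.free.pop(0)
        self.mapping[arch_reg] = phys
        return phys

    def read(self, arch_reg):
        """Physical register currently holding the latest value of arch_reg."""
        return self.mapping[arch_reg]

    def release(self, phys):
        """Return a physical register to the free pool once its value is dead."""
        self.free.append(phys)


table = RenameTable(physical_count=2)
old_eax = table.write("EAX")   # first value of EAX
new_eax = table.write("EAX")   # a second EAX value lives in a different register
table.release(old_eax)         # the old value is no longer needed
esp = table.write("ESP")       # the freed register now holds ESP
print(old_eax, new_eax, esp)
```

So several "EAX" values can exist at once, and the same physical register can serve as EAX now and ESP later.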

  • Finally! (Score:3, Funny)

    by rw2 ( 17419 ) on Wednesday February 20, 2002 @05:43PM (#3040211) Homepage
    I've been waiting for literally *years* for a CPU that will trick my operating system! Nirvana, I kiss you!
  • With AMD's past history with overheating under heavy use (read overclocking), wouldn't this hyperthreading just compound the issue by tricking the OS into overworking the CPUs?
    • No. Heavy use does not mean overclocking, not at all. If you run your CPU pegged, the chip wouldn't overheat... and isn't the topic of the post about Intel doing this?
    • Re:Overheating (Score:1, Interesting)

      by Anonymous Coward
      Dear Clueless,

      The article is about Intel not AMD.

      Failure due to overheating is generally a user incompetence problem not a design problem.

      High-end business servers aren't typically overclocked.

      No, nobody really buys these processors for anything but high-end business servers.

      Hyperthreading doesn't magically make the CPU work at 120% capacity. Even without using hyperthreading, different tasks could make the processor work just as hard.

      Hope this clears up some issues for you.
    • Re:Overheating (Score:3, Interesting)

      by castlan ( 255560 )
      Overheating looks like a valid concern in this case. While overclocking will push the limits of heat dissipation, that is not the same as heavy use. An overclocked processor will still generate significant heat even when in an idle loop.

      The issue that concerns me is that most consumer CPUs aren't designed with true heavy use in mind, and the specs usually consider that most of the time, the standard processor is not pegged. This can be an issue if full time compute processes don't give the processor time to idle, as in Seti or Distributed.Net usage. That is why these projects specifically warn against overclocking - the combination adds up.

      Now even with a full time load, like with the client, the entire processor die isn't generating heat - some of the CPU logic remains idle. This still allows for a buffer for heat dissipation, as slight as it might be. Now with this hyperthreading technique, most of the die can be actively generating heat simultaneously, pushing the heat generation potential higher than the specs likely considered.

      Considering that the largest problem that all Intel processors had since the Pentium 60 involves inability to deal with sufficient heat dissipation, this concerns me deeply. I fear the day soon approaching where the Intel processor code names are based on the Black Body Effect: The low end "black" "dull red" and "infra-red" models are outmatched by the "Hot-white" and "blue blaze" series, but much of the extraordinary cost is attributed to maintaining active-cooling systems that are spontaneous-combustion-retardant.

      And the melting point of silicon substrate with various doping agents will soon become common knowledge.

  • a cluster of a cluster of these ...
  • Sure, it does sound good having the ability to pose as 2 cpu's, but you won't get the performance that you would from a real dual cpu setup.

    And, because of this AMD is at advantage. Athlon is much much smaller at an equal fabrication process, so even if hyperthreading took off, AMD would be able to combine 2 cpu cores in the one chip and still be able to compete easily in terms of die size and attain a higher level of performance, because 2 real cpu's will beat 1 cpu posing as two any day.
    • Let me guess...if AMD had come up with this, you'd be telling us how it was bad for Intel too. What nonsense! AthlonXP's are NOT 'much smaller' at an equal fabrication process (I am assuming you meant the process size).

      Don't get religious about your CPU; it's not only bad form but it's childish too :)
  • by howlingfrog ( 211151 ) <{moc.oohay} {ta} {2002noynekmja}> on Wednesday February 20, 2002 @05:49PM (#3040266) Homepage Journal

    A number of people have posted asking what the point was of making a single processor act like two processors. It's actually explained in the article linked to above.

    Apparently, the big deal is that a single processor can only handle one thread at a time: multitasking works by breaking programs down into threads, and working on one thread for a little while, then another, then another, then back to the first. But at any given time, only one thread is being actively executed. Hyperthreading changes this: a single processor can work on two threads truly simultaneously. This makes multitasking a hell of a lot more efficient.

    • by greymond ( 539980 ) on Wednesday February 20, 2002 @06:03PM (#3040380) Homepage Journal
      But then there's this: "While this looks great for showing off to co-workers or friends, you will absolutely NOT get the performance of four CPUs running in your system (I can't stress this enough). As you'll see in our benchmarks later, even if software is written to take advantage of SMP, you rarely ever see performance gains with Hyperthreading enabled."
  • The impression I got from the story is that Intel put this in now so that they can figure out what bottlenecks they face in turning hyperthreading into an advantage. Also, it gives compiler writers, OS writers, and application writers some exposure to the technology. I would not be surprised if a few years down the road this is a big win for some environments (i.e., not office2.005k)
  • the site is partly /.'d already but the printer friendly (non graphic) version seems to actually still load. ew=ppso&mscssid=&tp=
  • by Animats ( 122034 ) on Wednesday February 20, 2002 @05:57PM (#3040335) Homepage
    All this "hyperthreading" does is share some ALU resources between multiple threads. The big win is if one thread does lots of FPU work and the other doesn't. If both "hyperthreads" are hitting the CPU's computational resources hard, it probably won't help much.

    And it may hurt. A downside of "hyperthreading" is that the threads contend for cache space, so if the threads are executing very different code, the cache miss rate will rise. Of course, this happens in ordinary threading on each context switch, but with "hyperthreading", there's a context switch of sorts on every instruction cycle. If this effect shows up, it will show in L1 cache miss rates.

    This isn't a totally new idea, either. The first step in this direction was the peripheral processor for the CDC 6600, in the 1960s, which appeared as ten peripheral processors to the programmer. Internally, it was ten sets of registers and one ALU, doing one instruction for each machine state in turn. Basic/4, a forgotten minicomputer manufacturer, tried a similar idea in the 1970s.

    On the other hand, this apparently isn't that tough a feature to add to an already-superscalar CPU, so why not?
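
The cache-contention worry above is easy to see in a toy model. The sketch below uses a fully associative LRU cache and made-up sizes (8 lines of cache, two threads looping over 6 lines each); a real L1 is set-associative and much larger, so treat the numbers as illustrative only.

```python
from collections import OrderedDict


def miss_rate(accesses, capacity):
    """Fraction of accesses that miss in a fully associative LRU cache."""
    cache = OrderedDict()
    misses = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return misses / len(accesses)


def looping_thread(name, working_set, iterations):
    """A thread that walks the same `working_set` cache lines over and over."""
    return [(name, i) for _ in range(iterations) for i in range(working_set)]


def interleave(a, b):
    """Alternate accesses from two equal-length streams, one per cycle."""
    return [x for pair in zip(a, b) for x in pair]


a = looping_thread("A", working_set=6, iterations=100)
b = looping_thread("B", working_set=6, iterations=100)
print(miss_rate(a, capacity=8))                 # one thread alone fits
print(miss_rate(interleave(a, b), capacity=8))  # two threads thrash
```

Each thread alone fits in the cache and misses only while warming up; interleaved, the combined working set no longer fits and the LRU policy evicts every line just before it is needed again.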

    • Have you *ever* seen or heard of a new idea?

      Humans tend to base their thoughts off of what they learned. So, new thoughts are always based off other thoughts of either yourself or others.

      New is pretty tough. Generally means you kept your thought path secret long enough to go through enough revolutions that it no longer resembles anyone elses thoughts. You know what that means? Complete lack of progress. Group thoughts tend to move a little quicker than an individuals.
    • Notice that the Linux kernel build on two threads went slower with "hyperthreading" on than without it. And compiling is as eclectic a task as possible. I can imagine that highly optimized loops in graphics programs already max out some chip resource (like the float alus) so that multithreading them in this scheme does no good, but when compiling fails to parallelize, you know that intel must have screwed the implementation up, big time.

      Multiple processors sharing the same cache on a single chip ought to be a big win, whether they share alus or not. In some cases a setup like this should significantly out-perform regular multi-processors (when both processors are dirtying each other's caches). Intel must have screwed something up.

      The benchmarks show that the current implementation of "hyperthreading" is basically useless. The idea could work very well though.

      Rocky J. Squirrel
      • Compiling is branch-heavy, with no inner loop and little numeric computation. Heavy load on the cache, light load on the ALU. Worst case for this sort of resource-sharing.

        The best case would be a tight numeric loop that needed the FPU resources about 50% of the time. Then, two "hyperthreads" could load up the FPU effectively. So you could code inner loops to exploit this thing. Maybe.

  • More Intel marketing (Score:3, Informative)

    by hobit ( 253905 ) on Wednesday February 20, 2002 @06:01PM (#3040364)
    This is just SMT (simultaneous multithreading)

    Some other complaints about this "invented at Intel" terminology can be found at The Register [].

    Also Toronto has a nice slide show (pdf) [] on the topic.

    For the record I contributed a little tiny bit to this stuff when I was at Intel (I found what I think was the first multi-processor bug for SMT.)

  • Looks like... (Score:2, Redundant)

    by Tony.Tang ( 164961 )
    Looks like GamePC's website isn't running one of these babies yet.

    Slashdotted already. :(
  • SMT (Score:5, Informative)

    by mrm677 ( 456727 ) on Wednesday February 20, 2002 @06:11PM (#3040422)
    Simultaneous Multithreading (SMT) is not a new idea, although no one to my knowledge has implemented it yet. What Intel calls "Hyperthreading" is essentially SMT.

    And yes, this is a very good idea. A modern superscalar out-of-order processor, like the Athlon and Pentium Pro (and later), can issue and retire multiple instructions per clock cycle. However, it can *only* do this if there is enough instruction-level parallelism (ILP). Turns out, there is not enough ILP in current programs to take full advantage of the chip's processing capabilities. Issue slots and function units go unused due to dependencies in the program and cache misses that stall the processing. A typical processor can only look at about 32 instructions at a time. This is not a large enough window to execute future instructions out-of-order when such a stall occurs.

    However, 2 threads of execution will likely fill all of the issue slots. They are also independent threads of execution, so dependencies don't exist between them. This means that when the pipeline stalls due to a cache miss, the other thread can keep on retiring instructions.

    To all those saying that this is dumb, I suggest you study some modern architecture (I'm not talking about your undergrad architecture course either). A paper I read recently studied the effects of SMT on a simulated Alpha processor. The results were astounding with very few changes to the processor core. I heard that the next Alpha was slated to include SMT before Intel killed it.
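
The "other thread keeps retiring instructions during a cache miss" argument above can be sketched with a deliberately tiny model. It assumes a single issue slot per cycle (a real superscalar core has several), a fixed miss penalty, and fixed priority for the first stream; the stream encoding is invented for this example.

```python
def run(threads, miss_penalty):
    """Simulate one issue slot shared by the given instruction streams.

    Each stream is a string: "." is an ordinary instruction, "M" is a load
    that misses the cache and blocks its own stream for `miss_penalty` cycles.
    Returns (cycles, issued_instructions).
    """
    pos = [0] * len(threads)
    ready_at = [0] * len(threads)   # first cycle each stream may issue again
    cycle = 0
    issued = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        for i, stream in enumerate(threads):
            if pos[i] < len(stream) and ready_at[i] <= cycle:
                if stream[pos[i]] == "M":
                    ready_at[i] = cycle + 1 + miss_penalty
                pos[i] += 1
                issued += 1
                break               # only one instruction issues per cycle
        cycle += 1
    return cycle, issued


print(run(["..M.."], miss_penalty=4))           # one thread: slot idles on the miss
print(run(["..M..", "..M.."], miss_penalty=4))  # two threads: stalls overlap
```

In this toy setup one stream needs 9 cycles for 5 instructions, while two streams together need 12 cycles for 10, because each fills part of the other's stall.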
    • Re:SMT (Score:2, Informative)

      by Daeslin ( 95666 )
      Isn't that what IBM's Power4 chip does? 4 cores on one silicon with certain shared resources....
      • Isn't that what IBM's Power4 chip does? 4 cores on one silicon with certain shared resources....

        Power4 is different, it is a multi-core CPU -- this means there are actually multiple (in Power4's case, two I believe) CPU cores on each die.

        SMT just duplicates certain parts (say, the registers) and they share the resources of the core.

    • I agree. The Alpha guys have been working on this for quite a while. If you want to learn a little bit more about it here is my Master's Thesis [] on the topic. (Actually on scheduling in an SMT system, and also looking at four threads, but the intro should be enlightening). Also the bibliography should provide you with everything you'd want to read about it (at least as of 2 years ago).
    • Re:SMT (Score:5, Informative)

      by Slowping ( 63788 ) on Wednesday February 20, 2002 @09:05PM (#3041289) Homepage Journal
      I got my undergrad architecture class at the University of Washington CSE [] department, and was fortunate enough to have a few lectures on SMT in my architecture class [].

      Professor Hank Levy [] has a whole bunch [] of interesting SMT papers; covering the architecture, performance analysis, compiler optimizations, etc.

      Here [] is the presentation Prof Levy used during his guest lecture about SMT when I took the class.

  • Last friday I got my first taste of the Xeon processor. I work for a company that makes heavily optimizing OpenMP compilers, and we tend to get some of the latest hardware in short order. Last friday, I set up a machine with:

    Dual Xeon 2.0GHz CPUs (3997 bogomips on RH7.2)
    1GB RAM
    36GB disk

    This machine is extremely fast. A test suite that runs in 4 hours on a dual PIII 800MHz (512MB RAM) runs in about 45 minutes on this machine.
  • Anyone else think it odd that GamePC is reviewing this? Do ANY gamers run Xeons?

    • Anyone else think it odd that GamePC is reviewing this? Do ANY gamers run Xeons?

      Yes, and it shows, as they had little technical information and failed to provide a worthy review on how programs and OSes may better use this technology. Basically it said if you're already using 100% of your CPU you're not going to get another CPU out of this. Last I heard, a CPU can only provide 100%. Surprise. I personally was not very surprised to read that; however, they could have tested many other situations to see how well it performed, that is, cases where less than 100% was being used, as is often the case on workstations and servers. Let's face it, if your production server is constantly at 100%, you need a new workload or a new server (unless it's purely a computational server, whereby I believe the technology assertions seem to imply that this is not its targeted mode of operation). Pretty much the same goes for workstations too. So in that regard, the test was pretty much worthless as it was completely unreflective of how the technology might actually get used in the real world.

      Was anyone really surprised that 1 != 2? Aside from the author, I know I certainly was not. And that, I think, is why they certainly were not qualified to perform a meaningful benchmark with this technology.

  • by segfaultdot ( 462810 ) on Wednesday February 20, 2002 @06:16PM (#3040446)

    Prestonia Xeon 2.0 GHz vs. Athlon MP 1900+


    While Intel and AMD have seemingly taken a breather from their constant one-upmanship in the consumer processor market, things are still churning along for the workstation and server markets. While the consumer level chips from both companies (Pentium 4 and Athlon XP) bring in large portions of cash, the workstation and server processors are where the real money is made. These processors go for a much higher price premium on the market and are commonly used in more expensive multiprocessor setups.

    The customers who buy these chips tend to buy large quantities and like to use them for multiple years without any issues. Therefore, stability and reliability are the most important factors in buying a chip here with raw performance coming in second. Sure, having an incredibly fast processor is nice, but if you're constantly having to reboot the systems due to processor or motherboard stability problems, the system becomes more of a burden than help. Thus, there is a constant struggle for IT managers to either go for the fastest workstation chip on the market, or go with the chip that's known for excellent stability. Both Intel and AMD are striving to become the processor manufacturer that gives workstation users both the best performance and best stability on the market.

    Intel has the Xeon family, which has had a foothold in the low-end server / high-end workstation market for multiple years now, stemming back to the original Pentium II Xeon. The Xeon now clocks up to 2.2 GHz and comes equipped with features like 512k on-die cache, a 400 MHz front side bus, and some nifty on and off-die thermal monitoring features. Their new "Prestonia" Xeon family was just recently released to market, which is what we're looking at today.

    AMD, on the other hand, has the Athlon MP. Renowned for its incredible price/performance ratio, the Athlon MP has had a tough time making a name for itself as a big time server chip, although it has done fantastically well in the workstation market. The combination of a fairly low cost processor along with similarly priced motherboard and memory has made the Athlon MP platform quite the hit. The Athlon MP was recently bumped in speed up to 1.6 GHz, which uses the AMD PR rating of 1900+.

    Today at GamePC, we're looking at two of the fastest consumer-level multiprocessing chips on the planet, Intel's "Prestonia" Xeon 2.0 GHz right alongside AMD's top of the line Athlon MP 1900+. Let's boogie.

    Intel "Prestonia" Xeon 2.0 GHz
    The Prestonia family of processors is to the Xeon what the Northwood family is to the Pentium 4. The Prestonia Xeon shares all the benefits of the original Pentium 4 Xeon, like a 400 MHz FSB, double-pumped ALU units, and SSE-2 instruction support, but it also has a few added bonus features which make it far and away better than its predecessor.

    Just as Intel recently did with their Pentium 4 family, the Prestonia Xeon is manufactured on Intel's new 0.13 micron manufacturing process, which allows for a smaller die area, along with lower power consumption and lower heat emissions. Not only does this make the Prestonia Xeon cheaper to produce, but the lower heat output comes in very handy when dealing with dual and quad CPU configurations in a small form factor like a 1U or 2U rackmount. For example, the original 2.0 GHz Xeon produced a maximum of 77.5W of heat, while the new Prestonia Xeon at 2.0 GHz produces only 58W.

    While shrinking the manufacturing process, Intel also managed to add an extra 256 kB of L2 cache to the processor die, giving it a total of 512 kB of full-speed on-die cache. As we've seen before with the Pentium 4 Northwood, adding another 256k of cache to the Pentium 4's core can add 10-15% to application performance. Thus, the Prestonia Xeon gets that same speed increase compared to previous Xeon processors. Rumor has it that Intel will announce Xeon CPU's in the future with extra on-die cache, as was the case with the original Pentium II and III Xeons.

    Both the original Xeon and Prestonia Xeon use roughly the same packaging, so telling the CPU's apart can be difficult unless you have one right in front of you. Intel has the CPU markings on the bottom of the Xeon CPU's, as opposed to the Pentium 4 CPU's which have the markings right on the CPU's heat spreader. A quick flip of the CPU reveals the CPU's vital information. As you can see by the Xeon's S-SPEC codes, this is a 2.0 GHz Xeon with 512kB of L2 cache, running on a 400 MHz FSB at 1.5V core voltage.

    Even though there's a new core running underneath, Intel decided to keep the original Socket-603 form factor of the original Xeons, allowing you to upgrade to these newer chips without buying a new motherboard. As Xeon motherboards can be extremely expensive, this is a very, very good thing.

    Besides the new manufacturing-level features of the processor, there is one buzzword that has been gaining all the attention lately: Hyperthreading, the feature that can theoretically turn your 2 physical CPU's into 4 virtual CPU's. Let's investigate.

    What Actually IS Hyperthreading?
    Hyperthreading is actually a technology that's been around for quite a long time in microprocessing, but has never been used in a consumer-level product like the Pentium 4 Xeon. The technology itself is based on Simultaneous Multi-Threading (SMT) and was codenamed "Jackson Technology" by Intel while in development. At the last IDF, they gave this technology a name that fits in better with the Pentium 4 architecture, Hyperthreading.

    Hyperthreading is simply a method of placing a second set of registers on the processor core, allowing the processor to execute two "threads" at once. Every time you run a piece of software, the software is sending threads to the CPU for it to execute and process. Until now, consumer level processors could only handle one thread at any given time. While a processor may go through thousands of threads per second, the CPU can only physically execute one at a time. In a dual CPU system, the computer can process two threads by sending one to each CPU. Hyperthreading takes the concept of executing multiple threads and brings it down to the single CPU level.

    Hyperthreading allows the CPU to manage two threads at once, although this doesn't necessarily mean there are two CPU cores on the same die. Each register set can handle one thread, but each thread has to fight for processor resources like storing data in cache and sending it out through the front side bus. This means a single CPU with hyperthreading capabilities will not perform the same as two physical CPU's in an SMP configuration. While the ability to execute two threads at once was one of the main reasons why SMP (symmetric multi-processing, i.e., dual CPU systems) was brought to market, the costs of going to SMP, such as SMP compatible motherboards and processors, in most cases far outweigh the benefits.

    Unfortunately, since the threads have to fight for resources, there can be conflicts. If two threads want to use the same processor resources at the same time, they have to get in a queue to do so. Since most every piece of software on the market is written to only take advantage of a single CPU, suddenly throwing a single processor application on a dual/quad processor system will show literally no advantage in performance. Even today, only a small percentage of applications (mainly workstation/server applications) are multi-threaded to take advantage of multiple CPU's.
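
    As a rough illustration of why single-threaded software gains nothing from extra CPUs, real or virtual, here is a minimal sketch based on Amdahl's law. The formula is a general rule of thumb and does not come from Intel's documents or from our testing; the function name and the sample fractions are assumptions picked purely for illustration, and the model ignores the resource contention described above.

```python
def amdahl_speedup(parallel_fraction: float, cpus: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# Hypothetical workloads: fully serial, half parallel, fully parallel.
for fraction in (0.0, 0.5, 1.0):
    speedups = [round(amdahl_speedup(fraction, n), 2) for n in (1, 2, 4)]
    print(f"parallel fraction {fraction}: speedup on 1/2/4 CPUs = {speedups}")
```

    A program with no parallel portion stays at a speedup of 1.0 no matter how many processors the OS believes it has.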

    To get the full advantage of Hyperthreading technology, the software will have to be "optimized" for it. Whether this means re-compiling the software to support Hyperthreading through a new Intel compiler or just adding a few more lines of code, we're not certain. Intel states in their technical documents that software written to take advantage of SMP will see a performance gain of upwards of 10% with a Hyperthreading capable CPU. If the software is optimized specifically for Hyperthreading, Intel has seen performance gains up to 30%.

    Nowadays, with SMP common in workstations and servers (and in some cases, desktops), there is a lot of multi-threaded code out there. The latest major operating systems can handle multiple processors, most professional video / audio editing software can use multiple CPUs, and even games are just starting to take advantage of a second CPU if available. This is the market that Intel's looking to capitalize on.

    Hyperthreading in Reality
    The buzz around Hyperthreading is that a single Xeon system will be seen as two CPUs, while a dual Xeon system will be seen as a quad CPU system. Of course, people immediately think, "Wow, two CPUs for the price of one!" This is certainly not the case with Hyperthreading, just as dual processors do not give you double the power of a single processor.

    Since Hyperthreading is implemented on the hardware level, the motherboard sees a single hyperthread-compatible CPU as two physical CPUs. Thus, software that is written for multiple CPUs will be tricked into thinking there is a second CPU in the system, and will run the appropriate multithreaded code if available. Since Windows XP and 2000 are coded to take advantage of multiple CPU's, they too see a hyperthreaded CPU as two.

    In our case, since we ran with dual Xeon processors (each with hyperthreading capabilities), the OS and software see this as four physical CPUs, even though there are only two physical CPUs running. As you can see by the device and task managers in Windows XP, the OS sees our system with four physical CPU's. Even though Windows 2000 and Windows XP only officially support two CPU's, both operating systems were able to run properly with the Hyperthreaded CPU's. This means you don't have to upgrade to a 4-processor OS like Windows 2000 server to take advantage of this technology.
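
    Software can see the same thing the task manager shows by asking the OS how many processors it reports. The sketch below is only an illustration: `os.cpu_count()` is a standard Python call that returns logical processors, while the helper names and the assumption of exactly two logical CPUs per physical CPU are ours and would be wrong on a machine with Hyperthreading disabled.

```python
import os


def logical_cpu_count() -> int:
    """Number of logical processors the OS reports (at least 1)."""
    return os.cpu_count() or 1


def physical_cpu_estimate(logical: int, threads_per_cpu: int = 2) -> int:
    """Estimate physical CPUs, ASSUMING each one exposes threads_per_cpu logical CPUs."""
    return max(1, logical // threads_per_cpu)


logical = logical_cpu_count()
print("logical CPUs reported by the OS:", logical)
print("physical CPUs if Hyperthreading is on:", physical_cpu_estimate(logical))
```

    On a system like our dual Xeon testbed, the first number would be four and the estimate two; the OS alone does not say which of the four are "real".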

    While this looks great for showing off to co-workers or friends, you will absolutely NOT get the performance of four CPUs running in your system (I can't stress this enough). As you'll see in our benchmarks later, even if software is written to take advantage of SMP, you rarely ever see performance gains with Hyperthreading enabled. In fact, in many applications, you see a performance drop with Hyperthreading enabled, as there is a great deal of overhead when splitting data up over four CPU's to process. Perhaps this is why Intel is recommending motherboard makers leave Hyperthreading disabled in the BIOS.

    It's quite possible that Intel implemented Hyperthreading to take advantage of the Xeon architecture's longer pipeline, an often criticized design element of the Pentium 4 and Xeon families. With Hyperthreading, they can start a second process after the first one is farther down the pipe. From a theoretical standpoint, the code would have to either be highly optimized for the Prestonia or limit the use of branch prediction, since there are now two sets of independent data in the processor. If you look at Hyperthreading like this, it would appear to be the next generation of the P4's out-of-order speculative execution engine.

    From what I now understand about Hyperthreading, it's my belief that Intel is planning to use Hyperthreading in all of its future Pentium 4 products down the road. The Xeon is simply the first guinea pig to actually have the logic enabled on the die. As Intel already has the Hyperthreading logic in the current Pentium 4 hardware, but is not enabling it, you've got a sure sign that Intel will simply flip the switch to activate the logic when Hyperthreading applications are actually available. If Intel convinces developers that Hyperthreading is worth their time to optimize for, this could be an incredible feature 1-2 years down the road. As for now, it's fairly useless, but certainly interesting in the sometimes bland world of computer processing.

    AMD Athlon MP 1.6 GHz (1900+)
    The Athlon MP 1.6 GHz is the latest and greatest from AMD's server/workstation family of CPUs, which have gained an extremely large amount of credibility lately due to their incredible price / performance ratio compared to Intel's Pentium 4 and Pentium 4 Xeon families. While slightly lagging behind AMD's own Athlon XP 1.67 GHz (2000+) in raw clock speed, the Athlon MP 1.6 GHz is quite a bit more expensive than the Athlon XP 1.67 GHz, despite the fact that both can run SMP quite well.

    The Athlon MP is based on the "Palomino" Athlon architecture, which is built on the 0.18 micron manufacturing process. While the Palomino chips create quite a bit less heat than the "Thunderbird" variant of the Athlon, they still create quite a lot of heat, which can be a problem in dense rackmount situations. The chip itself is based on the Socket-A form factor, which means it should be compatible with most single processor Athlon boards, as well as all the dual Socket-A boards on the market now. As you'll no doubt notice, the new Athlon XP/MP processors are coming with green packaging, although they still use the same organic packaging as previous Athlon MP/XP CPU's.

    The Palomino Athlon core comes equipped with 128 kB of L1 cache, along with 256 kB of L2 cache. While we've heard rumors that AMD may up the cache amounts on their upcoming 0.13 micron "Thoroughbred" processors, we haven't received any indication that this is anything more than a rumor.

    Getting a closer look at the Athlon MP 1900+, you can see the Athlon's famous bridges are not "cut", like Athlon XP chips hitting the market. This means with a simple pencil and a motherboard that supports clock adjustments, you can overclock these processors to much higher clock speeds than intended. Of course, workstation and server users would most likely never do this, as overclocking is inherently risky, but we thought it was worth mentioning.

    As you can see from reading the core, our Athlon MP processors are of a fairly recent "AGNGA" core stepping. The first line of text says "AMP1900", which denotes our chip as an Athlon MP 1900+. AMD runs the exact same processor core on both the Athlon XP and MP processors, albeit the MP models go through an extra round of multiprocessor "validation". Performance wise, these two cores are exactly the same.

    The biggest threat for AMD and the Athlon MP is the fact that the platform has been plagued by a lack of absolute stability. While the Tyan Thunder K7 and Tiger MP boards still wrangle with edge-case stability scenarios, the AMD 760MPX motherboards have been plagued with chipset problems and many board revisions. In fact, the release of the 760MPX has undone much of AMD's work in making the Athlon MP synonymous with stability. We absolutely love the Athlon processors, but the platforms still aren't up to the level we were hoping for by now. Still, as more platforms are getting released, the situation IS getting better.

    Just the facts, ma'am.

    Intel Prestonia Xeon 2.0 GHz

    AMD Athlon MP 1900+

    Specification               Prestonia Xeon 2.0 GHz   Athlon MP 1900+
    Clock Speed                 2.0 GHz (2000 MHz)       1.6 GHz (1600 MHz)
    L1 Cache                    8 kB                     128 kB
    L2 Cache                    512 kB                   256 kB
    L2 Cache Speed              Clock Speed (2.0 GHz)    Clock Speed (1.6 GHz)
    L2 Cache Associativity      8-Way                    16-Way
    Form Factor                 Socket-603               Socket-A
    Front Side Bus Speed        400 MHz                  266 MHz
    Manufacturing Technology    0.13 Micron              0.18 Micron
    MMX Instruction Support     Yes                      Yes
    SSE Instruction Support     Yes                      Yes
    SSE-2 Instruction Support   Yes                      No
    3DNow! Instruction Support  Partial                  Yes

    The Platforms

    Supermicro P4DC6+ i860

    Asus A7M266-D AMD 760MPX

    Feature               Supermicro P4DC6+                 Asus A7M266-D
    Chipset               Intel 860                         AMD 760MPX
    CPU Support           Up to 2 x Xeon 2.2 GHz+ CPUs      Up to 2 x Athlon MP 1.6 GHz+ CPUs
    Memory Type           PC-800 RDRAM                      PC-2100 DDR SDRAM
    Memory Capacity       2 GB Max (4 RIMMS)                3.5 GB Max (4 DIMMS)
    Memory Type Support   Standard / ECC                    Standard / ECC
    AGP Expansion         AGP Pro 50                        AGP Pro 50
    PCI Expansion         2 x 64-bit (66 MHz) Slots         2 x 64-bit (66 MHz) Slots
                          4 x 32-bit (33 MHz) Slots         3 x 32-bit (33 MHz) Slots
    Onboard SCSI          Adaptec AIC-7899W Ultra160 SCSI   N/A
    Onboard Ethernet      Intel 82559 10/100 Port           N/A
    Onboard Audio         AC97 Audio                        C-Media 6 Channel Audio
    Onboard Video         N/A                               N/A

    Pentium 4 Xeon "Prestonia" Testbed System Configuration

    Processors 2 x Intel Pentium 4 Xeon 2.0 GHz "Prestonia" (8k L1, 512k L2)
    Cooling Intel Socket-603 Retail Coolers
    Memory 512MB Samsung PC-800 RDRAM (4 x 128M)
    Motherboard Supermicro P4DC6+ (Intel 860 Chipset)
    Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
    Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
    Software Windows XP w/ DirectX 8.1, Intel 3.2 Chipset Drivers

    Pentium 4 "Northwood" Testbed System Configuration

    Processors Intel Pentium 4 2.0 GHz "Northwood" (8k L1, 512k L2)
    Cooling Intel Socket-478 Retail Cooler
    Memory 512MB Crucial PC-800 RDRAM (4 x 128M)
    Motherboard Asus P4T-E (Intel 850 Chipset)
    Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
    Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
    Software Windows XP w/ DirectX 8.1, Intel 3.2 Chipset Drivers

    AMD Athlon MP Testbed System Configuration

    Processors 2 x AMD Athlon MP 1.6 GHz (1900+) "Palomino" (128k L1, 256k L2)
    Cooling AMD Socket-A Retail Coolers
    Memory 512MB Crucial PC-2100 DDR SDRAM (2 x 256M)
    Motherboard Asus A7M266-D (AMD 760-MPX Chipset)
    Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
    Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
    Software Windows XP w/ DirectX 8.1, AMD 1.30 Driver Pack

    AMD Athlon XP Testbed System Configuration

    Processors AMD Athlon XP 1.67 GHz (2000+) "Palomino" (128k L1, 256k L2)
    Cooling AMD Socket-A Retail Cooler
    Memory 512MB Samsung PC-2100 DDR SDRAM (2 x 256M)
    Motherboard Asus A7V266-E (VIA KT-266A Chipset)
    Hard Drive Seagate Barracuda IV 60GB, ATA/100, 7200 RPM, 2MB Cache
    Miscellaneous Plextor 8/4/32A IDE CD-ReWriter
    Software Windows XP w/ DirectX 8.1, VIA 4-In-1 4.37 Service Pack

    Lab Notes

    * All tests run with VSync (Vertical Sync) Disabled.
    * Nvidia Detonator XP (23.11) Driver used in all testing.
    * All RDRAM memory run with "Nap" mode disabled.
    * All DDR memory run at CAS 2.5 latency.

    Benchmarking Software

    * Adobe Photoshop 6.01
    * LAME MP3 Encoder 3.91
    * Kinetix 3D Studio MAX
    * Red Hat Linux 7.2
    * SiSoft Sandra 2002
    * Windows Media Encoder 8.0

    SiSoft Sandra 2002 is a synthetic Windows benchmark.
    The benchmarks can stress CPU, Memory, or Processor Instruction abilities.
    Higher Sandra scores mean better overall performance.

    CPU Benchmark - Hyper-Threading Support (SMT) Enabled
    (Higher Scores are Better)

    CPU Benchmark - Hyper-Threading Support (SMT) Disabled
    (Higher Scores are Better)

    Memory Benchmark
    (Higher Scores are Better)

    SiSoft's Sandra, while being a synthetic Windows benchmark, is one of the few pieces of software on the market with some level of Hyperthreading support. This is through Sandra's "SMT" test, which to be honest, gave us extremely sporadic results at first. Once we figured out what exactly was happening with the test, we were able to finally lay down some solid numbers.

    First off, it's quite easy to see that the dual Athlon MP setup simply rules the roost when it comes to raw CPU performance. Even with the Athlon MP chips at 1.6 GHz, it's easily able to outpace the dual Xeon 2.0 GHz processors, with or without Hyperthreading enabled. Even the highest performing Xeon setup still trails the dual Athlon MP 1900+ by roughly 30%.

    When Hyperthreading was enabled, we can certainly see some performance gains being had by the Xeon setups. One CPU with Hyperthreading gained 18% in this benchmark, while two CPU's with Hyperthreading gained 23%. Of course, this is simply a synthetic test, and to achieve any real world performance gains like this, the software would have to be specifically optimized for Hyperthreading.

    Looking at the results, we're not positive what effect the SMT test has on our scores. As you can see in the first graph, even with Hyperthreading disabled in hardware, the dual 2.0 GHz Xeons still scored higher with the Hyperthreading (software) test enabled than with it disabled, by a margin of nearly 2000.

    In terms of memory performance, Xeon systems still maintain quite a large margin over the current Athlon MP systems. Thanks to the Xeon / i860 dual channel RDRAM memory interface, you've got quite a bit more available bandwidth compared to the Athlon MP / 760MPX single channel DDR interface.

    Adobe's Photoshop 6.0 is the world's most popular image creation/editing software.
    We run a series of filters on an image, while measuring the time it takes to perform them.
    The times for each filter are added up. Lower times mean faster performance.

    Adobe Photoshop 6.01 Filter Benchmark
    (Lower times are Better)

    Adobe's Photoshop thrives on fast FPU units along with lots of memory bandwidth and capacity. Even though Photoshop is multi-threaded, the software only really takes advantage of multiple processors on a few select filters. Thus, running a second processor doesn't necessarily help Photoshop that much, at least in this case.

    In our test, we see the simple single Athlon XP 2000+ processor beating out both the dual Athlon MP 1900+ and dual Xeon systems. While the other platforms were merely seconds away, it's clear that the Athlon-based systems take the cake for best overall Photoshop performance. We see the addition of a second Athlon MP processor took nearly 8 seconds off the benchmark time. Not bad, but we were hoping for more.

    Hyperthreading proves here to be more of a nuisance than a help to performance. With Hyperthreading enabled, the dual Xeon 2.0 GHz system actually slows down by 5 seconds, while a single Xeon 2.0 GHz with Hyperthreading speeds up by 2 seconds. As you'll likely guess, Photoshop is not optimized for Hyperthreading, so any performance gains seem to be purely coincidental.

    Keep in mind, we ran this test with the Adobe 6.01 patch installed, along with Adobe's specially released SSE-2 filter package, and the Xeons still couldn't fully stand up to AMD's new Athlon processors.

    3D Studio is one of the most popular 3D editing suites on the market today.
    We render a 50-frame scene with over 40,000 faces and 20,000 vertices.
    Lower render times mean faster processing performance.

    3D Studio MAX "Tank" Render Test
    (Lower Times are Better)

    3D Studio MAX, and any kind of 3D rendering software, relies almost 100% on the CPU for final scene rendering. Thus, multiprocessor systems are almost required for any kind of professional level 3D modeling software. 3DS Max is indeed able to fully take advantage of multiple processors.

    In our test render, we again see AMD take the cake, as the dual Athlon MP 1900+ system rendered our scene the quickest. While the Dual Xeon 2.0 GHz system was just about one minute behind, the Athlon systems simply rock for these kinds of applications. Even our single Athlon XP 2000+ system managed to render a few seconds faster than Intel's dual Xeon 2.0 GHz box.

    As for Hyperthreading, again we see mixed results. A single processor with Hyperthreading actually helps out, cutting 15 seconds off our rendering time. Two processors with Hyperthreading hurt a lot, as it added an extra 1:56 to our final render time. Ouch.

    Windows Media Encoder is a free Windows video encoding suite.
    We take a 50MB MPEG file, and encode it to Windows Media 8 (.wmv) format.
    We test at 320x240 Resolution using the WM8 for Cable/DSL encoding method.

    50MB MPEG Video to Windows Media Video Encode
    (Lower times are Better)

    While the Xeon was crushed by the Athlon MP in the previous two tests, the tables turn for video encoding. Encoding our MPEG movie was incredibly fast with the Dual Xeons, the fastest score we've seen for this test to date. Windows Media Encoder 8 is extremely efficient with multiple processors, giving a 30-40% boost in encoding times for both the Xeon and Athlon MP platforms.

    Even though the Xeon is the clear winner in these tests, Hyperthreading again disappoints. A single Xeon with Hyperthreading tacks on another 20 seconds to our encoding time, while Dual Xeons with Hyperthreading add on another 29 seconds. Disappointing, to say the least.

    MP3 Encoding is extremely CPU intensive, and tests the CPU's raw FPU performance.
    We use LAME 3.89, which has optimizations for MMX, 3DNow, and SSE.
    A 200MB .wav file is encoded to a 160 kbps MP3, and we record the time to encode.

    200MB Wav to MP3 File Encode
    (Lower Times are Better)

    MP3 encoding through LAME is entirely CPU based, but since the program isn't multithreaded, we don't see any performance gains when adding a second processor. Thus, winning this benchmark is simply a case of having the best FPU performance in a single processor situation, which the Athlon clearly does.

    The Pentium 4 / Xeon platforms are 9-10 seconds slower, no matter what motherboard or processor combination is used. Both the Athlon MP and Xeon systems give very respectable encoding performance, but the Athlon MP/XP are clearly the winners here.

    Red Hat is currently the most popular Linux distribution in the world.
    We test by recompiling the 2.4.9 kernel using the "make bzImage -j#" command.
    Depending on the # of threads, compiling time can be different, especially with SMP.
    Lower compile times mean better processing performance.

    Red Hat 2.4.9 Kernel Compile - 1 Thread
    (Lower times are Better)

    Red Hat 2.4.9 Kernel Compile - 2 Threads
    (Lower times are Better)

    Red Hat 2.4.9 Kernel Compile - 4 Threads
    (Lower times are Better)

    Compiling a Linux kernel is extremely stressful on the CPU, and as we tested with the SMP-compatible 2.4.9 Red Hat kernel, we were able to see some very nice performance gains with our multiprocessor systems. As the 2.4.9 kernel also has support for "Jackson Technology" (aka, SMT / Hyperthreading), we were hoping to see what Hyperthreading was capable of doing in a Linux environment.

    When the kernel is compiled with a single thread, the systems don't show any real performance gains with a second processor installed. Compiling with two or more threads is where you really start to see the performance gains of SMP with Linux.

    With two threads running, compile times are nearly cut in half with two CPU's installed. The Dual 2.0 GHz Xeons manage to compile the kernel quickest at 1:57, while the Athlon MP 1900+ setup is nipping at its heels with a 2:05 compile time. Compiling an entire Linux kernel in under two minutes is simply an incredible showing of CPU power, any way you look at it.

    For curiosity's sake, we decided to run a compile with four simultaneous threads. As dual Hyperthreading-enabled Xeons can physically take four threads at once, we figured it would be a good test. Unfortunately, there were only 1-2 second differences in compile times between 2 and 4 threads. Compiling the kernel with 2, 3, 4, 5 and more threads gave roughly the same compile times.
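
    For anyone who wants to repeat this kind of sweep, here is a minimal sketch of a timing harness around the "make bzImage -j#" command mentioned above. The helper names are hypothetical, and running "make clean" between builds is our assumption about how to keep the runs comparable; it is not a record of the exact procedure used for the numbers in this article.

```python
import subprocess
import sys
import time
from typing import Dict, Iterable, List, Optional


def build_argv(jobs: int) -> List[str]:
    """Command line for a kernel build with `jobs` parallel jobs."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["make", "bzImage", f"-j{jobs}"]


def time_command(argv: List[str], cwd: Optional[str] = None) -> float:
    """Run a command to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def sweep(job_counts: Iterable[int], source_tree: str) -> Dict[int, float]:
    """Time a clean build for each job count (assumes a configured kernel tree)."""
    results = {}
    for jobs in job_counts:
        # Assumption: clean first so every run compiles the same set of files.
        subprocess.run(["make", "clean"], cwd=source_tree, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        results[jobs] = time_command(build_argv(jobs), cwd=source_tree)
    return results


# Harmless self-check: time a no-op command instead of a real kernel build.
print(f"{time_command([sys.executable, '-c', 'pass']):.3f} s for a no-op command")
```

    Calling `sweep((1, 2, 4), path_to_kernel_source)` on a configured source tree would return a mapping from job count to seconds, which is all the three graphs above really are.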

    The Final Word
    Both the Prestonia Xeon and Athlon MP are incredible processors, and both engineering teams deserve a round of kudos for producing some incredibly fast SMP-capable CPU's. Each CPU has a specific area where you'll see one dominate over the other, although the majority of the tests were fairly close between the two CPU's.

    In my opinion, the Prestonia Xeon is the better CPU of the two for mission critical / server applications. The Intel 860 platform seems to be incredibly stable, considering its relatively short time on the market. Not one instance comes to mind where we ran into compatibility issues with our Dual Xeon systems, something we can't say for the Athlon MP systems we set up. Unfortunately, you pay the price for the Intel name, as Xeon systems are extremely expensive. The CPU's and motherboards are both extremely expensive, which makes the Xeon hard to recommend for the workstation market.

    The workstation market is much better served by the Athlon MP processor, as its price / performance ratio is unbeatable. For most workstation applications, the Athlon MP will even be a better performer, despite its lower price tag. We would love to see AMD put a few more server-specific features on their MP processors to justify their heightened price tags over the Athlon XP, but even as they are now, the MP's are a great deal for the amount of processing power you get in that tiny little core.

    As for the Xeon's Hyperthreading technology, it's hard not to be disappointed with the scores we got throughout our testing. Hyperthreading sounds like an incredibly useful processor feature in theory, but in practice, it's useless without compatible software on the market. Only time will tell if developers want to take on the Hyperthreading challenge, and the few developers we've talked to have not been that impressed with the technology thus far. If nothing else, Hyperthreading will certainly be an interesting technology to watch over the next few years.

    This time next year, it's quite possible that we may be dealing with McKinley and Clawhammer as the workstation processors of choice, if Intel and AMD have their way. While it's anyone's guess if 64-bit processing is ready to come down to the consumer level, this article certainly proves that current 32-bit processors have more than enough power to handle today's applications.
  • AS/400 (Score:4, Interesting)

    by crow ( 16139 ) on Wednesday February 20, 2002 @06:19PM (#3040464) Homepage Journal
    I believe that this was done in the IBM AS/400 using a special version of the PowerPC chip. There was a talk on this at the Ottawa Linux Symposium last summer. According to the IBM people, it mostly worked great, but there were a few issues with spin locks--the CPU saw that one thread was busy (in a spin lock), so it never switched to the other one (that was holding the lock). The Intel implementation may be slightly different, but this is something to look at.

    When your hardware isn't exactly what the software was written for, you tend to have weird bugs like that. I would not be surprised if Windows, Linux, FreeBSD, and other OSes need minor patches to work well with this new hyperthreading from Intel.
    • It's available on RS/6000, too. Our department recently got a p660 server. I was cruising the docs when I stumbled onto something about "hardware threading". AIX 4.3.3 and up can utilize this feature, though we haven't tried it yet.
    • Sounds like that spin lock thing would happen on dual cpu's too.

      Just needs 3 threads: 2 to spin, and one holding the lock that isn't running. This is mainly the reason that after a tick the kernel evaluates what's doing useful work and what isn't, and schedules accordingly. So in this case, and in all other cases of locks, it requires the kernel to interrupt.

      The program should have put itself to sleep if it didn't get the lock after a tick, as it can assume it'll be a while. PostgreSQL had a large debate not that long ago about the best timing for spin / sleep on SMP. Sleeping immediately isn't the best thing in multi-CPU cases -- which this is pretending to be.
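
      To make the spin / sleep trade-off concrete, here's a minimal sketch of "spin a little, then sleep" lock acquisition. It is not PostgreSQL's code or any kernel's; the function name, the spin count and the sleep interval are all made-up values for illustration, and in CPython the interpreter lock hides most of the real SMP effects.

```python
import threading
import time


def acquire_spin_then_sleep(lock: threading.Lock, spins: int = 100,
                            sleep_s: float = 0.001) -> str:
    """Acquire `lock`; return "spin" or "sleep" for the phase that succeeded."""
    # Phase 1: busy-wait briefly, betting that the holder is running on
    # another CPU and will release the lock almost immediately.
    for _ in range(spins):
        if lock.acquire(blocking=False):
            return "spin"
    # Phase 2: the bet failed, so give up the CPU between attempts. This is
    # what lets a holder that shares our (virtual) CPU actually make progress.
    while not lock.acquire(blocking=False):
        time.sleep(sleep_s)
    return "sleep"


lock = threading.Lock()
print("uncontended lock acquired in phase:", acquire_spin_then_sleep(lock))
lock.release()
```

      The spin count is the knob people argue about: too low and you sleep when the other CPU was about to release, too high and you burn a virtual CPU that the lock holder needed.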
  • As an owner of an SMP system, I can say with confidence that even having two /real/ processors, which is better than one hyperthreading processor, isn't of any great benefit to Windows users anyway (see comments above about HT on Win2K) other than for servers (shudder) and for running several very CPU intensive apps at once, which very few people do.
    In *nix, however, I have improved my buildworld times by thirty percent. *That's* useful.
    • Since this is a highly subjective matter, I won't disagree with your comment, I will simply interject my own, which happens to have a different point of view.

      On my dual processor machine, running Windows, I noticed a *significant* increase in performance when I added the second processor (after, that is, I told Windows that my machine was a dual processor system; it doesn't auto-detect that after it's been installed. You'll have to set it manually if you had WinNT, 2K or XP Pro installed before you added the 2nd proc).
      When running multiple programs, like Photoshop, DreamWeaver, ftp software, Director and Flash (at the same time) I can now comfortably allow one program to do whatever it needs to do on one processor, while the other remains available to the OS to assign threads to in the meantime.

      Simply stating that 2 processors are of no great benefit without some quantifying data is a little weak, IMO.

  • The latest BIOS update for my dual PIV Xeon (Dell Precision 530) says that it added support for SMT/JT... I wonder if they had already tested these CPUs on my system. I WANT!!!! Drool...

  • For tasks that can be easily split into two threads, I have a feeling that hyperthreading could be better than two processors. But, since threading seems to be better implemented on Windows, NT boxen might enjoy the benefits more.

    The best example of how to split a task into threads (that I like to use) is rendering a 3D image to screen. If you want to split that task so that two threads (and thus two processors) can work on it, you just make one thread handle 'even' scan lines and one thread handle 'odd' lines. Keeping the caches coherent between the two CPUs can be difficult - they're both executing the same code, and might also be twiddling around some piece of memory that they share.
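
    The even/odd split can be sketched roughly like this. `shade`, `WIDTH` and `HEIGHT` are hypothetical stand-ins for real rendering work, and in CPython the global interpreter lock keeps these threads from truly running in parallel, so this only shows how the rows are partitioned, not a speedup.

```python
import threading

WIDTH, HEIGHT = 8, 6  # tiny hypothetical frame size

def shade(x, y):
    # Hypothetical per-pixel function standing in for real rendering work.
    return (x * 31 + y * 17) % 256

def render_rows(framebuffer, first_row, step):
    # Each worker writes only rows first_row, first_row + step, ...
    # so no two workers ever write the same row.
    for y in range(first_row, HEIGHT, step):
        framebuffer[y] = [shade(x, y) for x in range(WIDTH)]

def render_interleaved(num_threads=2):
    framebuffer = [None] * HEIGHT
    workers = [
        threading.Thread(target=render_rows, args=(framebuffer, i, num_threads))
        for i in range(num_threads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return framebuffer
```

    With `num_threads=2`, worker 0 gets the even rows and worker 1 gets the odd rows; the shared `framebuffer` is the piece of memory both are twiddling.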

    My point is, with this hyperthreading business, that there's only one cache - so no more cache coherency bothers. I might be concerned that the arithmetic units or whatever else that are on the chip might be in contention for use - but they can just add more of 'em in later steppings of the CPU.

    The problem for us Unix-lovin' folk is that Unix-esque OSes don't often take threading very seriously. OpenBSD, for example, doesn't even have a kernel-threading implementation (correct me if I am wrong!). The 'Unix Way' is to just fork a process and run two process images. That's fifty billion times easier to debug than two threads that step on each other's data (see deadlock). But the forking method - even with nifty things like copy-on-write process images and such - doesn't seem to use as little memory, or perform as quickly, or process-switch as fast.

    When I speak to developers who know their stuff (more than I do) they say - on NT, make a whole bunch of threads and make them talk to each other with semaphores and stuff - on Unix, fork and write to a pipe. Nothing fundamentally wrong with that division, but advances such as this Hyperthreading thing won't work as well on Linux, I don't think.
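
    For anyone who hasn't seen the "fork and write to a pipe" style, here is a minimal sketch. It is Unix-only (`os.fork` doesn't exist on Windows), and `sum_in_child` is just a made-up example task.

```python
import os

def sum_in_child(numbers):
    """Fork a child, let it compute sum(numbers), and read the answer from a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: works on its own copy of the data, so it cannot step on the parent's.
        os.close(read_fd)
        try:
            os.write(write_fd, str(sum(numbers)).encode())
        finally:
            os.close(write_fd)
            os._exit(0)  # leave immediately without running the parent's cleanup
    # Parent: close the unused write end, then read until the child closes its end.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie
    return int(b"".join(chunks))
```

    The only shared state is the pipe, which is why this is so much easier to debug than threads, and also why it costs more: the result has to be serialized and copied through the kernel.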

  • by fitten ( 521191 )
    Another system that used something like this was Tera. However, what they did was have 128 contexts per CPU, and it round-robin'd through them all. You could also "daisychain" multiple CPUs together in a system. It was interesting, but I don't know what ever happened to the machines they were building.
  • Has anybody cracked one of the new Xeons apart yet? How do we know that Intel didn't really slip two cores onto the same processor card... then, one processor would appear as one, two as four!! They sell for thousands more than they cost to make, anyways, right? Who's going to know?

    Hmmm?!? :)

    • You have a point. Most people don't know it, but modern processors use a hazardous combination of Potassium Fluoride and spent Plutonium to regulate clock speed, which is the real reason that it isn't safe to overclock your computer!

      That density is especially thick with server CPUs, especially the Xeon. That is why, to date, nobody has set off large enough of a reaction to be deadly with overclocking PCs, but that is not the case with Xeons, whose Plutonium content is dangerously close to the Critical Mass. And you thought your Intel CPU ran hot! Everybody who runs Xeon servers knows better than to play with the clock speed.

      In fact, that is why you can't usually buy Xeons without ECC RAM: the radiation put off by the computation would too readily disrupt the memory state information. What, you bought that nonsense about solar flares or other sources of random radiation causing bit decay? Oh, and FYI, don't run or Seti@home on your Xeon if you have fillings or a mercury thermometer in the area, unless you are interested in a direct demonstration of fusion in action.

      Really, since the Cold War ended, microprocessors have been constantly dropping in price. Why was this phenomenon never observed until the '80s? Moore's Law my ass, how about "military surplus in action." It is much too expensive to store all of this spent plutonium in federal compounds. So the FCC had to ensure that computers had sufficient shielding, and now, there you go. Let me reiterate that ever since the "Pentium Pro" (Plutonium Recycling Operation) and with each new generation of CPU, active cooling remains a matter of utmost priority.

      Really, they would have just put it in the drinking water supplies to distribute the threat amongst all of our nation's citizens (like taxes and fluoride) except that in secret military tests they found that the subjects' teeth and bones started to glow in the dark, which would have been too obvious to cover up for long. So they stationed the subjects in parts of Japan, Las Vegas, and tropical locations so that the glowing would be concealed by all of the Neon Lights and overpowering sun (causing a sun tan, to help cover the light emissions) and classified the research.

      Well, in any case, I highly recommend that you don't ever "crack" one of the more recent Xeons apart. Rather, you should carefully and delicately disassemble them in lead casings. Much of the extra "thousands" in cost is spent on proper protective casings, which was the real reason for Intel to do away with the Socket interfaces for Slot 1 in the first place. New high performance ceramics and Lead-magnesium alloys have allowed the protective casings to shrink again, but you still need to be careful. And don't EVER let two Intel cores come in contact with each other, or not even the Liquid Hydrogen active cooling system will save you...

      Dammit! I should have posted anonymously! Now they'll get me! It's a good thing I'm posting from OpenBSD virtualized inside of Tinfoil Hat Linux! I can use the HyperThreading Hyperspace technology to encrypt my essence and escape! Fight the Future.

      Fluoride is good for your teeth and bones. Fluorosis is nothing but a commie... er, a Terrorist plot. If you don't ingest fluoride and develop fluorosis then the terrorists will have won! Make sure you brush your teeth with an ADA approved sodium-fluoride "activated" toothpaste, and only drink potassium-fluoride supplemented water. Ignore naturally occurring "healthy" calcium-fluoride that only hippies and tree-huggers advocate!
  • What kind of setup are they testing? Anyone who is willing to spend that kind of money for those processors had better put no less than 2 gigs of RAM on those boards!

    Who puts 512 megs on a board like that? I mean really!
  • I was going to post a link to the Mercury News article I read 10 years ago, but it would have cost me 3 bucks; in these times that's 3 bucks I wish I had (GWB*)

    Anyways, this was around the time of NT4.0, when I believe Apple, IBM, Motorola, and MS pooled their resources together to make the PowerPC chip. One of the things I remember distinctly from the article was that the flaw in Intel's RISC chips was their single instruction pipeline to the core, while the PPC chips had 3.

    This same argument/article was made later with the introduction of the G3 series of processors if memory serves me correctly.

    Not Intel bashing at all; in fact everything 'cept the TV and microwave got an Intel chip in my house. Just trying to make an interesting point.

    They're taking what is designed as a server processor, what is designed to be optimized for server tasks (such as web page serving, which probably scales to multiple CPUs and hyperthreading rather efficiently), and benchmarking it on Quake. *shrug* Really, who buys Xeons for a gaming PC? And, if they do, WHY?

    These are server CPUs and should be benchmarked with server benchmarks.

  • Imagine a beowulf cluster of these! w00t!
  • I remember that Alan Cox wrote a patch to deal with Hyperthreading just a few months ago. One thing it does is to avoid putting two threads on the same hyperthreaded CPU when there are spare physical CPUs, i.e., distinguish between physical and hyperthreaded CPUs.

    Is it in the test kernel?
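
    To make the placement rule concrete, here is a toy sketch of "prefer a fully idle physical CPU before doubling up on a hyperthreaded one". This is not the actual kernel patch; `pick_logical_cpu`, `siblings` and `busy` are hypothetical names invented for the illustration.

```python
def pick_logical_cpu(siblings, busy):
    """Choose a logical CPU for a new thread.

    siblings: dict mapping a physical CPU id to its list of logical CPU ids.
    busy: set of logical CPU ids that are already running something.
    Returns a logical CPU id, or None if every logical CPU is busy.
    """
    fallback = None
    for package in sorted(siblings):
        logical = siblings[package]
        idle = [cpu for cpu in logical if cpu not in busy]
        if not idle:
            continue
        if len(idle) == len(logical):
            # Whole physical CPU is idle: the thread gets it to itself.
            return idle[0]
        if fallback is None:
            # Remember a free sibling on a partly busy CPU as second choice.
            fallback = idle[0]
    return fallback
```

    With two physical CPUs exposed as logical CPUs 0-1 and 2-3, a second thread lands on logical CPU 2 rather than on CPU 1, which would share execution units with the first thread.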
  • A good analogy to how this works would seem to be segmented downloading. On a fast connection, a segmented download splits up a file into chunks and then opens multiple connections on the same interface, and this tends to utilize more of the available burst bandwidth.

    Despite the error in this, splitting up a program into two threads to run on one processor seems logical. It affords for advances in parallelism, which is what processors (even single) like and optimize for. This way if two threads are running, one can be making heavy use of the ALU and the other the FPU, which are physically separate areas of the processor, instead of one section sitting idle while the other reports 100% usage to the OS. One thread can be loading and moving data into memory while the other does number crunching...AT THE SAME TIME.

    This seems like a very good model, and I can see where it would increase performance by a huge magnitude if implemented on RISC systems, since instructions typically take only a few clock cycles to complete, and most programs are written to perform them sequentially. In hyperthreading, the processor could deal with several instructions at once (like they do already), only the difference would be these wouldn't be JMP guesses or preparing executed code in case of a branch.

    Cool stuff, Intel is in the right direction. It would be interesting if someone would write a program to test an ideal HT condition, like a program with two threads, one doing logic stuff and the other floating point. What would the performance increase be?

  • Die size, die size, die size.

    The larger and more complex the chip, the more it costs to make, and the higher the probability that there will be a defect in a randomly chosen chip. It is more cost effective to make one good cpu than to make two crappy cpus and put them on a single grid array.
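
    A back-of-the-envelope sketch of the die-size argument, using the simple Poisson yield model (yield = exp(-area x defect density)). The defect density and die areas below are made-up numbers for illustration, not Intel's or AMD's, and real fabs use more elaborate models.

```python
import math

def poisson_yield(die_area_cm2, defects_per_cm2):
    """Fraction of dies expected to have zero defects under a simple Poisson model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)

def relative_cost_per_good_die(die_area_cm2, defects_per_cm2):
    """Wafer area consumed per working die: die area divided by yield."""
    return die_area_cm2 / poisson_yield(die_area_cm2, defects_per_cm2)

# Hypothetical numbers: a baseline die, one 5% bigger, and one twice the size.
DEFECT_DENSITY = 0.5  # defects per cm^2 (made up)
for area in (1.0, 1.05, 2.0):
    print(area,
          round(poisson_yield(area, DEFECT_DENSITY), 3),
          round(relative_cost_per_good_die(area, DEFECT_DENSITY), 3))
```

    Under this model, doubling the die area more than doubles the silicon cost per working chip, because you pay for the extra area and for the lower yield.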

    Intel is trying to get back on top in terms of performance, even if it means taking an inelegant approach and making the chips extremely expensive to produce. Note that Intel's fastest offering is barely as fast as the fastest Athlon; to accomplish this, they had to move to .13 micron and use a die size that was STILL larger than the Athlon's. I predict that Sledgehammer on EV6 will be much more interesting news than hyperthreading.
  • My company has just bought us developers Dell 530 dual 1.7GHz Xeon workstations. Nice, you may think, but it feels bloody slow. 15 seconds to compile 50 lines of C, using gcc (using cygwin on NT). Something really seems wrong with this box.

    Not only that, but the idiot who ordered these PCs really overspecced them for development work (mostly editing & compiling), but ordered the bottom-of-the-range monitors for them (17" 60Hz @ 1280x1024). People are complaining of eyestrain and headaches. I kept the 19" monitor from my old PC, but I'm so close to quitting this job.


  • Sounds about right given Intel's previous mathematics 'errata' for everything about the 386.

