Intel

Bug in Pentium III Xeon Processors 46

Doug Muth writes "There is an article in Wired that talks about a bug in Intel's Pentium III Xeon chips that causes crashes "when a system is pushed to its highest performance limit", whatever that is supposed to mean. Fortunately, the bug is only present in two specific variations of the chip, the 550 MHz versions that have either 512 KB or 1 MB of secondary cache. Intel is also working on a bugfix for the problem." Furthermore, the bug seems to be present only on Intel-brand motherboards (Sabre). Intel has stopped shipping the board, but not the chip.
  • by Anonymous Coward
    It's funny how people don't know how to read an article ... It's clearly an INTEL bug. How do you explain that there are no problems with 512k and 1024k L2 versions of the Xeon 550 and all other Xeons (450/500) ??

    And your 'threads' theory is complete fantasy. ONE thread can put more pressure on the CPU/BUS than 10 other threads doing lighter things.
  • Leave it to Wired to get all the pointy-haired bosses worried, without giving any technical details at all.

    marvelous.

  • PPC and/or Alpha is still looking more and more attractive. . .

    "The number of suckers born each minute doubles every 18 months."
  • oh, it's a noise problem all right.

    Intel's marketing department is making all the noise. Not Intel's engineering department.

    "The number of suckers born each minute doubles every 18 months."
  • Well, for one, it's not NEW. Many (most?) of the first quad-xeons basically melted from excess heat. The heat, of course, came from the chips, but it was the IHV's lack of engineering foresight that created the fatal heat build-up.
    This time, it's Intel's fault. When 8 Xeons are used in their own board(!), too much power is supplied to the chips (VRM malfunction?) thus causing a similar system-meltdown condition.
    I tend to support Intel, rather than bash them, but they've just f***ed a moose with this whole Xeon line. I'll trust the professional benchmarkers who tell me that 1-2MB of cache "makes a difference" in "server applications" with multiple processors. I'll also take their word that that "difference" is worth about $3000/CPU. But if it's really that "worth it", then Intel ought to be able to engineer the damn things not to generate so much heat! Contract your heatsinks to that HP division, or get Alpha (Sun's heatsinks, popular with OC community) to do them, but whatever you do, Intel, do SOMETHING!

    MoNsTeR
  • If you look at the article closely, you will see a statement from Compaq:

    Compaq and Intel have confidently determined that the 'bug' is confined to the Intel Sabre motherboard and there have been no problems with respect to the Compaq design.

    So the flaw is not in the actual CPU but Intel's motherboard. This is why Intel can (and is) still shipping the CPUs, and expects a negligible effect on earnings. (Unlike the whole Pentium bug, where they ate almost $500 million by recalling the faulty chips). So if you don't want to deal with the bug, use a non-Intel mobo (such as Compaq's).
  • > I'm not sure why they ever thought of having the screen go to 640*480, turning blue

    It's going to be pink under W2K, so you can say: I got PSOD.

  • Problem after problem.

    Can AMD take advantage of it in time?

    Hey, Athlon 700s appeared on Price Watch yesterday!
  • Yet another article that gives no real details on the actual problem, and causing panic over what may be a rare problem if it only occurs on 8-way SMP systems, and possibly under NT.

    There seem to be a lot of technical articles being published that are written by ignorant morons, and the number is increasing exponentially.

    I get a free UK magazine weekly that is 90% job adverts, with a few pages of other news just to make it interesting. Nearly every major article seems to be written by a muppet with 2 days' experience, which makes the articles pointless since the reader base is IT professionals. A recent article on anti-virus policy was pretty hilarious, as the author just didn't have a clue.

    Facts,facts,facts please!
  • I believe that the bug isn't just in the Intel chip. I think it is NT's inability to run at max capacity. How can they make a bug fix for hardware? I for one am not going to solder a mod chip into my PIII Xeon (if I had one). If it is a software thing, they have to make it for every OS that runs on x86. I guess the bug fix will limit the maximum number of threads that the processor can run in the operating system, further slowing down the chip...
  • you sound like one of the lucky ones who has not had to experience the microsoft...

    there are a few levels of a program crashing in win98/95. One is the windows style 'program has performed an illegal operation' windows, with a nice icon. The next is a larger, two color window that says the same type of thing. In severe instances, you get a more windows2.0-style one: the screen goes to all blue and it tells you 'fatal exception at register blah blah'.

    But anyway, the 'BSOD' is not a bug, it is an archaic Microsoft error message window. I'm not sure why they ever thought of having the screen go to 640*480, turning blue, and telling you something, but at least it doesn't happen often. What is more puzzling is that they preserved this type of crash screen all the way through win98 so far and included it in NT. Win98 I can understand since it comes from the same codebase, but why NT? For nostalgia I guess, and they must have realized how much their systems crash so it's good to have variety.

    I like it when explorer (not just IE, the core of win98 gui, explorer) crashes and you get first the full color message, then the 2 color one, then the blue screen, sometimes repeating the process. Usually though, you get only one kind or the other.

    As a foot note, after suffering through getting Microsoft networking working with 3 computers and win98, and rebooting the one i installed a nic card in >>12 times to get it configured, im installing linux this weekend.
  • it appears in win98, 95, 3.1 and probably before... who knows why they recreated it for NT, they have plenty of other error messages.
  • When will Dell/Compaq/HP begin offering Athlon to the masses?

    Compaq already is, AFAIK...most retailers that sell build-to-order Presarios can get 'em with K7s. I don't know if they've put the K7 in other product lines yet. (I'd be kinda leery about buying a Presario because of all the WinHardware in it. Then again, of the five x86 boxen I have, the only factory-built model is an old IBM PC/XT that someone gave to me for my "old computer collection." Everything else is homebrew.)


  • Well, well, well. Sounds like Intel is having the same troubles as less authorized overclockers :)

    -- Robert
  • A guy in our CAD department here at work just pulled his brand new dual PIII 550 out of the box last night. 512 meg ram, 512k secondary cache, NT. When I got here this morning he was still trying to get it to run...seems every time he put a load on the system (like an IGES translation), BAM! BSOD. How odd.

    Then I read this on /. and, problem solved. He pulled out one of the processors, and now the thing works fine. It was only $500 extra for the useless chip!
  • let's see - this is a problem that only occurs on the fastest chips on some motherboards and in some cache configurations.

    This sounds like a noise problem to me (whether it's the motherboard that's too noisy, or the chip that's too sensitive to it, or a combination of both is not obvious). I bet they can fix at least some of this by quieting down the mobo environment .... but it may also point to the reason they're not deploying a direct K7 competitor (they can't, they're running on the hairy edge at the moment) and instead are trying to put price pressure on AMD

  • They expect it to affect under 1 million of Intel's units, with the problem being fixed by the end of the fourth quarter.

    When they have a solution they plan on sending out a "work around" or bug fix.

    Interestingly enough, this is not their first problem with the Xeons... when first introduced in 98, many of them had bugs with their accompanying chipsets.
  • My workstation is a Pentium III, but it's only a 450mhz. Still, it crashes pretty much daily. That must be the real reason for BSODs! It's not Microsoft, it's Intel ;-)

    As for the article... um... where was the content?
  • How can you compete fairly? I've never seen such a thing ;) In the business world your objective is to kill your opponents while making the stockholders happy. Do you really expect Intel to keep their prices higher than the Athlon, for example? That would be like asking Redhat to stop letting people ftp their distribution and start selling it for the same prices as NT ;)

    I must say that I like Intel's offerings a lot; for the last few years they've been the leaders in the market, supplying the highest performing CPU as well as the highest quality. Yes, the Athlon is a technically superior CPU, but don't count Intel out yet.
  • I was afraid when I opted for extremely reduced speed, slower multiprocessor support, and paid THOUSANDS of dollars more by buying a P3 chip that I would be without worries! And I was bummed out when I found out the P3 600s were melting and I didn't get one of those. Boy, those people with Athlons with their speed and their stability and their 200MHz bus, they're just stupid, huh?

    Esperandi
  • While running the PRIMES 18.1 service program as part of the GIMPS project (for testing only, of course), I noticed that I was able to create an audible "squeak" coming from the region (board layout) of the Xeon processors on both HP and COMPAQ servers while running (GASP) NT. (yes, I removed the speaker...) The response from the software author indicated that the software may have been causing some sort of feedback between the power regulator board and the actual processor. I'm not so sure now....
    I have been able to reproduce this on PII Xeons as well as PII Xeon boards. No response from HP nor Compaq, yet.

    Gimps info: http://www.entropia.com/ips/
    Team: Subgenius (feel free to join)
  • by Anonymous Coward
    I hear it's an SMP bug and only shows up in 8-way SMP configurations. Something to do with the bus voltages exceeding usual limits, which suggests to me a termination problem, perhaps ringing on the bus lines? Of course, the general ineptitude of computer journalism means that decent technical info is hard to come by... Interesting that this was the same problem Cyrix had years ago with the 686.
  • I have a server system on a K6-2 400 (w/ Asus P5A) that currently has an uptime in excess of 130 days. And, my workstation has had similar uptimes with its k6/233(w/ FIC PA-2011) Even my AMD 486/100 could do that before the power supply in its case died. ;) Granted, the Pentium Pro and Alpha have similar histories, but I haven't been able to witness any of these problems you speak of with my AMD-based systems.

    Btw, I was going over the AMD K7 system building guide (the pdf) the other day, and noticed they had 2 things your friend may be interested in - a recommendation of going with no less than a 300 watt power supply and a video card compatibility list. Since all the reviews I've seen have remarked how stable all of the boards / chips are, I have a feeling it could be one of those causing the problem. If not, it's time to take advantage of a warranty. ;)
  • Does this bug appear under any OS other than NT? Does anyone else think this sounds more like a bug in NT than in the chip?

    Since it only shows up under high load in an 8-way system there is a large chance that there are almost no non-NT systems configured that way.

    It may end up causing a BSOD on NT, and a panic on Unixish systems. It may cause just a plain lockup and the reporter assumed anything that crashes is a BSOD. It is easy to imagine the "bug" ends up loading the wrong thing into a cache line, which would upset any OS, or maybe it signals a non-correctable ECC failure, which a good OS will panic on and a bad one will ignore (a great one will log the error and, if the page it is on is clean, page it in from the backing store again; if dirty, kill that process or restart from the last checkpoint...)
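
The "great OS" policy described above can be sketched roughly as follows. This is only an illustration of the decision logic, not how any real kernel does it; the `Page` fields, the `Action` names, and `handle_uncorrectable_ecc` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    RELOAD_FROM_BACKING_STORE = auto()
    RESTART_FROM_CHECKPOINT = auto()
    KILL_PROCESS = auto()


@dataclass
class Page:
    # Hypothetical fields; a real kernel tracks far more state per page.
    dirty: bool                   # modified since last written to backing store
    has_checkpoint: bool = False  # owning process can be rolled back


def handle_uncorrectable_ecc(page: Page, log: list) -> Action:
    """Pick a recovery action for a page hit by an uncorrectable ECC error."""
    # Always record the error first, whatever happens next.
    log.append(f"uncorrectable ECC error (dirty={page.dirty})")
    if not page.dirty:
        # A clean page has an intact copy on disk, so it can simply be re-read.
        return Action.RELOAD_FROM_BACKING_STORE
    if page.has_checkpoint:
        # The only up-to-date copy is corrupt, but the process can roll back.
        return Action.RESTART_FROM_CHECKPOINT
    # No good copy anywhere: sacrifice the owning process, not the whole system.
    return Action.KILL_PROCESS
```

The point of the ordering is that the least destructive option is tried first, and a full panic is never needed as long as the damaged page belongs to a single process.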

  • > The microcode is stored (in encrypted form) in the BIOS flash ROM

    And won't life be exciting when someone cracks the code and turns things over to the kipt scriddies.

  • I *think* this is related: According to C't in Germany, VIA are also having problems with their newest chipset. This is a case for the Babelfish - the article is in German. http://www.heise.de/newsticker/data/ciw-29.09.99-001/
  • ...before after that, since the processor fries every time it is "pushed to the limit", it will fry every time someone boots up W2K. :)

    "There is no surer way to ruin a good discussion than to contaminate it with the facts."

  • 1. It's faster.

    Actually lots of things are faster than a PIII... from the humble overclocked Celeron to the screaming AMD Athlon, the PIII isn't even 2nd best any more.

    2. It's bang-for-buck.

    Athlon again! PIII must be the most money you can spend on an x86-compatible CPU right now.

    3. You get to upgrade your motherboard if you buy one.

    Same "advantage" if you go Athlon.

    4. Nobody ever got fired for buying Intel.

    Sadly, this is one of the big reasons this also-ran might turn into a leader.

    and the most compelling reason to buy a PIII is...

    5. It has a bigger number on it than the PII.

    I would guess that most /. readers know enough to make that informed choice properly. For a lot of us the choice is possible, because we don't go buying pre-assembled systems from the big names. When will Dell/Compaq/HP begin offering Athlon to the masses? And will Intel FUD triumph, or is this really a turning point for AMD? Unless something new hits the market very soon I see my Celeron 300A being replaced by an Athlon system very soon.
  • There is an instruction that says: Here is a new microcode for you.

    Actually, it is not an instruction, but an MSR write.

    I have a stepping 2 Pentium Pro. I think I could software upgrade it to rev 3 or if it exists 4

    Actually, the steppings represent actual hardware steppings and not microcode versions AFAIK.

    The microcode is stored (in encrypted form) in the BIOS flash ROM which is one of the reasons regular BIOS upgrades are a good idea.
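
To make the stepping-versus-microcode distinction concrete: the stepping is part of the hardware signature the CPU reports via CPUID (leaf 1, EAX), separate from whatever microcode revision is loaded. A minimal sketch of decoding that signature, assuming you already have the raw EAX value from somewhere; `decode_cpuid_signature` and `CpuSignature` are made-up names, and the sample value is only illustrative.

```python
from typing import NamedTuple


class CpuSignature(NamedTuple):
    family: int
    model: int
    stepping: int


def decode_cpuid_signature(eax: int) -> CpuSignature:
    """Split a CPUID leaf-1 EAX value into family, model and stepping."""
    stepping = eax & 0xF              # bits 3:0
    base_model = (eax >> 4) & 0xF     # bits 7:4
    base_family = (eax >> 8) & 0xF    # bits 11:8
    ext_model = (eax >> 16) & 0xF     # bits 19:16
    ext_family = (eax >> 20) & 0xFF   # bits 27:20

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base family 6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return CpuSignature(family, model, stepping)


# Illustrative value: a family 6, model 1 part at stepping 2.
sig = decode_cpuid_signature(0x0612)
print(f"family {sig.family}, model {sig.model}, stepping {sig.stepping}")
```

A microcode update changes none of these fields; it only changes the separately reported microcode revision, which is why a "stepping 2" part stays stepping 2 after a BIOS upgrade.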

  • Intel has egg on its face yet again because one of its products has a bug in it. This is the best indication that AMD is doing well. Intel does not want to lose market share, so they put out parts as quickly as they can make them and don't test properly. This reminds me of a certain company in Redmond. Let's hope that processor's microcode does not become field upgradable or Intel will start releasing processors that have bugs (not show-stoppers, but minor flaws) to the public, and we will have to wait for the first processor service pack to play the newest version of Quake.
    Or Intel could just start competing with AMD honestly and consumers could benefit greatly. Of course, that doesn't help Intel's stockholders, does it?
  • From www.news.com:

    The flaw crops up when 550-MHz Xeons, with either 512KB or 1MB of secondary cache memory, are used in an eight-processor server with a Saber motherboard, which was designed by Intel. The voltage from the processors in this scenario can exceed the recommended voltage limits and cause a server to go to "blue screen," or crash, according to Pijkper.
  • by rew ( 6140 ) <r.e.wolff@BitWizard.nl> on Thursday September 30, 1999 @05:53AM (#1648810) Homepage
    Let's hope that processor's microcode does not become field upgradable or Intel will start releasing processors that have bugs (not show-stoppers, but minor flaws)

    Intel IS shipping processors with field upgradable microcode. Since the Pentium Pro, every processor has upgradeable microcode.

    There is an instruction that says: Here is a new microcode for you. The stuff is encrypted. It verifies an unspecified checksum (i.e. Intel only is allowed to give you new microcode), and then loads the new microcode.

    I have a stepping 2 Pentium Pro. I think I could software upgrade it to rev 3 or if it exists 4 or 5....

    Roger.

  • by DdJ ( 10790 ) on Thursday September 30, 1999 @05:19AM (#1648811) Homepage Journal
    I note that the article says that a complete system crash is also called a "blue screen of death".

    Does this bug appear under any OS other than NT? Does anyone else think this sounds more like a bug in NT than in the chip?
  • by technos ( 73414 ) on Thursday September 30, 1999 @06:04AM (#1648812) Homepage Journal
    Geezus. I must have been up WAY too late and drank way too much last night.. My head is foggy.. I sat staring at the 'Bug in Pentium' headline and froze.. For a few minutes, I thought y'all had done a flashback to '94.
    Think there's a correlation between the MS release schedule and Intel's bug schedule? There was the buggy 386-40 back in 88-89 when 3.1 came out, there was the P54D divide bug about when Win 95 was due to be released, and now the Xeon has gone screwy just in time for Win2K. There wasn't a chip failure for Windows 98 because it was nothing more than a relabelled copy of Win95.

    Wintel conspiracy? Or is Intel attempting to undermine MS?
