Major Linux/Athlon CPU bug discovered

Posted by chrisd on Monday January 21, 2002 @03:49AM from the a-bug-a-day-keeps-alan-at-play dept.

GeorgeFrancisco writes "I recently installed the nVidia drivers so I could play TuxRacer on my Athlon. Problem is it kept inexplicably hanging Linux. Now I know why. The CPU bug affects Athlon/Duron/Athlon MP AGP users. Fortunately there's a way around it, and: "Alan [Cox] is going to try to add some kind of Athlon/AGP CPU bug detection code to the kernel so that it will be able to auto-downgrade to 4K pages when necessary." Read more on the Gentoo Linux site."

I noticed too (Score:3, Interesting)

by Fembot ( 442827 ) writes: on Monday January 21, 2002 @03:55AM (#2875158)

I noticed this too. It seems to only affect 3D games, mainly SDL-based ones such as Armagetron, but strangely it hasn't affected Quake 3 at all. Unreal Tournament was affected, but I SWEAR it didn't use to do that.

- And we were blaming the NVIDIA drivers... (Score:2, Informative)
  
  by npietraniec ( 519210 ) writes:
  
  It really shows up if you use the pre-empt kernel patch. Ever since I added the workaround, things have been pretty solid. (not that it's been that long)
  - - - That presumes that many happen to HAVE other cards (Score:2)
        
        by Svartalf ( 2997 ) writes:
        
        I'm going to have a G400, but that's because I'm moving a card over from my main machine that's a P3-600 until I can afford another card. Most people getting an Athlon are looking for maximal speed (Who isn't?) so they're going with the NVidia cards because they're "fully" supported with all functionalities including T&L supported (The Radeon doesn't have T&L right at the moment and the top of the line one is a different card w/no support right at the moment...). Most of the Athlon crowd is going to have NVidia cards unless they're insistent about having everything Open Sourced. There's nothing wrong with that position, but since the profile indicates that there's not going to be as many people with other cards, how would they see other AGP cards having this problem?
- - Annoyed at something else. (Score:4, Insightful)
    
    by Lemmy Caution ( 8378 ) writes: on Monday January 21, 2002 @12:29PM (#2876452) Homepage
    
    The article notes that AMD has been proclaiming the bug in public for a while.
    What irks me is this: I got hit with this bug. I posted bug reports to Debian, to NVidia, and on different forums, reporting lock-ups in certain OpenGL situations. I generally got hand-waving "read the fucking manual" responses.
    As the article notes, this isn't just a problem with AMD. It suggests that there's an ongoing problem with troubleshooting and resolving the sorts of issues that desktop users are going to have in Linux. (And "paying for support" would not have resolved much, would it have? The problem is the lack of coordination, not the lack of money.)
    
Is this the same as the Win2k bug? (Score:4, Interesting)

by sprayNwipe ( 95435 ) writes: on Monday January 21, 2002 @03:55AM (#2875161) Homepage

There was a Win2k bug a while back that did the exact same thing, and you had to install a "LargePageMinimum" patch for it to not crash. Is this the Linux equivalent of that? And if so, how come it has taken so long to surface and fix?

- Re:Is this the same as the Win2k bug? (Score:5, Funny)
  
  by kilrogg ( 119108 ) writes: on Monday January 21, 2002 @04:04AM (#2875192) Homepage
  
  RTFA, AMD released a patch for w2k but never mentioned anything to the kernel developers.
  Instead of saying "oops, there's a hardware bug", they said, "oops, here's a patch for w2k". Looks like none of the kernel developers knew they had to look at w2k bug fixes to find out about hardware bugs.
  
  - Hmm, Win2k needs patched, Linux needs boot option (Score:2)
    
    by gosand ( 234100 ) writes:
    
    I find it rather interesting that for Win2k, you needed to install a patch. For Linux, you can just edit your bootloader with an option, and it does the same thing. Which seems more robust?
    Granted, the Win2k patch was probably just a registry tweak, but which could the average user do more easily? Which operating system gives more information to its users?
  - Whose Responsibility? (Score:3, Funny)
    
    by jxqvg ( 472961 ) writes:
    
    Where is AMD in any way obligated to call the Kernel Developer Gods whenever they make a mistake? "Oh, Mr. Torvalds, I'm so sorry we made a mistake with our processor. Oh, Mr. Cox, please forgive us. Please don't tell RMS or ESR; we'll fix it, honest!"
    
    Here's the stark truth for you: 1)Money, 2)Userbase.
- Re:Is this the same as the Win2k bug? (Score:4, Redundant)
  
  by Anonymous Coward writes: on Monday January 21, 2002 @04:04AM (#2875195)
  
  It's slashdotted. Here's the article:
  The bad news is that a major Athlon CPU bug has been discovered, and it affects Linux 2.4. Note that this is a bug in the actual CPU itself, and is not a Linux bug. However, it becomes our problem because there are very many semi-broken Athlon/Duron/Athlon MP CPUs out there.
  
  Here are the details. As you may know, x86 systems have traditionally managed memory using 4K pages. However, with the introduction of the Pentium processor, Intel added a new feature called extended paging, which allows 4MB pages to be used instead. Here's the problem -- many Athlon and Duron CPUs experience memory corruption when extended paging is used in conjunction with AGP. And, this problem hits us because Linux 2.4 kernels compiled with a Pentium-Classic or higher Processor family kernel configuration setting will automatically take advantage of extended paging (for kernel hackers out there, this is the X86_FEATURE_PSE constant defined in include/asm-i386/cpufeature.h).
  
  Fortunately, there is a quick and easy fix for this problem. If you have been experiencing lockups on your Athlon, Duron or Athlon MP system when using AGP video, try passing the mem=nopentium option to your kernel (using GRUB or LILO) at boot-time. This tells Linux to go back to using 4K pages, avoiding this CPU bug. In addition, it should also be possible to avoid this problem by not using AGP on affected systems.
  
  As soon as I discovered that this CPU bug existed (which happened, unfortunately, because my CPU has the bug), I informed kernel hacker Andrew Morton of the issue; he put me in touch with Alan Cox. Alan is going to try to add some kind of Athlon/AGP CPU bug detection code to the kernel so that it will be able to auto-downgrade to 4K pages when necessary.
  
  The unfortunate thing about this situation is that AMD and others have known of this bug since September 2000. In fact, AMD's CPG technical marketing division announced this bug on September 21, 2000 in a technical note entitled Microsoft Windows 2000 Patch for AGP Applications on AMD Athlon and AMD Duron Processors (Technical Note TN17 revision 1). And, the kind folks at AMD even created a simple patch for Windows 2000 that disables extended paging by tweaking the registry. However, apparently AMD didn't realize that Linux 2.4 also uses extended paging when the kernel is compiled with a Pentium-Classic or higher Processor family kernel configuration setting. And, it looks like no one in the Linux community noticed that this "Microsoft Windows 2000/AGP Athlon/Duron bug" also applied to Linux 2.4 systems, probably because it was presented by AMD technical marketing as just that -- a Windows 2000-related AGP bug. An unfortunate miscommunication, which has resulted in lots of problems for Athlon, Duron and Athlon MP users. Here's something that's even more unsettling -- consider what kind of Linux users actually use AGP. That's right -- desktop users. And in what area has Linux been struggling? Yes, the desktop. One wonders how many negative desktop Linux experiences have resulted from this unfortunate problem. I don't know if any particular party is to blame for this issue. After all, AMD did prominently announce this bug when it was discovered. But due to an apparently unfortunate series of events, us Linux people never benefitted from this knowledge. But Microsoft Windows 2000 and XP users did. Let's hope that all parties involved can keep things like this from happening in the future.
  
  - Re:Is this the same as the Win2k bug? (Score:2, Flamebait)
    
    by GigsVT ( 208848 ) writes:
    
    If this was discovered almost 2 years ago, then aren't any chips bought in the last couple of years bug free?
    - Re:Is this the same as the Win2k bug? (Score:5, Informative)
      
      by DeeKayWon ( 155842 ) writes: on Monday January 21, 2002 @12:51PM (#2876579)
      
      The only revision without the bug is the A5 stepping (CPUID 662) Athlon XP/MP/Mobile Athlon 4. See the Athlon model 4 revision guide [amd.com] and the Athlon model 6 revision guide [amd.com], erratum 16.
      Basically, if you run "cat /proc/cpuinfo" and see these:
      cpu family : 6
      model      : 6
      stepping   : 2
      Then you should be safe.
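
A minimal sketch of the check described above, assuming the usual "key : value" layout of /proc/cpuinfo with one blank-line-separated block per processor. The function names are made up for illustration, and the only "safe" combination tested is the family 6 / model 6 / stepping 2 triple named in the comment; it is not an authoritative erratum check.

```python
def parse_cpuinfo(text):
    """Return one dict of fields per processor block in /proc/cpuinfo-style text."""
    cpus = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if fields:
            cpus.append(fields)
    return cpus


def is_fixed_stepping(cpu):
    """True only for the family 6 / model 6 / stepping 2 combination quoted in the thread."""
    triple = (cpu.get("cpu family"), cpu.get("model"), cpu.get("stepping"))
    return triple == ("6", "6", "2")


# Illustrative sample text, not output captured from a real machine.
SAMPLE = "processor\t: 0\ncpu family\t: 6\nmodel\t\t: 6\nstepping\t: 2\n"

for cpu in parse_cpuinfo(SAMPLE):
    print("looks like the fixed stepping" if is_fixed_stepping(cpu) else "may be affected")
```

On a real system the text would come from reading /proc/cpuinfo instead of the sample string.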
      
      - Re:Is this the same as the Win2k bug? (Score:3, Interesting)
        
        by MrResistor ( 120588 ) writes:
        
        So, it's just the ones with the Morgan/Palomino core that are safe? Or am I reading this wrong?
        I have to say that this news is somewhat of a relief to me. My Athlon 700 has the bug and I've been going nuts recompiling kernels and nvidia drivers since I first tried to play TuxRacer with my little brother on Christmas Eve.
        On the upside, it finally motivated me to explore the guts of Linux a little more... :)
For once Microsoft managed to fix it first (Score:2, Informative)

by bob1000 ( 174146 ) writes:

http://support.microsoft.com/support/kb/articles/Q270/7/15.ASP [microsoft.com]

Since September of 2000..
- Re:For once Microsoft managed to fix it first (Score:4, Redundant)
  
  by kilrogg ( 119108 ) writes: on Monday January 21, 2002 @04:07AM (#2875202) Homepage
  
  Rather, AMD fixed it for Microsoft: they made the w2k patch but didn't release a Linux patch.
  
  - Well that is it! (Score:2, Funny)
    
    by Metrollica ( 552191 ) writes:
    
    Boycott AMD!!! [theinquirer.net]
  - - Re:For once Microsoft managed to fix it first (Score:2, Funny)
      
      by Anonymous Coward writes:
      
      The patch is a one line registry change ... open source. The linux developers could have easily incorporated it into the kernel
      I don't know why those damn Linux developers just didn't fire up good ol' /sbin/regedit and fix the Linux registry.
    - Re:For once Microsoft managed to fix it first (Score:2, Troll)
      
      by kilrogg ( 119108 ) writes:
      
      You expect the kernel developers to follow every Windows bug and try to figure out if it's in fact a software or hardware bug? Fact is, AMD made this look like a Windows bug; read it for yourself [amd.com] (it's over at the top right).
      To me, this looks like AMD doesn't give a rat's ass about its customers who use Linux.
      - AthlonXP not affected (Score:2, Funny)
        
        by toofast ( 20646 ) writes:
        
        From AMD's website [amd.com]:
        
        Note: This patch is not needed for Windows XP
        
        - Re:AthlonXP not affected (Score:2)
          
          by dougmc ( 70836 ) writes:
          
          Note: This patch is not needed for Windows XP
          
          As much as AMD would like you to think otherwise, Athlon XP != Windows XP.
      - Re:For once Microsoft managed to fix it first (Score:2, Informative)
        
        by Skuto ( 171945 ) writes:
        
        >Why place all the blame on AMD? If you write
        >pentium-optimized code, what's so surprising if it
        >won't work exactly right on an AMD?
        
        It has _nothing_ _whatsoever_ to do with Pentium-optimized code. It's a new feature that both Intel and AMD CPUs support. Or in AMD's case, are supposed to support.
        
        --
        GCP
old news again (Score:2, Interesting)

by Afrosheen ( 42464 ) writes:

I guess it takes awhile to pile through the submissions. This was posted on pclinuxonline.com recently.
- Re:old news again (Score:2, Offtopic)
  
  by cymen ( 8178 ) writes:
  
  I guess it takes awhile to pile through the submissions. This was posted on pclinuxonline.com recently.
  
  Wow... That takes the cake. It's bad enough to bitch about deja vu reposts from /. itself, no need to bitch about reposts of stories at other sites. If you can't see the reasons why please bite yourself. I can't wait until it is unhip to be aloof.
Mirror/cache from Google! (Score:3, Redundant)

by Metrollica ( 552191 ) writes: <m etrollica AT hotmail D0T com> on Monday January 21, 2002 @04:03AM (#2875185) Homepage Journal

Here [google.com] is the cached article.

Thank Google again for this one!

Another mirror/summary here (Score:3, Informative)

by Afrosheen ( 42464 ) writes: on Monday January 21, 2002 @04:08AM (#2875204)

Karma whoring, here I come. Hopefully this server can withstand a mild slashdotting. Link [tuxreports.com]

The quick answer: (Score:5, Informative)

by Doctor K ( 79640 ) writes: on Monday January 21, 2002 @04:08AM (#2875206) Homepage

The site seems to be down. However, last week, I contacted nVidia about this problem on my two dual Athlon MP workstations (random hangs when OpenGL is invoked). So the quick answer is you can

Boot your system with the following option on your kernel command line: "mem=nopentium"

or

Disable AGP in the XFree86 config (i.e. Option "NvAGP" "0" in the "Device" section).

nVidia clued me into the first approach about a week and a half ago. It made my system completely stable. However, there was still some texture flakiness in some OpenGL applications. Since my workstations are number crunchers (and thus Quake FPS don't matter to me), the latter option eliminated both the stability problems and the texture flakiness (at the expense of some graphics speed).

By the way, nVidia mentioned the same issue exists on Win2K / Athlon boxes.

Enjoy,
Kevin
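
To confirm that the first workaround actually took effect after a reboot, one option is to look for the option on the running kernel's command line, which Linux exposes in /proc/cmdline. The following is a small sketch under that assumption; the helper name has_boot_option is made up for illustration.

```python
def has_boot_option(cmdline, option="mem=nopentium"):
    """Return True if `option` appears as a whole word on a kernel command line."""
    return option in cmdline.split()


# Illustrative command lines, not taken from a real system.
print(has_boot_option("BOOT_IMAGE=/vmlinuz root=/dev/hda1 mem=nopentium"))
print(has_boot_option("BOOT_IMAGE=/vmlinuz root=/dev/hda1"))
```

On a live system the string would be the contents of /proc/cmdline. The comparison is against whole space-separated words, so an unrelated option such as mem=512M does not count as a match.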

Simple Workaround (Score:3, Redundant)

by Laven ( 102436 ) writes: on Monday January 21, 2002 @04:10AM (#2875209)

The Gentoo site describes a simple workaround: add "mem=nopentium" to your kernel options at bootup and it will avoid the bug condition. Alan Cox is currently working on adding auto-detection of this bug in the kernel, so we won't have to worry about it soon.

And yes, this is the same Athlon Windows 2000 AGP bug that was discovered and patched last year with that registry key. They just didn't realize that it also affected Linux until now. I now realize that was the cause of my TuxRacer crashes with my nVidia card on my Athlon computer.

Performance hit? (Score:4, Interesting)

by mojo-raisin ( 223411 ) writes: on Monday January 21, 2002 @04:32AM (#2875227)

So does anyone know how performance is affected from this 4MB->4KB page thing?

- Re:Performance hit? (Score:3, Informative)
  
  by Sits ( 117492 ) writes:
  
  You may want to take a look at the benchmarks posted later [slashdot.org].
- what's the point of 4MB pages? (Score:3, Insightful)
  
  by RelliK ( 4466 ) writes:
  
  First, as has already been pointed out, there is no performance hit.
  
  But I still did not get the answer to my question. What is the purpose of having 4MB pages in the first place? It is inconceivable that an OS would use 4MB pages exclusively. The internal fragmentation would be enormous.
  
  To give you an analogy, think of what would happen if your file system used 4MB blocks. When you create a file, space would be allocated 4MB at a time so a 1 byte file would waste (4MB - 1byte) of disk space; (4MB + 1byte) file would take up two blocks, also wasting (4MB - 1byte) of disk space. On average, each file wastes 1/2 of the last block. Similarly, each process wastes on average 1/2 of the last page. That's not a problem if the pages are 4KB in size, but with 4MB pages there's lots of space wasted. That's like throwing away paging altogether.
  
  So, I ask again, what is the point of having 4MB pages?
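
The arithmetic in the file system analogy above can be checked with a few lines. This is only a sketch of the internal fragmentation calculation, using the two file sizes from the comment; wasted_bytes is a name chosen here for illustration.

```python
KB = 1024
MB = 1024 * KB


def wasted_bytes(size, block):
    """Unused bytes in the last block when `size` bytes are stored in `block`-sized units."""
    if size == 0:
        return 0
    blocks = -(-size // block)  # integer ceiling division
    return blocks * block - size


for block in (4 * KB, 4 * MB):
    for size in (1, 4 * MB + 1):
        print(f"block={block} size={size} wasted={wasted_bytes(size, block)}")
```

With 4MB blocks both example files leave 4MB minus one byte unused, as the comment says; with 4KB blocks the same files leave 4095 bytes unused.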
- - Re:Performance hit? (Score:4, Interesting)
    
    by larien ( 5608 ) writes: on Monday January 21, 2002 @05:21AM (#2875353) Homepage Journal
    
    That's a rather naive assumption; it assumes that a 4KB page takes the same amount of time to move as a 4MB page. Admittedly, there will be 1024 times as much loop activity in order to move 4MB, but that probably isn't the real bottleneck, which would be memory/disk bandwidth. Also, you may gain some efficiency if you only want to move say 512KB.
    In short, you're better off with 4MB pages if it's stable, but I don't know by how much. I guess some benchmarks would be easy enough to do; e.g. run Q3A with and without the mem= options.
    
    - Re:Performance hit? (Score:2, Troll)
      
      by themassiah ( 80330 ) writes:
      
      I, personally, think it's sad when a video game's measure of frames per second becomes a benchmark. At least re-index a database or something ;)
      - Re:Performance hit? (Score:2)
        
        by larien ( 5608 ) writes:
        
        It's fairly standard, like it or not! It also throws a fair bit of data around, which should give an indication of performance. In any case, it's probably what most desktop users are concerned with!
        The DB reindex is a good test of paging as well, however.
      - Re:Performance hit? (Score:2)
        
        by hearingaid ( 216439 ) writes:
        
        Video games are really processor-dependent. Most other applications are hard drive-dependent to some extent or another. Indexing a database is really a way to test the speed of your hard drive for any DB of significant size (nobody keeps a 500GB DB in RAM :)
        The only other application I can think of that's comparatively CPU-dependent is raytracers and the like, and the problem with using them as benchmarks is that the length of time they take to produce a picture will obviously depend on the complexity of the picture. Q3/UT/etc. generate pictures of roughly fixed complexity, saving you the trouble, and also do so in a time-optimized kind of way (while raytracers tend to be more optimized towards producing beautiful results).
  - Re:Performance hit? (Score:5, Interesting)
    
    by andrewgaul ( 25829 ) writes: on Monday January 21, 2002 @05:23AM (#2875359) Homepage
    
    The performance hit for using the smaller pages is mostly unrelated to paging. When a CPU loads a virtual address (all addressing in "protected mode" is virtual), there is a translation to a physical address before data can be accessed. This table is stored in memory and the CPU breaks into kernel mode to do the translation. To avoid this cost, there is a cache of translations (managed by the kernel) in the Translation Look-aside Buffer (TLB). Most of the entries in this cache are for 4KB pages, but there are a few 4MB pages which are generally used for kernel memory (I am unsure if any OSes use the big pages for user programs).
    
    That said, there should be a modest performance hit. Bigger pages can store more data, which results in fewer TLB misses. Hopefully someone will post benchmarks.
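
The "fewer TLB misses" point comes down to how much memory a fixed number of TLB entries can cover. The sketch below works through that arithmetic; the entry count of 32 is a hypothetical figure picked for illustration and is not taken from any Athlon documentation.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(entries, page_size):
    """Bytes of address space covered when every TLB entry maps one page."""
    return entries * page_size


def pages_needed(region_size, page_size):
    """Pages (and so TLB entries) needed to map a region, rounding up."""
    return -(-region_size // page_size)


HYPOTHETICAL_ENTRIES = 32  # illustrative only, not a real CPU specification

for page_size in (4 * KB, 4 * MB):
    reach = tlb_reach(HYPOTHETICAL_ENTRIES, page_size)
    needed = pages_needed(16 * MB, page_size)
    print(f"page={page_size} reach={reach} pages_for_16MB={needed}")
```

Under that assumption, 32 entries of 4KB pages cover 128KB while 32 entries of 4MB pages cover 128MB, and a 16MB region takes 4096 small pages but only 4 large ones.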
    
    - Re:Performance hit? (Score:2)
      
      by frleong ( 241095 ) writes:
      
      See this link to read how folks at MSDN describe LargePageMinimum, the fix to the Athlon/AGP bug:
      Kernel improvements of Windows XP [microsoft.com]
Is this present in Athlon optimized kernels? (Score:2, Interesting)

by victwenty ( 451152 ) writes:

from the article: And, this problem hits us because Linux 2.4 kernels compiled with a Pentium-Classic or higher Processor family kernel configuration setting will automatically take advantage of extended paging
so the question is, if I configure my kernel for the K7 family, do I need to pass the kernel "mem=nopentium" or is this the default?
- Re:Is this present in Athlon optimized kernels? (Score:2, Informative)
  
  by Sits ( 117492 ) writes:
  
  Almost definitely not. It sounds like the existence of this bug was not known until recently and K7 options almost definitely enable all memory enhancements.
Nvidia + AGP + Irongate + Athlon (Score:4, Interesting)

by hack0rama ( 253610 ) writes: on Monday January 21, 2002 @04:53AM (#2875271) Homepage Journal

The NVIDIA drivers force AGP to 1x because of corruption caused by signal integrity problems in the AMD Irongate chipset (mentioned in the README [205.158.109.140] for the NVIDIA 1.0-2313 drivers).

Is this newly discovered memory corruption with Athlon + AGP related to the Irongate signal integrity problem, or is it a separate bug?

Anyway this makes AMD look very bad in my view. There is a bug in the CPU and their chipset screws up my AGP to 1x. Sigh.

Should AMD do the right thing? (Score:2, Insightful)

by NanoGator ( 522640 ) writes:

I should start by saying I haven't read the article yet, can't get to it. *hopes the /. traffic dies down soon...*

If it is a defect in the processor, I wonder if AMD will replace my existing processor. It may not seem like all that big of deal to most people here at Slashdot, but as a 3D artist I am *dependent* on OpenGL.

Don't get me wrong, I'm not having this problem now. (I'm not a Linux user.) But when I built my Athlon I had to install a patch for a similar type of problem in order to get the machine to work. At what point do we say "it's no longer ok to work around a CPU bug"?

If Intel has one set of bugs in their processors, and AMD has another, that divides the market. Software companies shouldn't have to put the effort into scrutinizing their code based on which CPU they are on, it's bad enough they are trying to optimize for one or the other. What happens when they get used to the workaround, but then it gets fixed? Worse yet, what happens when a company says "I'm sick of this, I'm only supporting one processor."

So it's not so much that I think AMD should replace the processors with this specific bug, but I think we should be vigilant in not allowing them to let errors like that run rampant.
- Re:Should AMD do the right thing? (Score:3, Informative)
  
  by Linux Freak ( 18608 ) writes:
  Heh, microcode bugs go back, WAYYYY back as far as microprocessors do themselves.
  
  http://www.computerhope.com/help/cpu.htm#05 [computerhope.com]
  
  http://www.tridwr.demon.co.uk/acorn/processors.html [demon.co.uk]
  
  http://www.mackido.com/History/brief_history.html [mackido.com]
  
  Shit happens. Work around it. ;-)
  - Re:Should AMD do the right thing? (Score:4, Informative)
    
    by Eric Smith ( 4379 ) writes: on Monday January 21, 2002 @06:11AM (#2875467) Homepage Journal
    
    That third article about the supposed "HCF" instruction on the 4004 is complete and utter BS. None of the instructions on the 4004 will cause it to burn up, even on the earliest production parts.
    Several processors had self-test instructions known as "HCF". The 6800 family and the 6502 had such instructions. They caused the processor to start fetching consecutive locations, thus continuously incrementing the address bus. Didn't damage the processor, even if you left it running that way. The "Catch Fire" was a figurative description of what was happening on the address bus, nothing more.
    On the original NMOS 6502, about 13 of the undefined opcodes had this effect. This was the most common cause of computer lockups if the code went into the weeds.
    On some of the later 6800 family members, the test instructions were actually documented, but Motorola's published description did not include any mnemonic for them.
    
    - HCF is a reference to an old IBM joke. (Score:3, Interesting)
      
      by Ungrounded Lightning ( 62228 ) writes:
      
      That third article about the supposed "HCF" instruction on the 4004 is complete and utter BS. None of the instructions on the 4004 will cause it to burn up, even on the earliest production parts.
      
      When the IBM System 360 series came out it had a large number of new opcodes (as compared with the 70x/70xx series). These were the days of CISC (Complex Instruction Set Computers), and the 360 really lived up to the name. It gave over a large amount of its word space to opcodes and opcode extensions, so it had a VERY large potential opcode space. Much of it was unpopulated, but some was populated with undocumented instructions. Further, the machine was microcoded, and the microcode was loaded when the machine powered up. (That's what floppy disks were invented for.) So the company could write new opcodes and add them later.
      
      Of course the new machine with the ENORMOUS list of opcodes and (true) rumors of hidden undocumented opcodes quickly led to the circulation of a humorous list of perhaps 20ish additional "new undocumented opcodes". Things like XOE (Execute Operator Immediately), EK (Electrify Keyboard), SSJ (Select Stacker and Jam), BLNK (Blink Lights), WHR (Whirr), etc. The crown jewel of this list was HCF (Halt and Catch Fire).
      
      While this list was still funny, Motorola released the 6800 single-chip microprocessor, predecessor to the 650x knockoff that formed the core of the first Apple computers. To ease chip testing, the all-ones opcode threw the chip into a test mode, where it continuously incremented the program counter and performed memory reads. This wiggled all the address lines and most of the control lines, letting you know if the chip was alive and bonded.
      
      Of course they didn't tell you about it. And of course the only way out was hard reset. And of course a jump to an unpopulated region of the address space (i.e. most of it) would leave the bus floating and generate 0xFF. And of course jumping into random data or uninitialized memory would also quickly get you an 0xFF or jump you off into unpopulated address space. So the typical behavior for a program bug was to lock up the processor beyond the ability of a debugger to function.
      
      (Hell: I had one of the first round of solder-it-yourself evaluation kits, bent a pin on the debugger ROM putting it into the socket, and ended up with a board that booted into the test state. Was starving student and it took a couple days to get access to test equipment to find out what was wrong.)
      
      So of course programmers, once they found out about the instruction that hung the chip in a mode where it "twiddled its thumbs at maximum speed" and got a bit warmer than usual, and couldn't get out of the mode except by hard reset, quickly christened the opcode "Halt and Catch Fire". And this became the generic term for get-stuck-in-a-test-mode instructions on microprocessors, until the chip manufacturers finally came to their senses and stopped putting such instructions into instruction sets.
- Re:Should AMD do the right thing? (Score:5, Interesting)
  
  by flatrock ( 79357 ) writes: on Monday January 21, 2002 @10:23AM (#2875958)
  
  First of all, this bug is not that significant performance wise. Very little software is going to use 4 MB pages. I don't think you even have an option of allocating memory with 4 MB pages in user space. This appears to be an issue with being able to optimise drivers, however, if AMD's processors can't do this, and Intel's can, why don't we see Intel's processors greatly outperforming AMD's in Win2k? This is a minor bug, and it's easily worked around without patching the kernel in both Win2k and Linux.
  
  The affected processors are basically all of their Athlon and Duron processors. For AMD or any chip maker to replace chips with bugs in them is VERY expensive. They already have a low profit margin. Replacing all "defective" Athlon and Duron processors would simply bankrupt AMD. Realistically, all complex software or hardware has bugs. Bugs in hardware are much more difficult and expensive to fix. The truly significant hardware bugs are usually found early in testing. Other bugs are fixed in software, usually in the system BIOS, but sometimes in the OS code. This isn't something new. It's pretty much always been this way. Why has it been this way? Because no one wants to pay the outlandish prices that would result from trying to make hardware perfect. It costs a tremendous amount of money to reroll a processor. It's not as simple as making a quick code change and recompiling software. THERE WILL ALWAYS BE BUGS IN PROCESSORS! A truly significant bug like the Pentium floating point bug needs to be fixed in the hardware, and that one was even significant enough to deserve a recall of the processor. This bug is simple to work around, and isn't truly a significant problem.
  
  The question you asked in the subject is "Should AMD do the right thing?" The answer is yes, they should correct their Technology Bulletin to actually say what the processor bug is, rather than just say here's a workaround to a bug that affects Win2k.
  
  I'm really surprised that someone at NVidia didn't pass this on to Linux kernel developers much sooner, since people at that company seem to have been aware of this for some time.
  
Athlon bug, and NVIDIA drivers (Score:3, Interesting)

by Rohan427 ( 521859 ) writes: on Monday January 21, 2002 @05:24AM (#2875363)

I have 2 Athlon systems, a dual Thunderbird 1.4GHz (Tyan Thunder K7) and a single Thunderbird 1.4GHz (Asus A7V133). The former runs a GeForce 3 and kernel 2.4.17, the latter a TNT2 and RH 7.2 (kernel 2.4.9 I believe). Both systems run semi-custom NVidia drivers (release 2313). By semi-custom, I mean I tweaked them to use SBA, the NVIDIA AGP driver (NOT agpgart) and to run in 4x mode. The latter has never had a problem, the former (the dual) had some problems until kernel 2.4.14.

The problems I had were frequent lockups with everything X, especially Q3A and Tribes 2. Some experimenting proved what worked and what didn't, and here's what I found:

agpgart never worked worth a damn even with kernel 2.4.17, despite several attempts by me to make it work (I don't maintain it, so I gave up on messing with it). Earlier NVIDIA drivers were less stable, but the latest is great (although it does not support FW, which blows). Tweaking the NVIDIA driver to use SBA and its own AGP driver instead of agpgart, along with kernel 2.4.14 - 2.4.17 makes for a very stable and fast system. Older kernels just did not work worth a damn whenever I enabled DMA on my IDE drive - they locked every time. These newer kernels don't exhibit this problem, and the NVIDIA driver works nicely with all 3D games as well as 3D development tools like Blender.

My kernels have always been compiled as Athlon kernels as well. The bottom line is: don't blame this bug and/or the NVIDIA driver if your system is unstable and/or slow. There are other things at work, and in my case I seem to have found them all.

- Rohan

- Re:Athlon bug, and NVIDIA drivers (Score:2)
  
  by ZaMoose ( 24734 ) writes:
  
  Have you made this tweak available? How difficult is it to perform?
  
  I've got a dual Athlon MP 1900+ machine from Alienware coming in for work and I'd like to get it running like a dream, if at all possible.
How-To: lilo workaround (Score:4, Redundant)

by Anonymous Coward writes: on Monday January 21, 2002 @05:32AM (#2875385)

If you're using lilo, and just want to apply the workaround quickly, edit /etc/lilo.conf.

Before the first image= line, insert the line:

append="mem=nopentium"

Then re-run /sbin/lilo so the change is actually written to the boot map, and reboot.

Does this happen if kernel compiled for K7? (Score:4, Interesting)

by Nicolas MONNET ( 4727 ) writes: <nicoaltiva@gm a i l.com> on Monday January 21, 2002 @05:35AM (#2875394) Journal

The article says it happens when the kernel is compiled for Pentium processors; but does this happen if the kernel is compiled for a K7?

By the way, I had to shelve my nVidia card a couple months ago because of this ... I have an Athlon and it kept hard freezing. The bug doesn't happen with a Voodoo card.

- Optimised kernels still buggy (Score:2, Informative)
  
  by Sits ( 117492 ) writes:
  
  I've posted this elsewhere but to clarify - it looks like this will still happen regardless of which processor you have selected (even i386!). This is because the test for whether your processor supports PSE seems to be run on startup (I think it's done by arch/i386/mm/init.c __init pagetable_init).
  
  As an aside, as far as I can tell the only (extra) things that optimising a kernel for a K7 seems to set are gcc options (someone please correct me if I'm wrong).
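Since the check happens at boot rather than at compile time, one rough way to see whether a running system reports the PSE capability is to look at the flags line of /proc/cpuinfo. This is only an illustrative sketch: the helper name `has_pse_flag` and the sample text are made up for the example, and a "pse" flag says nothing about whether a given CPU actually has the erratum.

```python
def has_pse_flag(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists 'pse'."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Split on whitespace so that 'pse36' is not mistaken for 'pse'.
            if "pse" in value.split():
                return True
    return False


# Hypothetical sample, trimmed to the relevant lines.
SAMPLE = """\
vendor_id\t: AuthenticAMD
cpu family\t: 6
flags\t\t: fpu vme de pse tsc msr pae mce cx8 pse36 mmx
"""

print(has_pse_flag(SAMPLE))
```

On a live system the same function could be fed the contents of /proc/cpuinfo, e.g. `has_pse_flag(open("/proc/cpuinfo").read())`.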
- Re:Does this happen if kernel compiled for K7? (Score:2)
  
  by DeeKayWon ( 155842 ) writes:
  
  I assume so. Since PSE is supported in Athlons I would think the kernel people would enable it for a K7 compile.
  
  I would think that only people who compile their own kernels and those who use Mandrake would be affected by this since pretty much everyone else compiles for 386, which would turn off the use of the PSE capability.
The equivalent Win2k bug fix (Score:3, Informative)

by LadyLucky ( 546115 ) writes: on Monday January 21, 2002 @05:42AM (#2875406) Homepage

can be found here [amd.com]
Funny, I knew something was wrong...

Buggy Features (Score:5, Funny)

by Perdo ( 151843 ) writes: on Monday January 21, 2002 @05:43AM (#2875408) Homepage Journal

MShaft: "Not-a-bug-it's-a-feature"

Intel: "Not a bug it's erratum."

VIA: "We slowed it down to keep it cool."

Nvidia: "That was a leak! We are not doing public driver beta testing!"

ATI: "Who the hell plays Quack3?"

AMD: "the patch is here [amd.com]"

The guys who found the bug... (Score:2, Funny)

by GdoL ( 460833 ) writes:

...seems they work for Intel. Their description was:
"It's a major bug. We don't know how it happened. We will ask marketing. We don't remember ever selling that chip.".

:-))
Using Test Suites to Validate the Linux Kernel (Score:5, Informative)

by goingware ( 85213 ) writes: on Monday January 21, 2002 @06:03AM (#2875452) Homepage

Let me take this opportunity to plug Using Test Suites to Validate the Linux Kernel [sunsite.dk].
Thank you for your attention.

Quake 3 benchmarks (Score:5, Informative)

by Sits ( 117492 ) writes: on Monday January 21, 2002 @06:05AM (#2875454) Homepage Journal

Quake 3 demo was run with \timedemo 1 and \demo DEMO001 . Each test was run three times. The system load average was < 0.5 before Quake 3 was run.

Without mem=nopentium
FPS = 79.4 (79.4, 79.4, 79.4)

With mem=nopentium
FPS = 79.2 (79.1, 79.3, 79.2)

System tested:
Athlon 850, 384MB RAM, Geforce 1 DDR, VIA KT133 Chipset
Athlon/Duron/K7 optimised 2.4.17 kernel (optimising the kernel above pentium makes very little difference though)
NVidia 1.0-2313 video drivers using agpgart
Mandrake 8.0

Quake 3 settings
Texture depth = 16 bits
Colour depth = 16 bits
Geometric detail = High
Texture detail = High
Dynamic lights = On
Video mode = 1024x768

Looks like there is a difference, but it's very slight (about 0.25%), and my benchmarks aren't very scientific. Either way, if there is an improvement in stability this tradeoff is easily worth it. Here's hoping that you don't run linux just for its Quake 3 scores [theregister.co.uk] though...
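For anyone wanting to redo the arithmetic on the numbers above, here is a small sketch. The variable and function names are just illustrative, and with only three runs per configuration it can't tell a real slowdown from noise.

```python
from statistics import mean

without_workaround = [79.4, 79.4, 79.4]  # FPS, three runs without mem=nopentium
with_workaround = [79.1, 79.3, 79.2]     # FPS, three runs with mem=nopentium


def relative_drop_percent(baseline, measured):
    """Percentage by which the mean of `measured` falls below the mean of `baseline`."""
    base = mean(baseline)
    return (base - mean(measured)) / base * 100.0


drop = relative_drop_percent(without_workaround, with_workaround)
print(f"{drop:.2f}% slower with the workaround")  # roughly a quarter of a percent
```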

I'm not sure what to think (Score:2, Interesting)

by hyehye ( 451759 ) writes:

I just got a new box, Athlon 1.2GHz... Asus a7a266 mainboard... nice little box for general usage. Soon as I finish moving, I'll get cable modem back and stop using mom's AOL, and I'll go back to Linux. But now I see this, and I'm eyeing my AGP card, and wondering. AMD has earned a lot of respect from me in the last couple years, as I've found the Athlons to be simply the finest x86 CPUs I've ever got my hands on, at great prices with very reasonable motherboards/chipsets as well. Now this. I'm not sure. Yeah, it's an engineering mistake, but I'm not clear on how AMD is handling it, and I hope they don't disappoint me. Sure, you can do a workaround - but as others have asked, what's the story on the performance hit? What about AMD working with the kernel folks to find another, better solution? Or maybe AMD could consider offering serious discounts on new, un-flawed CPUs, for those who are already eyeing upgrades?
- Re:I'm not sure what to think (Score:2)
  
  by hyehye ( 451759 ) writes:
  
  Also...
  
  AMD should seriously consider its response to this. The Linux community is well-informed, in general, and has been much quicker in moving to AMD than Windows users (mostly because Windows users are mainly Dell/Gateway/Compaq/Etc customers..), and AMD would do well to make attempts to avoid disappointing us.
I had a stroll through AMD errata (Score:3, Interesting)

by Anonymous Coward writes: on Monday January 21, 2002 @06:46AM (#2875530)

If I read the various PDFs correctly on AMD's site, all Athlons
except model 1 (the very first K7 since it didn't have PSE) are affected,
except the latest revision A5 (cpuid 662) of the Athlon XP, i.e. A0/660 and
A2/661 are affected as well (similarly all 64x Thunderbirds etc.).
(there was a model 1, 2, 4 and 6 Athlon, with 6 being XP)

Some or all Durons might be affected too, but I didn't look at that closely.

The above hinges on whether this is the correct bug description, feel free
to flame the anonymous coward if this has got nothing to do with it :)

"16 INVLPG Instruction Does Not Flush Entire Four-Megabyte Page Properly with Certain Linear Addresses

Normal Specified Operation. After executing an INVLPG instruction the TLB should not contain any
translations for any part of the page frame associated with the designated logical address.

Non-conformance. When the logical address designated by the INVLPG instruction is mapped by a 4-MB
page mapping and LA[21] is equal to one it is possible that the TLB will still retain translations after
the instruction has finished executing.

Potential Effect on System. The residual data in the TLB can result in unexpected data access to stale or
invalid pages of memory.

Suggested Workaround. When using the INVLPG instruction in association with a page that is mapped via
a 4-MB page translation, always clear bit 21."

(page 7 from Athlon Model 6 revision sheet [amd.com])
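The reading of the revision guides at the top of this post can be written down as a small lookup. This is a sketch of that reading only, not of AMD's documents: the function names are made up, the three-character codes are treated as family/model/stepping hex digits, and anything the post does not cover (Durons, later steppings) comes back as None rather than a guess.

```python
def parse_cpuid(code: str):
    """Split a three-character code such as '662' into (family, model, stepping)."""
    if len(code) != 3:
        raise ValueError(f"expected three hex digits, got {code!r}")
    family, model, stepping = (int(ch, 16) for ch in code)
    return family, model, stepping


def affected_per_post(code: str):
    """True/False according to the post above; None where the post makes no claim."""
    family, model, stepping = parse_cpuid(code)
    if family != 6:
        return None       # not a K7-family part, outside the post's scope
    if model == 1:
        return False      # first K7: no PSE according to the post
    if model in (2, 4):
        return True       # post: all of these, e.g. the 64x Thunderbirds
    if model == 6:
        if stepping in (0, 1):
            return True   # post: A0/660 and A2/661 are affected
        if stepping == 2:
            return False  # post: A5/662 is not
    return None           # e.g. Durons: the post did not look closely


for code in ("612", "642", "660", "661", "662"):
    print(code, affected_per_post(code))
```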

Alternate, faster? workaround (Score:5, Interesting)

by jquirke ( 473496 ) writes: on Monday January 21, 2002 @07:56AM (#2875638)

The current workaround gets around this problem by disabling 4M (2M?) pages (PSE). Hence we go back to 4K pages, and mapping large slabs of VM is a little slower and wastes memory (we need another Page table for each slab of 4M) and obviously more TLB misses/space wasted, because to touch the whole 4M region, the CPU needs to do up to 1024 page table lookups instead of 1.

As discussed this may have performance implications.
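To put numbers on the 4K-versus-4M tradeoff described above, here is a quick arithmetic sketch. It assumes classic 32-bit non-PAE x86 paging with 4-byte page table entries (with PAE the large pages are 2M, which is probably where the "2M?" comes from); the constant and function names are made up for the example.

```python
PAGE_4K = 4 * 1024
PAGE_4M = 4 * 1024 * 1024
PTE_BYTES = 4  # assumed: one 32-bit page table entry, non-PAE

# How many 4K pages it takes to cover what one 4M page covers.
small_pages_per_large = PAGE_4M // PAGE_4K

# Size of the extra page table needed per 4M region once PSE is off.
page_table_bytes = small_pages_per_large * PTE_BYTES


def translations_needed(region_bytes: int, page_bytes: int) -> int:
    """Distinct page translations needed to touch every byte of a region."""
    return -(-region_bytes // page_bytes)  # ceiling division


print(small_pages_per_large)                  # 1024
print(page_table_bytes)                       # 4096, i.e. one 4K page of entries
print(translations_needed(PAGE_4M, PAGE_4K))  # 1024
print(translations_needed(PAGE_4M, PAGE_4M))  # 1
```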

According to the AMD docs, the problem is only when flushing TLB entries with INVLPG and the page is a 4M page, _and_ the virtual address's bit 21 is set (which does not affect the 4M block of memory the address is in - eg: 0x400000 (2^22) vs 0x600000 (2^22|2^21) are both in the second 4M block).

Hence, when invlpg'ing a VA we just need to INVLPG(address & ~(1 << 21)). This only requires a single ANDL instruction. But we need to distinguish a 4M page first though, so I don't know?

Heck maybe we should just do it the FreeBSD way and recursively map the Pagedir :-)

Any ideas? Will this work?

--JQuirke
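The masking idea in this post can be checked with plain integer arithmetic. The sketch below only models the address computation; it is not kernel code, the helper names are invented, and whether the caller can cheaply know that a mapping is a 4M page is exactly the open question raised above.

```python
BIT21 = 1 << 21
LARGE_PAGE_SHIFT = 22  # a 4M page spans 2**22 bytes


def invlpg_operand(va: int, is_4m_page: bool) -> int:
    """Address to hand to INVLPG: bit 21 cleared for 4M mappings, unchanged otherwise."""
    return va & ~BIT21 if is_4m_page else va


def large_page_index(va: int) -> int:
    """Which 4M-aligned block a virtual address falls in."""
    return va >> LARGE_PAGE_SHIFT


va = 0x600000  # bit 22 and bit 21 set, the example from the post
masked = invlpg_operand(va, is_4m_page=True)

print(hex(masked))                                        # 0x400000
print(large_page_index(va) == large_page_index(masked))   # True: same 4M block
```

Clearing bit 21 never moves the address out of its 4M block, which is why the masked address still names the same large page.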

Other Hackers did it better . . . (Score:5, Informative)

by Jeff Kelly ( 309129 ) writes: on Monday January 21, 2002 @08:04AM (#2875661)

Here is a Posting from Terry Lambert on the FreeBSD -stable Mailing List regarding this "Bug".
Maybe it sheds some light on this issue.

> Recently I found Linux 2.4 kernel is affected by the
> bug of extended paging in AMD Athlon through the
> following link. I don't know if FreeBSD is also
> affected.
>
> http://linuxtoday.com/news_story.php3?ltsn=2002-01-21-001-20-NW-KN

I am well aware of this bug.

It does not affect FreeBSD, which only uses 4M pages for
the first 4M of the kernel itself.

I've worked on code that enables 4M pages on other memory
used in FreeBSD, that had this problem, but only if you
were really stupid in your allocation mechanism.

There's a workaround for this problem which is fairly
trivial to implement in software, and should probably be
done when 4M pages are enabled, if you are using an Athlon,
and are adding 4M pages.
[...]
In any case, this will not be a problem for FreeBSD, and is
only a problem for Linux because of the strange way they
initialize things.

- Re:Other Hackers did it better . . . (Score:2, Informative)
  
  by jelle ( 14827 ) writes:
  
  When an OS doesn't use a CPU feature (4M pages, using it just for the kernel doesn't count), that doesn't make the hacker better, it makes the OS not taking advantage of all CPU features (and therefore not running into the related CPU bugs...).
  
  So this guy tried to do 4M pages, it didn't work well (he encountered the bug), and decided not to implement 4M pages at all. And for Linux, the guys just happened to implement 4M pages long before AMD created the processors with the bug.
  
  Different history, all good hackers.
  - Re:Other Hackers did it better . . . (Score:2, Informative)
    
    by Jeff Kelly ( 309129 ) writes:
    
    When an OS doesn't use a CPU feature (4M pages, using it just for the kernel doesn't count), that doesn't make the hacker better, it makes the OS not taking advantage of all CPU features (and therefore not running into the related CPU bugs...).
    
    Read again. The Posting states that "I've worked on code that enables 4M pages on other memory
    used in FreeBSD, that had this problem, but only if you
    were really stupid in your allocation mechanism."
    
    He encountered the Problem in his _own_ code and fixed it there. He also states: "There's a workaround for this problem which is fairly
    trivial to implement in software, and should probably be
    done when 4M pages are enabled, if you are using an Athlon,
    and are adding 4M pages." He very clearly states that 4M pages are not currently supported in FreeBSD (should be in 4.5) but that a workaround exists. (And it is _not_ deactivating the 4M paging as in linux).
    
    So although they are not affected by the Bug because they do not use that particular feature, at least they know that it exists and they do have a workaround ready _now_, so that by the time this feature is implemented this bug will not cause any trouble. Which is more than I can say about the Linux hackers, who don't even bother to read the docs provided by AMD.
    - Not a documented errata (Score:2)
      
      by himi ( 29186 ) writes:
      
      It's rather hard to read non-existent documentation. This bug isn't listed in the AMD K7 errata, which is why it wasn't found - the only 'documentation' for this is the Win2k patch that AMD provided.
      
      Linux and *BSD just do things differently: it's not a matter of one set of hackers being better than the other.
      
      himi
grub workaround (Score:3, Redundant)

by chongo ( 113839 ) writes: on Monday January 21, 2002 @08:17AM (#2875685) Homepage Journal

If you're using grub and want a quick but effective workaround, then edit your grub.conf file, which is usually /boot/grub/grub.conf or /etc/grub.conf. On the end of any line that begins with the word kernel add:

mem=nopentium

For good measure, re-install your grub config by running:

/sbin/grub-install /dev/hda

Where /dev/hda is your boot disk. For most PC users with IDE drives, it will be /dev/hda .
Last, just reboot.
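After the reboot, one way to confirm the option actually reached the kernel is to look at the kernel command line. A minimal sketch, assuming the command line is available as text (on Linux it can be read from /proc/cmdline); the function name is made up for the example.

```python
def has_nopentium(cmdline: str) -> bool:
    """True if the kernel command line contains the mem=nopentium option."""
    return "mem=nopentium" in cmdline.split()


# Hypothetical command lines for illustration.
print(has_nopentium("ro root=/dev/hda1 mem=nopentium"))  # True
print(has_nopentium("ro root=/dev/hda1"))                # False
```

On a running system this would be called as `has_nopentium(open("/proc/cmdline").read())`.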

bad form AMD, really bad form (Score:3, Troll)

by budgenator ( 254554 ) writes: on Monday January 21, 2002 @09:24AM (#2875792) Journal

A lot of your market share is there only because we who use Linux® have stuck by you. We have been ridiculed because we are using an "off-brand" processor, we've rationalized away thermal problems and fragile cores to get the benefit of more bang for the buck. We have suffered through inadequate compiler support, until your market share has grown to the point where an honest push onto the main-stream desktop is possible.

And what do we get for it? No real support. Write our own fix? No; that we can, and often do. What we got was forgotten: you didn't even tell us. We are used to and demand full disclosure, and in real time. Linux people hang their dirty laundry out in public to give everyone a fair and equal chance at a fix.

We're often treated as a minority because we are, but treat us as a second class minority at your own peril. In short don't ever let the marketing weenies convince you to hide something from us; if we wanted to be treated that way we would use Win/Intel products.

been fixed? (Score:2)

by csbruce ( 39509 ) writes:

So, if it was discovered over a year ago, was this hardware bug ever fixed? We bought a dual-athlon 1.53-GHz (1900+?) machine recently; do these processors still have the bug?
Wow... (Score:2)

by Greyfox ( 87712 ) writes:

I'd been noticing this for ages -- just about anything that does GL will almost always hang my system (Oddly enough, I have never noticed this with Tuxracer.)
I'd always assumed that it was just a crappy AGP implementation on my no-name motherboard, as I'd been following the Mesa/GLX groups for a while and hadn't seen the problem mentioned all that much. It's nice that there's a relatively easy fix for the problem. Maybe now I can get back into Tribes2 again :-)
Please explain (Score:2)

by RelliK ( 4466 ) writes:

Could somebody with more knowledge explain why you need 4MB pages in the first place? Pages are supposed to be small for a reason. With 4MB pages, internal fragmentation would go through the roof. It's almost like not having paging at all. I don't understand why this option is even available and used.
AMD Rev A5/CPUID 662 (Score:2, Informative)

by lanalyst ( 221985 ) writes:

Recently purchased 2 XP 1600+s (1 in Dec and 1 in Jan) - both indicate they are Rev A5 (CPUID 662) and do not have the INVLPG bug according to AMD's errata sheet.
SO that's why! (Score:2)

by Chanc_Gorkon ( 94133 ) writes:

The other day I left my Dual Boot system (with a Nvidia GeForce 2 MX 400 and NVIDIA drivers) booted into Mandrake Linux for most of the day and it was fine. Of course I was at the system for most of the time. I decided to go to the store and when I came back the system was locked tighter than a drum. No big deal since I run ext3 for the file system. Rebooted and it was fine. How would one add this option to a GRUB bootloader?? I bet if I add it, the screensaver won't lock (Open GL screensaver.....). I don't play a whole lot of games so the texture flakiness would not bother me.
This is flaw in how Linux is (not) managed (Score:2, Insightful)

by Anonymous Coward writes:

First off, yes, this is a rather major bug.

But is it enough to warrant not buying the processor or flaming AMD???? Hardly!

EVERY piece of hardware out there has some bug in it! Have any of you ever sat down and read the list of errata on Intel parts and the list of how many flaws are fixed in each stepping? The list of bugs fixed over the life of the P3/Celeron core is a rather lengthy document to say the least.

And I can't really fault AMD at all on this one other than that they HAD a bug...for Win2K etc, they released a fix/patch in very short order and notified everyone rather quickly.

And don't forget this was back in 2000! What version was the norm for deployed kernels back then (over a year ago!)???

From what I gather, the 4MB AGP paging didn't show up until kernel 2.4 builds -- which I do not think were final at that time. Regardless, I feel the Linux kernel community should have been a bit more proactive in noting a DOCUMENTED bug and correcting for it.

Regardless, this bug in no way affects whether or not I would buy an Athlon/Duron. It is basically trivial to work around and results in almost no performance loss. In essence, my Athlon XP 1900+ with the fix will still beat the crap out of most P4 2GHz machines in 90% of all applications (for half the price).

This is basically a failing of the entire Linux concept more than a failing of AMD.
Is there any central authority who regularly checks AMD, Intel, Via, Transmeta, etc. errata sheets for bugs that might potentially affect the kernel? Based on this, I strongly doubt it.

Don't get me wrong, Linux is a great OS, but the lack of centralized control and build management is starting to cause problems. There are so many changes to different modules that version dependencies crop up all the time and no one is managing them.

I am not a big fan of Microshaft myself, but I would put money that they have at least one or two people whose job is to do nothing but monitor the processor manufacturers' errata to make sure no major problems submarine the sales of Windows XP! Bill Gates may be many things, but stupid is not one of them. H*ll, if I was Microshaft, I could have a marketing field day on this one -- it could be a very persuasive argument to lots of upper management types as to why Windows is better than Linux.

Is this bug a problem? Yes.

Was the original problem AMD's? Yes.

Did they address it and notify people? Yes.

Did anyone in the Linux community actually notice? NO

Regardless, any bug that can be worked around this easily is not THAT big a deal people...but it does point out some serious flaws in how Linux kernel development is managed. If Linux is to survive, some order had better start arising out of the spiraling chaos!

So to sum up the appropriate response to this bug: LEARN FROM YOUR MISTAKES AND GET OVER IT!!!!!!!!

"The sky is falling! The sky is falling!"

sheesh...
I just want to know (Score:2)

by Vicegrip ( 82853 ) writes:

- what processor rev it's fixed in. I'm wanting to buy a new machine, it's still gonna be AMD, but I don't want a processor with that bug, as I am a big gamer.
- how to tell if my processor is affected. (I'd rather not have to wait for my system to crash to find out)
curious (Score:2)

by chompz ( 180011 ) writes:

I think the K6-2 had this problem as well. Anyone know about that?
Does this problem occur in the 2.2 kernel series? (Score:2)

by jonabbey ( 2498 ) writes:

I've seen a number of mysterious X freezes in XFree86 4.1.0 and earlier on my Athlon/GeForce2MX system with NVidia kernel/X drivers. Most often the X server just seems to lock up when I'm doing nothing in particular. Occasionally I've had the whole system freeze during 3d gaming.

This is all with Linux 2.2.18. Has anyone commented for sure on this bug in the 2.2 series?
- Don't think so (Score:2, Informative)
  
  by Metrollica ( 552191 ) writes:
  
  I don't think so. AMD reverse engineered the x86 and made their own implementation without Intel's crap in it.
  
  AMD's version of the x86 that is in the Athlon and the Duron runs faster than Intel's chips because of this reverse engineering.
  
  This bug could be a problem of reverse engineering the x86. It doesn't say Intel's chips have the problem.
  - - in response to mr troll (Score:2, Flamebait)
      
      by Metrollica ( 552191 ) writes:
      
      you are only right on this:
      
      they add their own tech too, which is why they get different results.
      
      quote [anl.gov]
      
      Now, the Athlon processor is made by a rival company, AMD. They have
      basically reverse engineered the Intel processors and tried to make a
      processor that operates just like Intel's processors, and then sell them
      cheaper than Intel does.
      
      This makes it a little more difficult to compare them to the Pentium
      processors. Some things the AMD Athlon actually does faster than a Pentium
      III, some things it does a little slower, and some things it can't do at
      all, while other things the Intel can't do, the Athlon does do.
      
      quote [geocities.com]
      
      Had AMD had a design ready when Intel released their Pentium, their market share
      wouldn't have dropped to 10%. In the days of the 286, 386, and 486, AMD, Cyrix, and other "clones"
      reverse-engineered the Intel chips. In a sense, it was Intel's design (with maybe a few improvements),
      but it was reverse-engineered so it did not violate patents.
      
      quote [pbs.org]
      
      But nothing lasts forever. The companies that had built Intel chips under license eventually reverse-engineered the chips and built them license-free. Intel copycats including Advanced Micro Devices (AMD) and Cyrix (a division of National Semiconductor) used the courts to validate their right to copy Intel's chip architectures. And PC manufacturers like Compaq and IBM used these clone chips as a weapon to force Intel prices down. Now the best way for Intel to stay ahead is to simply run faster. Running faster means shrinking product cycles from three years to 18 months by running parallel product development teams and spending more money faster than the other guys. Since Intel has more money to spend, this keeps them in command, but shorter product cycles mean less time to recoup R&D expenses. Hence, those lower margins.
      
      someone better mod me up for all my work
- Re:Could this be.... (Score:2, Interesting)
  
  by zeno_2 ( 518291 ) writes:
  
  From just the story, it looks to me like the linux kernel could send the cpu pages that were bigger/smaller than 4k, and it would have a problem with that. His fix would automatically detect the bug and resize the info that is sent to the cpu to 4k.
  
  The original pentium bug had to do with the floating point processor on the chip, not with the size of page that was sent to the chip..
  
  Of course I could be wrong about all this =)
- - F00F (Score:2)
    
    by srichman ( 231122 ) writes:
    
    As far as I remember, the bug in the original pentium was a floating point flaw that led to wrong calculations under certain circumstances.
    
    No, I think the analogous bug the parent was referring to was the F00F bug [ddj.com], which would hang Pentiums, regardless of OS, even for unprivileged users.
- Re:Nice write-up. (Score:2, Informative)
  
  by bob1000 ( 174146 ) writes:
  
  Add 'mem=nopentium' to your lilo/grub/whatever bootup or compile the kernel for i386 to avoid extended cpu operations. The fault is something in the page size extension and agp.. which is strange because I thought agp would be more of a chipset issue than processor.
- Re:NO AMD BASHING (Score:2)
  
  by Afrosheen ( 42464 ) writes:
  
  From what I've seen with amd motherboards (granted, this isn't amd's fault), half of those damn via chipset discount boards should be bonfired. The worst agp implementation ever seems to rear its ugly head only in linux.
  - Re:NO AMD BASHING (Score:3, Interesting)
    
    by Ryan Amos ( 16972 ) writes:
    
    VIA does make some complete crap, but they also make some nice chipsets. The KT266A is very nice, it's the fastest DDR implementation out there by far. But still, VIA chipsets are a good bit cheaper than the Intel equivalent, and while the Intel chipset may be more stable, the VIA one is almost always faster. And even Intel has issues with chipset stability, it's just that they ignore them and only quietly replace the faulty boards when they're returned under warranty. You know how it goes in the computer industry... Faster, cheaper, or more stable- pick any two.
    - Re:NO AMD BASHING (Score:3, Interesting)
      
      by Perdo ( 151843 ) writes:
      
      AMD doesn't keep tabs on VIA and VIA doesn't keep tabs on motherboard manufacturers.. The only decent AMD motherboards are from manufacturers trying to compete in the enthusiast market where crap boards just don't sell. Combined with VIA actually being in competition with AMD in the budget processor market (The Cyrix) delaying a decent integrated chipset for the duron and VIA bullying motherboard manufacturers into not producing The SIS 735 chipset, VIA is not AMD's best friend.
      
      AMD chipsets:
      
      Nforce 220,420
      AMD-760MPX,760MP,760
      ALi MAGiK 1,MAGiK 2
      SIS 735,745,746,755
      VIA KT266A,KT133A,KM133,KLE133,KT333,K8HTB
      
      STABLE (100+Days,Linux) Chipsets:
      
      760,KT133A,735,760mp
      
      Good Motherboard Manufacturers:
      
      Asus,Abit,Iwill,ECS,Epox,Soyo
      
      Personal Best Uptime 135 days, Iwill KK266 (KT133A), Power supply failure
      - Re:NO AMD BASHING (Score:2, Insightful)
        
        by billcopc ( 196330 ) writes:
        
        I'll second that : Iwill boards are consistently better than average in terms of both performance and stability. Abit sucks ass though, they try to push things too far and forget that a super-overclocked machine that hangs every hour isn't worth shit.
        
        ECS are off to a very impressive start with the K7S5A board. Using the SIS 735 chipset, it is unsurpassed in reliability and offers very decent performance as well. Overclocking isn't its strong point, but at a mere 65$ price tag you can invest the money saved on a faster CPU.
        
        (no I'm not sponsored by ECS, I just hate my Abit KT7-Raid and am jealous of all my friends who have the ECS board)
- Re:NO AMD BASHING (Score:4, Insightful)
  
  by NanoGator ( 522640 ) writes: on Monday January 21, 2002 @04:44AM (#2875248) Homepage Journal
  
  AMD didn't turn interesting until the Athlon came out. The previous versions of its processors were decidedly inferior. This is *worse* than recalling for a bad, rarely used function call. I can't take a processor back 6 months after I bought it because it sucks, but I can get it replaced if it has a bona-fide bug.
  
  If this is a bug in the processor, AMD really should fix it and offer replacement processors to those who need it. If they don't, and they expect you to patch your OS instead, then that definitely shakes my faith in that company. When you're an artist dependent on OpenGL, you can't have problems like this.
  
  And finally...
  
  Why are you worried about running 32-bit code on a 64-bit processor? 64-bit processors are supposed to run 64-bit code. Intel's not marketing 64-bit processors to replace desktop computers (today), they're for servers and high-end graphics with custom code. They don't NEED to run 32-bit code. I hardly think that's a point against Intel, especially considering they don't make it a big secret that 32-bit code runs slower on it.
  
  - Re:NO AMD BASHING (Score:5, Informative)
    
    by spauldo ( 118058 ) writes: on Monday January 21, 2002 @05:38AM (#2875399)
    
    Why are you worried about running 32-bit code on a 64-bit processor?
    Just as an aside, if you ever deal with ultrasparcs, you'll quickly find that the majority of the code used is 32 bit.
    The reason for it is simple; most applications will run slower at 64 bit than at 32 bit. The ultrasparc chips were designed to take this into account. Hell, due to a firmware bug, solaris on my ultra 1 installs as a 32 bit kernel by default - and runs no slower because of it (although it can't run 64 bit apps that way). After a firmware patch, it is easy to change to running the 64 bit kernel though.
    In all reality, why would most apps need 64 bit integers and whatnot? Most don't, and doing so is a waste of memory. If the processor is designed right, it can handle 32 bit code with no problems whatsoever.
    
    - Re:NO AMD BASHING (Score:4, Interesting)
      
      by mikera ( 98932 ) writes: on Monday January 21, 2002 @08:05AM (#2875663) Homepage Journal
      
      I've lost count of the number of times I wanted 64-bit integers, in pretty general purpose apps.
      
      Not because I do big databases or suchlike, but they let you do loads of optimisations that wouldn't otherwise be possible. For example, you can pass around 8-byte structures in a single register, which is damn useful given the lack of available registers in the x86 architecture.
      
      Example: I've recently been coding a large hexagonal grid component. Each point in the grid is indexed by 2 32-bit (x,y) integers. With a 64-bit register, you could put a full co-ordinate into a single register.
      
      Why is this useful? Well, one of my requirements was to be able to manage large sets of co-ordinates (think reachable spaces for an AI). You want to be able to combine sets of co-ordinates, which basically requires merging two lists. In order to merge lists efficiently, you need to sort them. And with the 64-bit representation, you can do this with just one subtraction and one branch rather than a combination of two subtracts and two branches. This is a definite speedup if you are hand-coding, and possibly an even bigger one if your compiler doesn't inline all the 32-bit code properly.
      
      Other example: 32-bits are large enough for most integer applications (you couldn't enumerate all the people on the planet though....) but they tend to fall down when you multiply, e.g. 100,000 * 100,000 has already blown the 32-bit limit, and neither of those are particularly big numbers. Whenever you start doing a reasonable amount of multiplication, 64-bit becomes useful.
      
      Also, 64-bits is big enough to encode the positions of pieces on a chess board. You can use bitwise logic to analyse and store positions. GNU chess certainly does it this way. I expect a *considerable* speedup in the top chess-playing algorithms when 64-bit becomes widespread.
      
      I'm really keen to see 256-bit arrive to be honest, 2^(2^3) has more elegance than 2^(2*3) and it would allow you to store a set of bytes in one register. Would allow some very cool text-processing tricks.
      
      Course, it might never happen - I predict a move towards massively parallel 64-bit computers rather than stonking 256-bit ones as the next major evolution in processor power.
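Two of the claims in this post are easy to check with plain integers. The sketch below assumes unsigned 32-bit coordinates (signed ones would need a bias before packing, which the post does not go into) and simulates register widths with masks; `pack` and the sample points are invented for illustration.

```python
MASK32 = 0xFFFFFFFF


def pack(x: int, y: int) -> int:
    """Pack two unsigned 32-bit coordinates into one 64-bit key, x in the high half."""
    if not (0 <= x <= MASK32 and 0 <= y <= MASK32):
        raise ValueError("coordinates must fit in unsigned 32 bits")
    return (x << 32) | y


points = [(3, 7), (1, 9), (3, 2), (0, 4_000_000_000)]

# Ordering by the single packed key matches ordering by (x, then y).
by_key = sorted(points, key=lambda p: pack(*p))
by_pair = sorted(points)
print(by_key == by_pair)  # True

# The multiplication example: the true product versus what 32 bits would keep.
product = 100_000 * 100_000
wrapped = product & MASK32
print(product > MASK32)   # True: 10**10 does not fit in 32 bits
print(wrapped)            # 1410065408
```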
      
      - You want vectors not huge integers (Score:2, Insightful)
        
        by yerricde ( 125198 ) writes:
        
        For example, you can pass around 8-byte structures in a single register, which is damn useful given the lack of available registers in the x86 architecture.
        
        And when you want to use or change one byte in the structure, what do you do? Shift it out and put it in another register. You can beat the "lack of registers" argument by switching to any current architecture but x86; you'll get at least 16, most likely 32, or even 64 registers.
        
        And with the 64-bit representation, you can do this with just one subtraction and one branch rather than a combination of two subtracts and two branches.
        
        One problem with your algorithm: one subtraction will "carry" over into the next because the processor assumes you're subtracting whole integers. What you want isn't really 64-bit integers but rather vector SIMD as found in MMX, SSE, and 3DNow!. In fact, AltiVec on the G4 processor is 128-bit.
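The carry point can be seen with a two-element example. This sketch simulates 32-bit lanes inside a 64-bit word using masks; the helper names are made up. It shows that a single 64-bit subtraction is not the same as subtracting each coordinate separately, because a borrow from the low lane leaks into the high lane.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def pack(hi: int, lo: int) -> int:
    """Place two 32-bit lanes into one 64-bit word."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


def unpack(word: int):
    """Split a 64-bit word back into its (high, low) 32-bit lanes."""
    return (word >> 32) & MASK32, word & MASK32


a = (1, 0)
b = (0, 1)

# One 64-bit subtraction over the packed words.
packed_diff = unpack((pack(*a) - pack(*b)) & MASK64)

# Independent 32-bit subtraction per lane, which is what SIMD would do.
lanewise_diff = ((a[0] - b[0]) & MASK32, (a[1] - b[1]) & MASK32)

print(packed_diff)    # (0, 4294967295): the borrow ate the high lane
print(lanewise_diff)  # (1, 4294967295)
```

For pure ordering comparisons that borrow is harmless, but for per-coordinate arithmetic it gives the wrong high lane.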
      - Re:NO AMD BASHING (Score:2, Insightful)
        
        by jelle ( 14827 ) writes:
        
        "when you multiply, e.g. 100,000 * 100,000"
        
        When you multiply 2 32-bit numbers and really need the full precision of the 64-bit result, yes, then you need some 64-bit registers. However, that does not mean you need to have a multiply instruction that accepts 64-bit inputs. Also, often you don't need more than 32 bits of the result. In that case a barrel shifter in the chip right after the multiplier would already give you what you want without needing the large and slow 64x64 multiplier in the chip.
        
        On DSPs, you can often choose between 'integer mode' and 'fixed point mode'. In the former case they mean integer input values just like the CPU has, and in the latter case they mean values in the range [-1,1>, which places the decimal point 31 bits more towards the LSB. In 'fixed point mode', it's intuitively easier to stick with 32 bit precision if more precision is not needed.
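As a sketch of the 'fixed point mode' idea, assuming the common Q31 convention (31 fractional bits, values in [-1,1>): the multiply produces a double-width product, and a single shift brings it back to 32-bit precision. The function names here are made up for illustration.

```python
Q = 31  # number of fractional bits


def to_q31(x: float) -> int:
    """Convert a value in [-1, 1) to its Q31 integer representation."""
    return int(round(x * (1 << Q)))


def from_q31(v: int) -> float:
    """Convert a Q31 integer back to a float."""
    return v / (1 << Q)


def q31_mul(a: int, b: int) -> int:
    """Multiply two Q31 values: double-width product, then shift right by 31."""
    return (a * b) >> Q


half = to_q31(0.5)
quarter = q31_mul(half, half)
print(from_q31(quarter))                      # 0.5 * 0.5
print(from_q31(q31_mul(to_q31(-0.5), half)))  # -0.5 * 0.5
```

The shift plays the role of the barrel shifter after the multiplier: the full 64-bit product exists only briefly, and the result handed back is 32 bits wide.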
        
        Additionally, DSPs have 'MAC' instructions: "accumout = accumin + (in1*in2)". Often, the number of bits in the 'accum' registers is larger than the number of bits in the 'in1' and 'in2' multiply inputs. A 16-bit DSP often (always?) has at least 32 bit wide 'accum' registers, often more than that, with up to 4 or 8 overflow bits in some cases. You need the overflow bits when you use the MAC instruction repeatedly (which is done often in typical DSP algorithms). With 4 overflow bits, you can use the MAC instruction 2^4 = 16 times and be guaranteed you'll never overflow 'accum'.
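A worst-case sketch of the overflow-bit argument, assuming a 16-bit DSP with signed inputs, 32-bit products and a 36-bit accumulator (32 bits plus 4 overflow bits). The register widths are the ones from the comment above; the code itself is only a simulation.

```python
INT16_MIN = -(2**15)     # most negative 16-bit input
ACC32_MAX = 2**31 - 1    # largest value a 32-bit signed accumulator holds
ACC36_MAX = 2**35 - 1    # largest value with 4 extra overflow bits


def mac(accum: int, in1: int, in2: int) -> int:
    """Multiply-accumulate: accumout = accumin + (in1 * in2)."""
    return accum + in1 * in2


accum = 0
first_32bit_overflow = None
for step in range(1, 17):
    # Largest possible product of two 16-bit inputs: (-32768)^2 = 2^30.
    accum = mac(accum, INT16_MIN, INT16_MIN)
    if first_32bit_overflow is None and accum > ACC32_MAX:
        first_32bit_overflow = step

print(first_32bit_overflow)  # step at which a plain 32-bit accumulator overflows
print(accum <= ACC36_MAX)    # whether 16 worst-case MACs still fit in 36 bits
```

With worst-case inputs a plain 32-bit accumulator is already exceeded on the second MAC, while the 36-bit one still has headroom after all 16.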
        
        Personally, I'd more prefer the CPUs to get more DSP features than a simple increase of 'bits'.
    - 64 bit Performance (Score:2, Informative)
      
      by digitalEric ( 527320 ) writes:
      
      Yes, UltraSPARCs run significantly slower in 64 bit mode. IIRC, this is because it takes more instructions to load 64 bit constants and access 64 bit pointers. This is not true of all 64 bit processors -- and it is not true of x86-64.
      
      The x86-64 architecture allows 64 bit programs to take advantage of the extra precision (and doubles the number of general-purpose registers, which x86 desperately needs), without forcing them to take the performance hit of using the full 64 bit addressing. It also adds a new IP-relative addressing mode, which makes position-independent code (ie, shared libraries) much more efficient. There will be an increase in code size (and possibly a performance drop, but this depends on how AMD implements the 'movabs' instruction) when you start using more than 4GB of data. And, when you start using >4GB of code, things get yucky (requiring indirect jumps).
      
      But, the point is, x86-64 will run all your 32 bit x86 code at full speed, and if you're able to re-compile your programs for 64 bit mode, you should get a performance boost, if only from getting 9 more registers (8 + no longer need to keep a pointer to the GOT).
  - What bloody bug? (Score:5, Funny)
    
    by DABANSHEE ( 154661 ) writes: on Monday January 21, 2002 @08:01AM (#2875651)
    
    None of the Athlons or Durons I've built have had any problems with Tux Racer (Mostly on Man8.1 default install).
    
    My nephew spends hours sliding that little penguin around with that bloody elevator music going, & not once has there been a freeze or lockup, much to my disappointment.
  - ha ha, where's the problem? (Score:2)
    
    by Erris ( 531066 ) writes:
    
    When you're an artist dependent on OpenGL, you can't have problems like this.
    Of course if you were in that situation, you must not have noticed.
    Bug or no bug, my machines have been running just fine. I bought them based on reviews that showed them running circles around Intel and they did. At the speeds the newer machines run, I'd hardly notice if they were hanging.
  - Re:NO AMD BASHING (Score:3, Insightful)
    
    by mz001b ( 122709 ) writes:
    
    As someone pointed out elsewhere, this would make the processors too expensive, if the vendor had to ship replacement processors each time a bug was found. Lots of bugs exist in processors, and typically they are fixed with each new stepping. Look at /proc/cpuinfo and see how many bugs it checks for (fdiv_bug, hlt_bug, f00f_bug, coma_bug on my system). This bug will probably be just another line. There is a simple workaround for it too, so it is not that bad. The real problem (as many people state) is that AMD did not inform the kernel developers about this problem long ago, so a fix could already be implemented.
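For anyone who wants to pull those bug checks out programmatically, a minimal sketch follows. It parses a hard-coded sample rather than the real file so it runs anywhere; the sample text is invented for illustration and assumes the "name : yes/no" layout of the *_bug lines named above. On a real system you would pass in the contents of /proc/cpuinfo instead.

```python
# Hypothetical excerpt in the style of /proc/cpuinfo; not copied from a real machine.
SAMPLE_CPUINFO = """\
vendor_id\t: AuthenticAMD
model name\t: AMD Athlon(tm) Processor
fdiv_bug\t: no
hlt_bug\t\t: no
f00f_bug\t: no
coma_bug\t: no
"""


def bug_checks(cpuinfo_text: str) -> dict:
    """Map each '*_bug' field to True if the kernel reported the bug as present."""
    checks = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.endswith("_bug"):
            checks[key] = value.strip() == "yes"
    return checks


checks = bug_checks(SAMPLE_CPUINFO)
for name, present in checks.items():
    print(name, "present" if present else "not present")
```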
- Re:NO AMD BASHING (Score:3, Interesting)
  
  by Anonymous Coward writes:
  
  Not like they are recalling processors like Intel
  -----
  
  Oh great, so they make defective processors, but don't worry because they won't recall them! How in the hell does that make them better than Intel?
  
  Think about it -- If you own an affected part a recall is GOOD!
- Re:More info? (Score:2)
  
  by larien ( 5608 ) writes:
  
  <AOL> I'd also like to know if I'm affected or not; I've been getting some hangs on starting X (the system locks up with the NVidia logo on screen) and I'd like to know if this is related...
  - Re:More info? (Score:2, Informative)
    
    by Sadfsdaf ( 106536 ) writes:
    
    Disable Fast AGP write (AGP Turbo?) in your BIOS.
    
    Read the manual. http://205.158.109.140/XFree86_40/1.0-2313/README.txt
    - Re:More info? (Score:2)
      
      by larien ( 5608 ) writes:
      
      That seems to be Ali specific; my mboard is an Asus (and so is the GF3, FWIW). However, thanks for the pointer, I'll give it a try.
  - Re:More info? (Score:2)
    
    by Enigma2175 ( 179646 ) writes:
    
    Dammit, close your AOL tag, you just AOLed the rest of the page for me! I will close it now for future reading.
- Re:Now aren't you glad you use Free/Net/Open BSD(n (Score:2)
  
  by Yarn ( 75 ) writes:
  
  Does anyone actually *know* if this is worked around in the *bsds?
  
  Or do they use the 4k method by default anyway?
- Re:Bug Problem with SETI (Score:2)
  
  by larien ( 5608 ) writes:
  
  Could well be possible; AFAIK, SETI throws around a fair bit of data, so it might do some paging. If it 'invariably' killed your machine, it should be easy to test using the boot options.
- Re:Why don't I see this? (Score:2)
  
  by hearingaid ( 216439 ) writes:
  
  The announcement did indicate most Athlons were affected by the bug. Perhaps you're one of the lucky few who isn't. If you have no bug, do not worry: do not attempt to fix your computer as it is not broken. :)
- Re:Can registered and ECC RAM help? (Score:2, Informative)
  
  by Tazzy531 ( 456079 ) writes:
  
  It's not a matter of the type or quality of the memory but how the chip addresses the memory. There is a flaw in the chip itself. A layman's analogy might be a telephone book that only lists the first 5 digits of each phone number. What you are suggesting is to replace all the telephones in the world. Even if you do, the phone book still won't work because the phone numbers are incorrect. What has to be fixed is the phone book [or the way of finding phone numbers]. Go here [ccu.edu.tw] for more technical information.
