Tracking Down The AMD "Processor Bug"
tercero writes: "Over at the Gentoo Linux website there is an update on the AMD processor bug mentioned here. The upshot is that AMD claims it's not a bug in the Athlon processor but in the motherboard. More detailed information can be found in this LKML post."
An Anonymous Coward points to a similar explanation at Linux Weekly News.
Update: 01/25 01:25 GMT by T: Daniel Robbins from Gentoo clarifies: "AMD is not calling this a 'motherboard' issue; it is an interaction between a feature of the Athlon called 'speculative writes' and the design of the GART, which is not cache-coherent. It's an 'Athlon/cache coherency/GART' problem, not a 'motherboard' problem."
Kernel parameter vs LILO config file (Score:5, Informative)
The workaround is the kernel parameter
mem=nopentium
which turns off 4MB pages (this may or may not prevent the problem from manifesting; the situation is unclear at this time). You can pass it at the boot prompt like this:
LILO boot: linux mem=nopentium
or by placing the configuration directive
append="mem=nopentium"
in your LILO config file (then rerun lilo so the change takes effect).
See the lilo.conf manual page for details.
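After rebooting, one way to confirm that the option actually reached the kernel is to look at /proc/cmdline, which holds the command line the running Linux kernel was booted with. This is a minimal sketch: the function names are hypothetical, and it only checks that the token is present, not whether it has the intended effect on page sizes.

```python
from pathlib import Path


def cmdline_has_option(cmdline: str, option: str) -> bool:
    """True if `option` appears as a whole whitespace-separated token."""
    return option in cmdline.split()


def running_kernel_has_option(option: str = "mem=nopentium") -> bool:
    """Check the command line the running kernel was booted with."""
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        # Not Linux, or /proc is not mounted.
        return False
    return cmdline_has_option(cmdline, option)


if __name__ == "__main__":
    print("mem=nopentium active:", running_kernel_has_option())
```

Matching whole tokens rather than substrings avoids false positives from a differently spelled option that merely contains the same text.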
More information (Score:5, Informative)
More information is now available at http://www.gentoo.org, including an analysis of AMD's response. AMD's official response was posted to LKML and is available at http://www.geocrawler.com/lists/3/Linux/35/175/76
There is apparently some kind of bad interaction between the AGP GART ("Graphics Address Remapping Table", I think?), speculative memory operations performed by the Athlon processor, the memory mappings used by the kernel, and cache coherency. The details are beyond me, but the practical upshot appears to be that the wrong data ends up being written back to main memory at some point.
I recommend reading the above LKML thread if you suspect you are affected by this issue. Information is still being uncovered, and it is not immediately clear how this occurs, what causes it, who is affected by it, and how to work around it.
In particular, there is some uncertainty as to whether the "mem=nopentium" option actually prevents the problem, or merely makes it less likely to occur.
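To make the failure mode described above more concrete, here is a deliberately simplified toy model. The class and method names are hypothetical, and this is not how the Athlon's caches or any real chipset actually work. It assumes one cache line of memory, a write-back CPU cache, and an AGP device that writes main memory without snooping the CPU cache. If the line is mapped cacheable and the CPU ends up holding it dirty, evicting it writes stale data over what the device wrote. With an uncacheable mapping the device's data survives.

```python
class ToySystem:
    """Toy model (hypothetical): one cache line of main memory, a write-back
    CPU cache, and a device that writes memory without snooping that cache."""

    LINE_SIZE = 4  # bytes per cache line, tiny for illustration

    def __init__(self, cacheable: bool):
        self.cacheable = cacheable               # how the OS mapped this memory
        self.memory = bytearray(self.LINE_SIZE)  # starts as all zeros
        self.cache = None                        # CPU's copy of the line, if any
        self.dirty = False

    def cpu_read(self) -> bytes:
        if not self.cacheable:
            return bytes(self.memory)            # uncached: always go to memory
        if self.cache is None:
            self.cache = bytearray(self.memory)  # cacheable: fill the line
        return bytes(self.cache)

    def cpu_mark_dirty(self) -> None:
        # Stand-in for the CPU ending up with the line in a modified state
        # even though the program stored no new data to it.
        if self.cache is not None:
            self.dirty = True

    def device_write(self, data: bytes) -> None:
        # Non-coherent write: updates memory, leaves the CPU's copy untouched.
        self.memory[:] = data

    def evict(self) -> None:
        # Write-back cache: a dirty line is copied back to memory on eviction.
        if self.cache is not None and self.dirty:
            self.memory[:] = self.cache
        self.cache = None
        self.dirty = False


def run_scenario(cacheable: bool) -> bytes:
    system = ToySystem(cacheable)
    system.cpu_read()                         # CPU touches the line
    system.cpu_mark_dirty()                   # line is now considered modified
    system.device_write(b"\xaa\xbb\xcc\xdd")  # device writes behind the cache
    system.evict()                            # any dirty line is written back
    return bytes(system.memory)


print(run_scenario(cacheable=True).hex())   # 00000000: device data lost
print(run_scenario(cacheable=False).hex())  # aabbccdd: device data survives
```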
All of the above. (Score:5, Informative)
According to young bald children everywhere, "There is no bug".
In related news, the motherboard manufacturers are quoted as saying, "It's not a bug with the motherboard, but with the Athlon processor."
Funny, I didn't think I was bald...
1) It's an Athlon bug if you think doing speculative writes is a bug.
2) It's a motherboard chipset bug if you think that the AGP controller should play nicely with cache-coherence protocols (right now it doesn't, presumably to gain a speed boost).
3) It's an OS bug if you think that the OS should be bright enough not to make AGP-touched memory cacheable (it wasn't intended to be).
I'm voting for option 3 myself.
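Option 3 ultimately comes down to the cache-control bits in the x86 page tables. In 32-bit x86 paging, bit 3 (PWT, page-level write-through) and bit 4 (PCD, page-level cache disable) of a page-table entry control how the CPU caches that page; the effective memory type also depends on the MTRRs (and PAT, where present). This sketch only shows the bit manipulation for a 4KB entry. `make_pte` and `is_cacheable` are hypothetical helpers, not a kernel API, and this is not the fix that Linux or Windows actually shipped.

```python
# 32-bit x86 page-table entry flag bits (4KB pages).
PRESENT = 1 << 0
WRITABLE = 1 << 1
PWT = 1 << 3   # page-level write-through
PCD = 1 << 4   # page-level cache disable
PAGE_4K = 4096


def make_pte(phys_addr: int, cacheable: bool) -> int:
    """Build a present, writable 4KB PTE; set PCD|PWT if the page must not be cached."""
    if phys_addr % PAGE_4K:
        raise ValueError("physical address must be 4KB-aligned")
    if not 0 <= phys_addr < 1 << 32:
        raise ValueError("32-bit paging covers 4GB of physical address space")
    pte = phys_addr | PRESENT | WRITABLE
    if not cacheable:
        pte |= PCD | PWT
    return pte


def is_cacheable(pte: int) -> bool:
    """True unless the entry's cache-disable bit is set (ignores MTRRs and PAT)."""
    return not (pte & PCD)


gart_page = 0x01234000  # hypothetical page handed to the GART
print(hex(make_pte(gart_page, cacheable=True)))   # 0x1234003: ordinary RAM mapping
print(hex(make_pte(gart_page, cacheable=False)))  # 0x123401b: uncached mapping
```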
Re:Easy - Buy Intel. The cost of using 2nd party.. (Score:2, Informative)
If you paid attention to benchmarks, you'd see that in almost every case AMD is more cost-effective than Intel. If you have specific examples of why AMD is not a good choice (as opposed to vague, illogical ramblings), why don't you share them? Prove that your mumblings are "not made up of bugus stuff".
Re:This is embarrassing (Score:3, Informative)
If you read the technical writeup on LKML, you'll see that it's not a hardware issue but a software bug, which is why AMD announced it as a Windows 2000 issue: it is one. Linux also happens to have the same bug (it's a subtle issue and an easy mistake to make, IMO), but how was AMD supposed to know that Linux was doing the same bad thing, namely mapping the AGP GART area cacheable when the GART is non-cacheable?
Re:This is embarrassing (Score:2, Informative)
Oh, that's easy. The engineer who discovered the problem should have realized that it's not necessarily a Windows-specific issue but a problem any OS could have. He should then have tried to contact all the OS vendors, not just Microsoft.
Considering that a higher percentage of AMD customers than Intel customers use Linux, AMD should have paid more attention to this important segment of its customer base.
OS Bug (Score:3, Informative)
In truth, we should probably say it is a combination of a problem with the OS and a problem with the processor. After all, Intel processors don't have the same problem, simply because they work differently. So while it may not technically be the CPU's fault, the CPU does play a part.
It's Linux, NOT the motherboard! (Score:2, Informative)
"Our conclusion is that the operating system is creating coherency problems within the system by creating cacheable translation to AGP GART-mapped physical memory."
Re:this is not a motherboard bug either... (Score:3, Informative)
1) Get a video card with 270+MB of memory. (Yeah, right.)
2) Snatch from main memory the portions of the texture you need. (This gets slow AND ugly if you use more than ~16MB in a single frame.)
3) Use the GART, take (less of) a performance hit, and just keep the textures in system memory.
This was the original purpose of the GART, and is still important.
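The third option above works because the GART gives the graphics card a single contiguous "aperture" while the pages behind it can be scattered anywhere in system RAM. A toy model of that remapping follows; the class, the addresses, and the table format are hypothetical (real GART entry formats are chipset-specific). Note that the physical pages bound into the table are ordinary RAM, which is presumably how the kernel's own mapping of that RAM ends up as the "cacheable translation to AGP GART-mapped physical memory" that AMD's statement objects to.

```python
PAGE_SIZE = 4096


class ToyGart:
    """Hypothetical GART model: a contiguous aperture whose 4KB pages are
    remapped, one table entry each, onto scattered physical pages."""

    def __init__(self, aperture_base: int, num_pages: int):
        self.aperture_base = aperture_base
        self.table = [None] * num_pages  # entry i -> physical page address or None

    def bind(self, index: int, phys_page: int) -> None:
        """Point aperture page `index` at a page-aligned physical page."""
        if phys_page % PAGE_SIZE:
            raise ValueError("physical page must be page-aligned")
        self.table[index] = phys_page

    def translate(self, aperture_addr: int) -> int:
        """Turn an address inside the aperture into a physical memory address."""
        offset = aperture_addr - self.aperture_base
        if not 0 <= offset < len(self.table) * PAGE_SIZE:
            raise ValueError("address outside the aperture")
        index, within_page = divmod(offset, PAGE_SIZE)
        phys_page = self.table[index]
        if phys_page is None:
            raise LookupError("aperture page is not bound")
        return phys_page + within_page


# Hypothetical addresses: a 4-page aperture backed by two non-adjacent RAM pages.
gart = ToyGart(aperture_base=0xE0000000, num_pages=4)
gart.bind(0, 0x00321000)
gart.bind(1, 0x00078000)
print(hex(gart.translate(0xE0000004)))  # 0x321004
print(hex(gart.translate(0xE0001010)))  # 0x78010
```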
Re:You are assuming... (Score:5, Informative)
There are Pentium systems with an AGP port? If you mean the Pentium II and up, I don't see why the GART would be cacheable there either; I don't know if the P4 chipsets have changed things, but with the PII and PIII, here's what Intel had to say about the subject:
(Emphasis added.) As for why the bug doesn't happen on Intel CPUs, it sounds like the Athlon has more aggressive speculative writes and can change memory that wasn't explicitly written to, dirtying the cache. But in any case, even on Intel CPUs, the AGP area is supposed to be mapped non-cacheable.
Why does disabling large pages fix the problem?
Don't know about that one; I haven't read the various tech docs for the Athlon. Perhaps the cache works slightly differently with 4MB pages vs 4KB pages?
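The granularity difference is easy to quantify, even if it does not by itself explain the bug: a 4MB page is a single page-directory entry covering 1024 4KB pages, and all of them share that entry's attribute bits, including cacheability. This sketch, with hypothetical helper names and example addresses, computes which 4MB pages would contain a given set of GART-backed 4KB pages. Whether this has anything to do with why mem=nopentium helps remains, as the posters say, unclear.

```python
PAGE_4K = 4 * 1024
PAGE_4M = 4 * 1024 * 1024


def pages_sharing_attributes(use_large_pages: bool) -> int:
    """Number of 4KB pages whose attribute bits come from a single entry."""
    return PAGE_4M // PAGE_4K if use_large_pages else 1


def large_page_base(phys_addr: int) -> int:
    """Start of the 4MB page that contains phys_addr."""
    return phys_addr - (phys_addr % PAGE_4M)


def large_pages_touched(gart_pages):
    """Distinct 4MB pages that contain any of the given 4KB GART pages."""
    return sorted({large_page_base(addr) for addr in gart_pages})


# Hypothetical example: three 4KB pages handed to the GART.
example = [0x01234000, 0x01235000, 0x02000000]
for base in large_pages_touched(example):
    print(f"4MB page at {base:#010x} covers {pages_sharing_attributes(True)} small pages")
```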
Re:this is something.. (Score:3, Informative)
FYI, I don't own a Mac, but I will buy one the next time I want a computer.