AMD

Tracking Down The AMD "Processor Bug"

tercero writes: "Over at the Gentoo Linux website there is an update on the AMD processor bug mentioned here. In short, AMD claims it's not a bug with the Athlon processor, but with the motherboard. More detailed information can be found on this LKML post." An Anonymous Coward points to a similar explanation at Linux Weekly News. Update: 01/25 01:25 GMT by T : Daniel Robbins from Gentoo clarifies: "AMD is not calling this a 'motherboard' issue, it is an interaction between a feature of the Athlon called 'speculative writes' and the design of the GART, which is not cache-coherent. It's an 'Athlon/cache coherency/GART' problem, not a 'motherboard' problem."
This discussion has been archived. No new comments can be posted.

  • by DragonHawk ( 21256 ) on Thursday January 24, 2002 @03:33PM (#2896226) Homepage Journal
    The kernel will look for the parameter

    mem=nopentium

    and turn off 4MB pages (which may or may not prevent the problem from manifesting -- the situation is unclear at this time). You can do this at the boot prompt like this:

    LILO boot: linux mem=nopentium

    or by placing the configuration directive

    append="mem=nopentium"

    in your /etc/lilo.conf configuration file.

    See the manual page for lilo.conf for the details.
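To confirm that the option actually reached the running kernel after a reboot, one can inspect `/proc/cmdline`. The following is a minimal sketch, not part of the original post; the helper names `has_boot_option` and `running_with_nopentium` are made up for illustration, and it only reports whether the option was passed, not whether the crashes are avoided.

```python
from pathlib import Path


def has_boot_option(cmdline: str, option: str = "mem=nopentium") -> bool:
    """Return True if `option` appears as a whole token on the kernel command line."""
    return option in cmdline.split()


def running_with_nopentium(path: str = "/proc/cmdline") -> bool:
    """Check the command line the running kernel was booted with (Linux only)."""
    try:
        return has_boot_option(Path(path).read_text())
    except OSError:
        # /proc/cmdline is missing or unreadable (for example, not running on Linux).
        return False


if __name__ == "__main__":
    print("mem=nopentium active:", running_with_nopentium())
```

Matching on whole tokens avoids false positives from longer strings that merely contain `mem=nopentium`.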
  • More information (Score:5, Informative)

    by DragonHawk ( 21256 ) on Thursday January 24, 2002 @03:39PM (#2896264) Homepage Journal
    Yesterday, information became widely available that described possible stability issues (system crashes, hangs, etc.) when using an AGP video card under Linux in conjunction with an AMD Athlon processor. It was generally called a "bug" in the Athlon CPU.

    More information is now available at http://www.gentoo.org [gentoo.org], including an analysis of AMD's response. AMD's official response was posted to LKML, and is available at http://www.geocrawler.com/lists/3/Linux/35/175/7626960/ [geocrawler.com].

    There is apparently some kind of bad interaction between the AGP GART ("Graphics Address Remapping Table", I think?), speculative memory operations performed by the Athlon processor, the memory mappings used by the kernel, and cache coherency. The details are beyond me, but the practical upshot appears to be that the wrong data ends up being written back to main memory at some point.

    I recommend reading the above LKML thread if you suspect you are affected by this issue. Information is still being uncovered, and it is not immediately clear how this occurs, what causes it, who is affected by it, and how to work around it.

    In particular, there is some uncertainty as to whether the "mem=nopentium" option actually prevents the problem, or merely makes it less likely to occur.
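The "wrong data written back" effect described above can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of real Athlon or chipset behavior: it assumes a write-back cache that is never told about device writes, and it assumes a speculative write can leave a cache line marked dirty even though its value did not change. All class and function names are hypothetical.

```python
class ToyMemory:
    """Main memory as a dict of line address -> value."""

    def __init__(self):
        self.lines = {}


class ToyCache:
    """A write-back CPU cache that is never notified of device writes."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> (value, dirty flag)

    def load(self, addr):
        # Fill the line from memory on a miss; it starts out clean.
        if addr not in self.lines:
            self.lines[addr] = (self.memory.lines.get(addr, 0), False)
        return self.lines[addr][0]

    def speculative_touch(self, addr):
        # Assumption: the speculative write leaves the value unchanged
        # but marks the cached line dirty.
        value = self.load(addr)
        self.lines[addr] = (value, True)

    def evict(self, addr):
        # Dirty lines are written back to memory when they leave the cache.
        value, dirty = self.lines.pop(addr)
        if dirty:
            self.memory.lines[addr] = value


def device_write(memory, addr, value):
    """A non-coherent device (standing in for AGP traffic) writes memory directly."""
    memory.lines[addr] = value


def demo(cacheable: bool) -> int:
    """Return the final memory value after the CPU and the device both touch it."""
    mem = ToyMemory()
    addr = 0x1000
    mem.lines[addr] = 1
    cache = ToyCache(mem)
    if cacheable:
        cache.speculative_touch(addr)  # CPU now holds the old value, marked dirty
    device_write(mem, addr, 2)         # device updates memory behind the cache
    if cacheable:
        cache.evict(addr)              # stale value overwrites the device's data
    return mem.lines[addr]
```

In this model `demo(cacheable=True)` ends with the stale value `1` in memory, while `demo(cacheable=False)` keeps the device's value `2`, which is the intuition behind mapping such memory non-cacheable.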
  • All of the above. (Score:5, Informative)

    by Christopher Thomas ( 11717 ) on Thursday January 24, 2002 @03:42PM (#2896286)
    AMD claims it's not a bug with the Athlon processor, but with the motherboard

    According to young bald children everywhere, "There is no bug".

    In related news, the motherboard manufacturers are quoted as saying, "It's not a bug with the motherboard, but with the Athlon processor."


    Funny, I didn't think I was bald...

    It's an Athlon bug if you think doing speculative writes is a bug.

    It's a motherboard chipset bug if you think that the AGP controller should play nicely with cache-coherence protocols (right now it doesn't, presumably to gain a speed boost).

    It's an OS bug if you think that the OS should be bright enough not to make AGP-touched memory cacheable (it wasn't intended to be).

    I'm voting for option 3), myself.
  • Well, I'll be (Score:1, Informative)

    by Anonymous Coward on Thursday January 24, 2002 @03:46PM (#2896307)
    Well, I'll be darned. Vendors pointing the finger at each other. Who'd have thought?
  • by pivo ( 11957 ) on Thursday January 24, 2002 @04:09PM (#2896460)
    Your argument is incoherent. The idea that lower cost equals lower performance can't be backed up by the presence of a bug, and it ignores real (as in "this is reality") market factors. There have been bugs in all sorts of hardware and software, even the highest performing hardware in the world. There is no correlation.

    If you paid attention to benchmarks you'd see that in almost every case AMD has a higher cost effectiveness than Intel. If you have some specific examples of why AMD is not a good choice (as opposed to vague, illogical ramblings) then why don't you share them? Prove that your mumblings are, "not made up of bugus stuff"

  • by Dahan ( 130247 ) <khym@azeotrope.org> on Thursday January 24, 2002 @04:12PM (#2896469)
    Actually, it isn't embarrassing at all. It wasn't the "Linux Community"'s fault. This is the fault of AMD, who announced/classified the bug as a Windows 2000 issue instead of a hardware issue.

    If you read the technical writeup on LKML, you'll see that it's not a hardware issue, but a software bug. Which is why AMD announced the bug as a Windows 2000 issue--it is one. Linux also happens to have the same bug (it's a subtle issue and an easy mistake to make, IMO), but how was AMD supposed to know that Linux was doing the same bad thing--mapping the AGP GART area cacheable, when the GART is non-cacheable?

  • by LordNimon ( 85072 ) on Thursday January 24, 2002 @04:20PM (#2896515)
    but how was AMD supposed to know that Linux was doing the same bad thing

    Oh, that's easy. The engineer who discovered the problem should have realized that it's not necessarily a Windows-specific issue, but a problem that any OS could have. He should have then tried to contact all the OS vendors, not just Microsoft.

    Considering how Linux is used by a higher percentage of AMD customers than Intel customers, AMD should have paid more attention to an important segment of its customer base.

  • OS Bug (Score:3, Informative)

    by kenneth_martens ( 320269 ) on Thursday January 24, 2002 @04:21PM (#2896520)
    According to the article [lwn.net], it is not a problem with the motherboard at all. The problem is "the operating system is creating coherency problems within the system by creating cacheable translation to AGP GART-mapped physical memory." That means it's a problem with the OS, not with the motherboard or processor.

    In truth, we should probably say it is a combination of a problem with the OS and a problem with the processor. After all, Intel processors don't have the same problem, simply because they work differently. So while it may not technically be the CPU's fault, the CPU does play a part.
  • by Anonymous Coward on Thursday January 24, 2002 @04:41PM (#2896636)
    Get the story straight:

    "Our conclusion is that the operating system is creating coherency problems within the system by creating cacheable translation to AGP GART-mapped physical memory."
  • by addaon ( 41825 ) <addaon+slashdot.gmail@com> on Thursday January 24, 2002 @05:28PM (#2896899)
    While the use of the GART you mention (video chipsets with no onboard memory) really does suck, performance-wise, the GART itself is not useless. Most games today limit themselves to 16MB or so of textures, so that they run properly, without swapping to main memory, with a 32MB video card. However, if you want a game with 256MB of textures, say, you have three options.

    1) Get a video card with 270+MB of memory. (Yeah, right.)

    2) Snatch from main memory the portions of the texture you need. (This gets slow AND ugly if you use more than ~16MB in a single frame.)

    3) Use the GART, take (less of) a performance hit, and just keep the textures in system memory.

    This was the original purpose of the GART, and is still important.
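The remapping idea behind option 3 can be sketched as a simple page-table lookup: the graphics card sees one contiguous aperture, while the backing pages are scattered through system memory. This is an illustrative toy with hypothetical names (`ToyGart`, `translate`) and made-up addresses, assuming 4 KB pages; it is not how any particular chipset or driver implements its GART.

```python
PAGE_SIZE = 4096  # assume 4 KB pages


class ToyGart:
    """Maps a contiguous aperture onto scattered physical pages."""

    def __init__(self, aperture_base, physical_pages):
        self.aperture_base = aperture_base
        # Entry i holds the physical base address backing aperture page i.
        self.table = list(physical_pages)

    def translate(self, addr):
        """Translate an aperture address into a physical address."""
        offset = addr - self.aperture_base
        if not 0 <= offset < len(self.table) * PAGE_SIZE:
            raise ValueError("address outside the aperture")
        page, within = divmod(offset, PAGE_SIZE)
        return self.table[page] + within


# Hypothetical aperture at 0xE0000000 backed by three scattered pages.
gart = ToyGart(0xE0000000, [0x00345000, 0x01A2C000, 0x0007F000])
second_page_addr = gart.translate(0xE0001010)  # falls in the second page: 0x01A2C010
```

The texture data never has to be contiguous in system memory; only the table has to be kept up to date.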
  • by Dahan ( 130247 ) <khym@azeotrope.org> on Thursday January 24, 2002 @05:29PM (#2896904)
    Apparently the GART is cacheable on pentium systems?

    There are Pentium systems with an AGP port? If you mean the Pentium II and up, I don't see why the GART would be cacheable there either; I don't know if the P4 chipsets have changed things, but with the PII and PIII, here's what Intel had to say about the subject:

    For current hardware implementations, the OS will make AGP memory (like other video memory) non-cacheable, so that there is no coherency problem between the CPU caches and the data that the graphics controller uses. Otherwise, graphics controller accesses to AGP memory would require "snooping" the CPU caches, which would cause delays in execution in some cases.

    -- AGP and Graphics Optimization Techniques [google.com]

    (Emphasis added). As for why the bug doesn't happen on Intel CPUs, it sounds like the Athlon has more aggressive speculative writes and can change memory that wasn't explicitly written to, dirtying the cache. But in any case, even on Intel CPUs, the AGP area is supposed to be mapped non-cacheable.

    Why does disabling large pages fix the problem?

    Don't know about that one; I haven't read the various tech docs for the Athlon. Perhaps the cache works slightly differently with 4MB pages vs 4KB pages?

  • by geekoid ( 135745 ) <dadinportland&yahoo,com> on Thursday January 24, 2002 @05:59PM (#2897187) Homepage Journal
    Boy, where have I heard that before... oh yeah, every 2 years since there have been Macs. Sheesh.
    FYI I don't own a mac, but I will purchase one next time I want a computer.
