Slashdot
Patch To Allow Linux To Use Defective DIMMs
Posted by
timothy
on Wed Oct 25, 2000 11:44 AM
from the just-like-processors-are-treated dept.
BtG writes: "BadRAM is a patch to Linux 2.2 which allows it to make use of faulty memory by marking the bad pages as unallocatable at boot time. If there were a source of cheap faulty DIMMs this would make building Linux boxes with buckets of memory significantly cheaper; it also demonstrates another advantage of having the source code to one's operating system." The BadRAM page has a great explanation of the project's motivation and status. Now where can I pick up some faulty-but-fixable 512MB RAM sticks?
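A rough illustration of the idea in the summary, not the patch's actual code: given a list of faulty byte addresses (say, from a memory tester), the kernel only needs to work out which page frames to withhold from the allocator. The 4 KiB page size, the function names, and the example addresses are all assumptions made up for this sketch.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on 32-bit x86


def bad_page_frames(faulty_addresses, page_size=PAGE_SIZE):
    """Return the sorted page frame numbers that contain a faulty address."""
    return sorted({addr // page_size for addr in faulty_addresses})


def usable_pages(total_bytes, faulty_addresses, page_size=PAGE_SIZE):
    """Count the pages left for the allocator once bad frames are withheld."""
    total = total_bytes // page_size
    bad = [pfn for pfn in bad_page_frames(faulty_addresses, page_size)
           if pfn < total]
    return total - len(bad)


# Hypothetical example: three faulty bytes, two of them in the same page.
faults = [0x01A3F010, 0x01A3F7FC, 0x07C00004]
print([hex(pfn) for pfn in bad_page_frames(faults)])
print(usable_pages(128 * 1024 * 1024, faults))
```

The point of the sketch is the granularity: one bad bit costs a whole page, but on a 128MB stick that is still a tiny fraction of the memory.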
This discussion has been archived.
No new comments can be posted.
247 comments
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Bad Memory doesn't go to waste (Score:3)
There isn't this huge supply of bad memory out there (Radio Shack jokes aside) because memory manufacturers are pretty clever. Bad memory is put into things like:
Audio storage devices, like answering machines and mp3 players, where a bit or two of failure will just end up as a teeny bit more noise.
Cheap digital cameras (once again, a bad pixel here or there....)
Toys. They actually call bad memory "toy memory" sometimes.
SIMMs. You take (for example) 4 bad chips and 1 good chip and get the equivalent of 4 good chips (by replacing bad I/Os on the bad chips with I/Os on the good chip). There are jillions of ways to do this, and companies have pretty much done them all.
Sell them at CompUSA to people who don't know any better. (Sorry, couldn't resist)
If I were you, I'd download memtest86 [sgi.com] right now.
Re:Signal 11 no more? (Score:5)
"Of course Signal 11 is no more.. He left after a big blowout with Rob..."
--
This message brought to you by Colin Davis
Just how useful is this, really? (Score:5)
You'll probably get better results simply by cleaning off the contacts with a pencil eraser (remembering to brush away all the eraser dust first) and firmly re-inserting them into the socket.
Does Slashdot readership know nothing of hardware? (Score:5)
Alright, so we've accepted that some dies are necessarily going to be damaged. Why not make the hardware such that it can resist imperfections? Well, actually we do. RAM, being as simple and homogeneous as it is, lends itself well to this approach. Here's the idea: you add extra "blocks" of memory to a decode line. Then, if one of the "regular" blocks is destroyed by a process imperfection, the post-fab die can be modified with a laser to reroute data to the extra backup block. So you invest some die room in backup structures, so that a die with only a few errors can be "corrected" and will still function as intended. This is basically like keeping a spare tire. If you get one blowout, you're still in business, but two and you are in trouble. Of course, you can package as many extras as necessary, but it may not make economic sense. Here you calculate the appropriate trade-off between die size and yield to make the decision.
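To put rough numbers on that trade-off, here is a minimal sketch under loudly stated assumptions: every block (regular or spare) is defective independently with the same probability, a die is repairable when the number of defective blocks does not exceed the number of spares, and dies per wafer shrink in proportion to the blocks on each die. Real defect models are clustered, and the figures below are invented for illustration.

```python
from math import comb


def repairable_yield(n_blocks, n_spares, p_defect):
    """Probability that at most n_spares of the n_blocks + n_spares blocks
    are defective (binomial model with independent defects)."""
    total = n_blocks + n_spares
    return sum(
        comb(total, k) * p_defect**k * (1 - p_defect) ** (total - k)
        for k in range(n_spares + 1)
    )


def good_dies_per_wafer(wafer_blocks, n_blocks, n_spares, p_defect):
    """Expected working dies when the wafer holds wafer_blocks blocks in total."""
    dies = wafer_blocks // (n_blocks + n_spares)
    return dies * repairable_yield(n_blocks, n_spares, p_defect)


# Hypothetical figures: 64 regular blocks per die, 1% defect chance per block.
for spares in range(5):
    print(spares, round(good_dies_per_wafer(100_000, 64, spares, 0.01), 1))
```

Under these assumptions the first spare or two buys a lot of yield, and after that each extra spare mostly just costs die area, which is the spare-tire economics described above.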
Anyway, long story short: your DRAM is already "bad". Quite a few RAM chips contain process errors that are rerouted around in hardware so that you, the consumer, need never know. To you, the process is transparent. All you should care about is that you get your *functional* RAM cheaper, because the manufacturer would have had to scrap that die otherwise.
This post discusses software "rerouting" around blocks that had more errors than could be corrected in hardware, but somehow still made it out the door. What's wrong with that?
Will semiconductor manufacturers suddenly think "Gee...let's not worry about yield anymore?" You'd better bet they won't. And even if they did, if the software rerouting is so clean as to not be noticeable (which is the only way it would fly), what do you care? You'd get your RAM cheaper.
--Lenny
Finally! (Score:3)
----
Oh, sure, Linux users are this desperate (Score:5)
"Hello, Kingston, I'm looking for any old cruddy defective RAM, got any? Uh.. No.. I won't be reselling it to Linux users, I swear that I am with a major US ISP and we want to put it into our servers! Call Rambus, you say? Hello? Hello?"
--
Similar solution exists in the 2.4 kernel already! (Score:4)
Anything similar? (Score:5)
My bad RAM story (Score:4)
We had just installed an Exchange server and were rolling out the Exchange client to all the desktop PCs. Unfortunately, no one had thought to ask if they could take it--which many of them couldn't. So we were feverishly digging up all the RAM we could find and sticking it into machines as fast as we could. I happened to find a 32MB stick (glory be!) in an unused PC. I said to my boss: "Hey, I found a big one!" He turns around and asks "Is it any good?" while simultaneously reaching for it, and ZAP audibly discharges static electricity right into the thing. We look at each other for a moment and then I say "Not anymore."
I was wrong, though--it was fine.
--
An abstained vote is a vote for Bush and Gore.
No, this *is* good for production use! (Score:5)
If we ever want to see linux used in mission critical systems like air traffic control, embedded medical devices, or military applications, then projects like this are the key. Fault tolerance now exists for memory (this project), storage (RAID), and communication (redundant NICs). The next target should be the CPU.
How about projects to detect the types of errors a failing (typically, overheated) cpu produces, and adjust the scheduler accordingly to insert idle time and cool down the cpu? Or to use one cpu to monitor another in multiprocessor systems, and avoid using a processor that starts producing faulty results?
Imperfect knowledge, but ... them's the breaks. (Score:3)
But how many people saw it on kt? For purely selfish reasons, I'd like to see a lot more people know about this project, because I find it very interesting and useful-looking. Plus, I think it's just a neat hack in general, and I'd like to point it out.
If it's too old for you, then
YMMV, whaddya do?
OK.
timothy
Real information... (Score:5)
Signal 11 no more? (Score:4)
It made it notorious for working with dodgy memory, failing to boot half of the time. I've seen people blame Linux for bad hardware because it would work with Windows.
It's nice that Linux now could just go
*ARGH YOU HAVE CRAP MEMORY*
shrug its shoulders and chug along anyway.
Re:Is this good for Linux's rep? (Score:3)
Better hurry... (Score:5)
handful of busted 256MB DIMMs: $10.71 with tax
6 reboots, a little math, and a partial kernel compile: 21min
The look on my roommate's face when I typed "top": priceless!
Swiss Cheese (Score:3)
Linux forced its way into our IT Department when it could restore a trashed system into something useful. Here at The Salvation Army, we endeavor to be good stewards of what we are given. We have an IBM PC Server 350 (now named "Methusela") that crashed one day for no apparent reason. It refused to run Windows anymore... not even Win98 or Win95!
But it ran Linux flawlessly. Well, actually it did point out one flaw on its own: The internal Ethernet controller was getting an unusually high number of bad packets. It would receive DHCP assignments, even do some web work in Linux... but it was enough to shut Windows down completely. Even after installing a working NIC, Windows could not run due to the faulty internal NIC, but Linux ran fine!
Likewise, we found an instant way to crash every WinNT system in the building. Someone was re-arranging the hubs and switches, and accidentally created a packet loop by plugging a switch back to itself... in three seconds every WinNT system on the network went straight to the Blue Screen of Death.
It's one thing to handle the rules well, but quite another to deal with the exceptions!
anti-linux? (Score:4)
I know you're just trolling, and I shouldn't respond, but for students, and anybody who has access to memory modules that are experiencing known, predictable faults, this would be great. Not everybody has some fancy $30,000/year job, y'know.
--
"Don't trolls get tired?"
Now there's a point to the BIOS memory test? (Score:3)
- Does it run every possible combination of CPU instructions on boot up? No!
- Does it check every single block on the hard drive? No!
- Does it check all the blocks of floppies, CDs, DVDs, etc to make sure they work? No!
- If the memory test is essential to the functioning of the system, why do they let you skip it?
Obviously, the smart thing to do is to _wait_ for the memory to fail rather than test the whole lot for a minute or two. After doing a full test once, the first time you boot, you can leave a very low priority memory tester running, or leave the full test to some quiet period with a cron job - a decent memory test [sgi.com] of course, not that half-witted test that BIOSes do.
Why bother? (Score:4)
Modern DRAM doesn't have much trouble with bad cells, and the yields are quite good. So there isn't a big supply of DRAM with bad cells that fail solidly. Most DRAM problems today are at the edges: at the buffers, the connectors, or clock synchronization - the things that can be messed up during installation.
Personally, I get ECC RAM even on desktops, just so I know it's working. It eliminates arguments with tech support when the hardware really is broken.
Err... (Score:4)
- A.P.
--
* CmdrTaco is an idiot.
Best Buy (Score:4)