Faulty Chips Might Just be 'Good Enough'
Ritalin16 writes "According to a Wired.com article, 'Consumer electronics could be a whole lot cheaper if chip manufacturers stopped throwing out all their defective chips, according to a researcher at the University of Southern California. Chip manufacturing is currently very wasteful. Between 20 percent and 50 percent of a manufacturer's total production is tossed or recycled because the chips contain minor imperfections. Defects in just one of the millions of tiny gates on a processor can doom the entire chip. But USC professor Melvin Breuer believes the imperfections are often too small for humans to even notice, especially when the chips are to be used in video and sound applications.' But just in case you do end up with a dead chip, here is a guide to making a CPU keychain."
No Thank You. (Score:5, Insightful)
I'm not so sure... (Score:5, Insightful)
It may seem there's a simple linear tradeoff between over-the-top quality control at higher cost and more economical quality control at lower cost. But if these chips are more likely to have defects, and in fact develop them in the field, how long will costs remain low? The chip will still be useless and will have to be replaced. Add to that the cost of processing returns from the customer or store, plus possible customer dissatisfaction with the company's quality, which could cost future sales. Will it actually be cheaper in the long term?
If small faults are tolerable (Score:5, Insightful)
The only reason for moving to digital switches was accuracy: the first digital processors cost far more than the valve technology of the 1950s and 1960s. Accuracy really was the only reason for the change.
Nothing new (Score:5, Insightful)
In short: computers suck...
Already being done (somewhat) (Score:5, Insightful)
Then the Overclockers come in and ramp the speed back up, and claim 'the faster chips are a ripoff' and complain that 'Windows is always crashing.'
they don't waste finished chips (Score:5, Insightful)
Therefore, throwing away a $2 chip during production doesn't cost $2. It's only worth $2 by the time the customer pays for it.
Sure, you could sell the defects at some discount, but it's only worth the trouble for a high-volume part like RAM, where defective parts are easily usable, and definitely NOT for a part where the impact of a particular defect on the end user's application could be really hard to characterize (like a CPU).
Re:Already being done (somewhat) (Score:2, Insightful)
Perhaps it's because Windows also tends to crash on normal machines too? I mean, you never hear *nix overclockers complain that their OS crashes all the time do you?
If'n it were possible... (Score:5, Insightful)
Please remember that this is the same industry that came up with the 80486SX when they were having lousy yields on 80486DX chips. If these processors had any utility, trust me, they'd find a way to make money off 'em.
Re:I'm not so sure... (Score:5, Insightful)
We've already got enough bad components floating around. We surely don't need any more.
This is completely bogus. (Score:5, Insightful)
Basically, the problem is this. With mechanical and analogue devices, most of the time you know that if you change the inputs a small amount, the outputs will change a small amount.
But digital devices are chaotic. Change one bit in the input, and the output is likely to be radically different. One bit in the wrong place on a Windows system can make the difference between Counterstrike and a BSOD.
You can use substandard devices for some applications; dodgy RAM, for example, can be used to store audio on, and it would work just as well for video framebuffers. But you could never put anything programmatic on it; that has to be perfect.
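A tiny sketch of the asymmetry described above, using illustrative values: one flipped bit in an audio sample is a negligible amplitude error, while the same single-bit flip in a pointer-like value (modeled here as a list index) breaks the program outright.

```python
# One bit flip in data vs. one bit flip in an address-like value.
# All values are illustrative, not from any real system.

sample = 12345                 # a 16-bit signed audio sample
corrupted = sample ^ 1         # flip the least significant bit
error = abs(corrupted - sample)
print(error / 32768)           # ~0.003% of full scale: inaudible

table = list(range(16))
index = 3
bad_index = index ^ (1 << 10)  # the same single-bit flip, but in an index
try:
    table[bad_index]           # a "pointer" now pointing somewhere invalid
except IndexError:
    print("crash: one bit turned a valid access into a fault")
```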
(IIRC, they do recycle faulty wafers. One of the ways is to scrape the doped layer off and turn them into solar cells. I don't know if they can use them again for ICs, though.)
Re:Stories (Score:5, Insightful)
Manufacturers are VERY aware they can charge a larger premium for better parts
This is also a problem in medicine (Score:5, Insightful)
And the reason for this is that, as you get closer to perfection, the economic cost rises faster and faster, in terms of money, resources, time, or effort. For a computer or a medicine to go from 90 percent to 99 percent utility can mean a tenfold increase in price.
That's why the constant quest for "perfect" electronics and medicine is driving up prices to the point where normal people can't afford them. If we could accept that we don't always need new, perfect, shiny medicines and electronics, it would put them in a sane price range.
Good Use (Score:2, Insightful)
Re:Already being done (somewhat) (Score:5, Insightful)
For most chips (except ones like CPUs, where you can charge a premium) you don't speed-bin; it costs lots of money. You pick a speed you think the part should run at and toss the rest. Shipping chips that almost work is bad business. Think about it: I make a $5 chip, and it gets put in a $100 product. If 10% of my chips don't work, my customer loses $100 for every $50 he pays me. I have to get my failure rate down into the noise as far as customers are concerned, or they'll go to the competition.
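The arithmetic behind "loses $100 for every $50 he pays me" works out like this (using the illustrative numbers from the comment, not real pricing):

```python
chip_price = 5.0         # what I charge per chip
product_value = 100.0    # value of the finished product the chip goes into
failure_rate = 0.10      # fraction of my chips that don't work

chips_bought = 10
cost_of_chips = chips_bought * chip_price           # $50 paid to me
expected_failures = chips_bought * failure_rate     # 1 bad chip on average
scrapped_value = expected_failures * product_value  # one $100 product lost

print(f"Customer pays ${cost_of_chips:.0f}, scraps ${scrapped_value:.0f}")
```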
I think the number of applications the original article talks about, where chip errors are tolerable, is pretty small. Suppose my CPU has a bit error in the LSB of the integer adder: the IRS may not care if my taxes are off by 1 cent, but the MSB is a different matter ("sir, you appear to owe us $40M"). On the other hand, an LSB error is a big deal if the value you're dealing with is a memory pointer, and it breaks a program just as badly as an MSB error would.
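The LSB-vs-MSB point can be sketched with a hypothetical model of a 32-bit adder that has one output bit stuck at 0 (this is just a toy fault model, not any real chip's failure mode):

```python
# Hypothetical 32-bit adder with one result bit stuck at 0.
WIDTH = 32
MASK = (1 << WIDTH) - 1

def faulty_add(a, b, stuck_bit):
    """Unsigned WIDTH-bit add, with one result bit forced to 0."""
    return ((a + b) & MASK) & ~(1 << stuck_bit)

a, b = 0x7FFFFFFF, 2
correct = (a + b) & MASK                   # 0x80000001

print(correct - faulty_add(a, b, 0))       # LSB stuck: error of 1
print(correct - faulty_add(a, b, 31))      # MSB stuck: error of 2**31
```

An error of 1 is the off-by-a-cent case; an error of 2**31 is the "$40M tax bill" case, and either one is fatal if the value is a pointer.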
Finally, a word about "metastability". All chips with more than one clock (video cards are great examples) have to move signals between clock domains. This means signals can be sampled wrongly (well-designed logic should handle this) or, in rare cases, suffer metastability, where unstable logic values get latched into flops (these usually look like a value that swings wildly between 0 and 1 at a frequency much higher than the normal clock). A flop in a metastable state can 'pollute' other flops downstream from it, turning a chip into a gibbering wreck.

Now, well-designed logic doesn't do this very often. The flops chosen for crossing clock domains are often special anti-metastability flops, used not for their speed or their size but for their robustness. But the physics of the situation means it's simply not possible to avoid metastability entirely, only possible to make it rare. What you do need to do is figure out how often something will fail and pick an MTBF that is appropriate for your device.

I once found myself discussing this issue around a video chip we were designing, and basically what it came down to was comparing the theoretical worst-case failure rate of our chip (chip people tend to be very conservative; it keeps us on the right side of Murphy) with Windows. Our chip might fail once a year (and even then there was a pretty good chance you wouldn't notice it), while back then Windows blue-screened every day. Would anyone notice? Nope.
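The "pick an appropriate MTBF" step uses the standard synchronizer failure-rate estimate, MTBF = exp(t_met / tau) / (T_w * f_clk * f_data). A minimal sketch follows; all constants are made-up illustrative values, not data from any real process or chip.

```python
import math

# Standard synchronizer MTBF estimate:
#   MTBF = exp(t_met / tau) / (T_w * f_clk * f_data)
# t_met: settling time allowed before the next flop samples
# tau, T_w: process-dependent flop constants (resolution constant, window)
# f_clk: receiving clock frequency; f_data: async transition rate
# All numbers below are illustrative assumptions.

def synchronizer_mtbf_seconds(t_met, tau, t_w, f_clk, f_data):
    return math.exp(t_met / tau) / (t_w * f_clk * f_data)

mtbf = synchronizer_mtbf_seconds(
    t_met=5e-9,    # 5 ns of settling time
    tau=50e-12,    # 50 ps resolution time constant
    t_w=100e-12,   # 100 ps metastability window
    f_clk=100e6,   # 100 MHz receiving clock
    f_data=10e6,   # 10 MHz asynchronous data rate
)
print(f"Estimated MTBF: {mtbf / (3600 * 24 * 365):.1e} years")
```

Because the exponential dominates, a little extra settling time (e.g. an extra synchronizer flop) turns a failure every few seconds into one expected long after the heat death of the universe, which is why "rare enough" is an engineering choice rather than a guarantee.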
Re:No Thank You. (Score:5, Insightful)
If a CPU producer passed a defective general-purpose chip, and the defect ended up being responsible for a tort, then they might be liable.
There ought to be three bins: MIL-SPEC, No Defects, and Defect Detected But Passed Regression Suite. Anyone purchasing from the third bin has to accept liability for unforeseen malfunctions.
Re:Nothing new (Score:5, Insightful)
I bought a friend an LCD and it had a single-pixel fault: bright green, always on, right in the middle. It made the display unusable. The manufacturer pointed to their returns policy of 5 dead pixels and would not accept it back.
If the pixel had been at the corner - no problem.
The problem with looking for CPU failures is that there are a very large number of ways a chip can fail, and for each of these you have to ascertain what the impact of the failure is.
It would be very hard to ascertain the myriad impacts a single gate failure could have, let alone the combinations of multiple failures.
I would hate for manufacturers to create a second tier of CPUs at a lower price point; the ways those chips could cause my software to fail would be vexing.
Plus, it's already being done. (Score:5, Insightful)
Depends...
Graphics chips these days have multiple pipelines, and are shipped in variants with different numbers of pipelines. If you can build a board that lets you use (say) any two pipelines out of a 4-pipeline chip, then you can use more of the defective chips. Similarly, if you're making MP3 chips and their FM radio or LCD subsystems fail, you sell them to Apple to put in the iPod Shuffle...
The thing is, defective chips are already sorted into bins like this. Processors are binned by clock speed... buy a low-speed CPU and it could well have come from the same run as its higher-speed cousin. Memory has mechanisms to allow for a certain number of bad cells. It wouldn't surprise me at all if some 2-pipeline GPUs are 4-pipeline versions that failed the 3rd or 4th pipeline.
I don't know how much headroom is left.
Well there's a big difference (Score:4, Insightful)
However, that's not true of the digital world. It is important that my processor gets the right answer to a calculation every time, all the time. It is important that the data stored in RAM is always accurate. If any of these fail, it can fuck things up in ways you can't predict. Maybe it's the least significant bit of a sample in an audio file and I never know. Maybe it's a bit in the address of a jump in a driver interrupt and it brings the whole system crashing down.
So while I'm not really worried whether all the resistors in my power supply are precisely to spec, because who cares if it produces 11.5v instead of 12v, I am VERY concerned that my CPU might ever give me anything but a completely accurate and predictable result.
Also, it can make a difference in the analogue domain too. The military is picky for a reason. If a TV fails, no big deal. If an F-16 fails, that's a big deal. But you'll find milspec parts in use at more mundane levels as well. I built a headphone amp using all 1% (or better) milspec resistors. Why? Well, they sound better. The design (metal film instead of carbon) has better audio characteristics, their resistance changes less with temperature, and the closer matched they are, the closer the outputs of the two channels of the amp are.
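The channel-matching point can be made concrete with a worst-case bound on a simple non-inverting amplifier stage (gain = 1 + Rf/Rg). The resistor values below are illustrative, not from any particular headphone amp design:

```python
# Worst-case gain spread between two channels built from resistors
# at opposite extremes of their tolerance band. Illustrative values.

def gain(rf, rg):
    """Non-inverting amplifier gain: 1 + Rf/Rg."""
    return 1 + rf / rg

rf_nom, rg_nom = 10_000.0, 1_000.0   # nominal 10k / 1k -> gain of 11

def worst_case_spread(tol):
    hi = gain(rf_nom * (1 + tol), rg_nom * (1 - tol))
    lo = gain(rf_nom * (1 - tol), rg_nom * (1 + tol))
    return (hi - lo) / gain(rf_nom, rg_nom)

print(f"5% parts: up to {worst_case_spread(0.05):.1%} gain spread")
print(f"1% parts: up to {worst_case_spread(0.01):.1%} gain spread")
```

Tighter tolerance directly tightens the bound on how far apart the two channels' gains can drift, which is the matching the comment is describing.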
Re:This is completely bogus. (Score:2, Insightful)
Probably not.. Yields are too good (Score:5, Insightful)
There have been moments in DRAM history when devices were made that were configured in some way during final test to work around bad spots. IBM did it for a while in the 1980s, I think. But with 90+% yields, it's not worth the added switching you need on chip to allow that. You could, in theory, use heavy ECC to tolerate a substantial defect rate. That's how CD-ROMs work, after all. But it's not necessary yet.
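The ECC idea above can be sketched with a minimal Hamming(7,4) code, which corrects any single flipped bit. Real DRAM ECC uses wider codes (e.g. SECDED over 64-bit words); this is just the principle that redundancy can hide a defective cell:

```python
# Minimal single-error-correcting Hamming(7,4) sketch.

def hamming74_encode(nibble):
    """Encode 4 data bits [d1,d2,d3,d4] into 7 bits with 3 parity bits."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(code):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
code = hamming74_encode(word)
code[3] ^= 1                          # simulate one stuck/flipped cell
assert hamming74_decode(code) == word # the defect is invisible to the user
```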
For a while, there was a market for DRAM with bad spots for use in telephone answering machines.
This is an idea that resurfaces periodically in semiconductor history, but historically, the yields have always come up to acceptable levels.
Re:Nothing new (Score:4, Insightful)
It would be nice to get one or the other though. Both flawed AND expensive is a real drag.
No chance in heck this can work. (Score:3, Insightful)