Technology

Faulty Chips Might Just be 'Good Enough' 342

Posted by timothy
from the next-week-the-soda-ring-hammock dept.
Ritalin16 writes "According to a Wired.com article, 'Consumer electronics could be a whole lot cheaper if chip manufacturers stopped throwing out all their defective chips, according to a researcher at the University of Southern California. Chip manufacturing is currently very wasteful. Between 20 percent and 50 percent of a manufacturer's total production is tossed or recycled because the chips contain minor imperfections. Defects in just one of the millions of tiny gates on a processor can doom the entire chip. But USC professor Melvin Breuer believes the imperfections are often too small for humans to even notice, especially when the chips are to be used in video and sound applications.' But just in case you do end up with a dead chip, here is a guide to making a CPU keychain."
This discussion has been archived. No new comments can be posted.


  • No Thank You. (Score:5, Insightful)

    by Anonymous Coward on Saturday March 19, 2005 @08:49PM (#11987781)
    I'd rather my chip work as advertised.
  • I'm not so sure... (Score:5, Insightful)

    by Bigthecat (678093) on Saturday March 19, 2005 @08:52PM (#11987797)
    Considering the number of defective products that already make it into our hands after this 'quality assurance', I'm not sure that adding more parts with known defects, however minor, is such a brilliant move.

    It may seem that there's a simple linear trade-off between over-the-top quality control at high cost and more economical quality control at lower cost. But if these chips turn out to be more failure-prone in the field, how long will costs actually stay low? The failed chip will still be useless and will have to be replaced. Add to that the cost of handling returns from the customer or store, and the possible customer dissatisfaction with the company's quality, which could cost sales in the future. Will it actually be cheaper in the long term?

  • by BillsPetMonkey (654200) on Saturday March 19, 2005 @08:53PM (#11987804)
    Then why not have analogue processors instead of digital processors? Seriously - they're much faster than digital switches.

    The only reason for moving to digital switches was accuracy - the first digital bitflipper processors cost far more than valve technology did in the 1950s and 1960s. That really was the only reason for changing to digital processors.
  • Nothing new (Score:5, Insightful)

    by Rosco P. Coltrane (209368) on Saturday March 19, 2005 @08:54PM (#11987817)
    LCD manufacturers routinely put defective screens on the market, on the premise that a dead pixel here or there "won't be noticed". Too bad, because consumers do notice and do tend to return the product equipped with the dodgy screen, only to be told that it's "normal".

    In short: computers suck...
  • by SA Stevens (862201) on Saturday March 19, 2005 @08:54PM (#11987820)
    The CPU vendors are already doing a 'sort and grade' operation, when they label processors. Have been for years. When the yield from the fab is lower-grade, the dies get packaged and labelled as lower-speed parts.

    Then the Overclockers come in and ramp the speed back up, and claim 'the faster chips are a ripoff' and complain that 'Windows is always crashing.'
  • by seanadams.com (463190) * on Saturday March 19, 2005 @08:58PM (#11987842) Homepage
    Many of the chips fail inspection prior to going into the package, and then some more fail functional test after that. Probably more than half the price of a chip is the factory itself and the R&D work which is amortized over so many zillions of parts, and much of the rest is all the handling, packaging, shipping, and middlemen. I'd guess less than 10% is per-part materials and labor.

    Therefore, throwing away a $2 chip during production doesn't cost $2. It's only worth $2 by the time the customer pays for it.

    Sure, you could sell the defects at some discount, but it's only worth the trouble for a high-volume part like RAM, where defects are easily usable - and definitely NOT for a part where the impact of a particular defect on the end user's application could be really hard to characterize (like a CPU).
  • by Rosco P. Coltrane (209368) on Saturday March 19, 2005 @09:00PM (#11987858)
    Then the Overclockers come in and ramp the speed back up, and claim 'the faster chips are a ripoff' and complain that 'Windows is always crashing.'

    Perhaps it's because Windows also tends to crash on normal machines too? I mean, you never hear *nix overclockers complain that their OS crashes all the time do you?
  • by TheSHAD0W (258774) on Saturday March 19, 2005 @09:11PM (#11987915) Homepage
    ...They'd already be doing it.

    Please remember that this is the same industry that came up with the 80486SX when they were having lousy yields on 80486DX chips. If these processors had any utility, trust me, they'd find a way to make money off 'em.
  • by mjh49746 (807327) on Saturday March 19, 2005 @09:20PM (#11987957)
    I'll have to agree. I've just RMA'd a DVD burner a day after I got it back from the last RMA. Not to mention having to RMA a stick of RAM not three months ago. QA seems to be a really sad joke, these days.

    We've already got enough bad components floating around. We surely don't need any more.

  • by david.given (6740) <dg AT cowlark DOT com> on Saturday March 19, 2005 @09:27PM (#11988002) Homepage Journal
    And anybody who actually knew anything about computers would know this. TFA doesn't mention what this guy is a professor of --- I bet it's not electronics.

    Basically, the problem is this. With mechanical and analogue devices, most of the time you know that if you change the inputs a small amount, the outputs will change a small amount.

    But digital devices are chaotic. Change one bit in the input, and the output is likely to be radically different. One bit in the wrong place on a Windows system can make the difference between Counterstrike and a BSOD.
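    That sensitivity is easy to demonstrate in software. A minimal sketch, using a hash function as a stand-in for any digital pipeline (the avalanche behaviour is the point here, not SHA-256 itself):

```python
import hashlib

def bitstring(b: bytes) -> str:
    """Render bytes as a string of 0s and 1s for bit-level comparison."""
    return ''.join(f'{byte:08b}' for byte in b)

msg = b'one bit in the wrong place'
flipped = bytes([msg[0] ^ 0x01]) + msg[1:]        # flip a single input bit

h1 = bitstring(hashlib.sha256(msg).digest())
h2 = bitstring(hashlib.sha256(flipped).digest())
changed = sum(a != b for a, b in zip(h1, h2))
print(f"{changed} of 256 output bits changed")    # typically around half
```

    A one-bit input change flips roughly half the output bits: small input change, radically different output.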

    You can use substandard devices for some applications; dodgy RAM, for example, can be used to store audio, and it would work just as well for video framebuffers. But you could never put anything programmatic in it; that has to be perfect.

    (IIRC, they do recycle faulty wafers. One of the ways is to scrape the doped layer off and turn them into solar cells. I don't know if they can use them again for ICs, though.)

  • Re:Stories (Score:5, Insightful)

    by bbrack (842686) on Saturday March 19, 2005 @09:32PM (#11988029)
    If you go and buy a handful of 5% resistors, you will find ~0 that are within 2% of their value - if you buy 2%, none w/in 1%, etc...

    Manufacturers are VERY aware they can charge a larger premium for better parts
  • by Glowing Fish (155236) on Saturday March 19, 2005 @09:33PM (#11988033) Homepage
    If you look at what the "big ticket" items are in the US economy, electronics and medicine are up at the top of the list.
    And the reason for this is that as you get closer to perfection, it takes more and more of an economic cost, in terms of money or resources or time or effort. For a computer or a medicine to go from 90 percent to 99 percent utility means a tenfold increase in price.
    That's why the constant quest to have "perfect" electronics and medicine is driving up the prices of these things to the point where normal people can't afford them. If we could accept that we didn't always need new, perfect, shiny medicines and electronics, it would put them in a sane price range.
  • Good Use (Score:2, Insightful)

    by d1g1t4l (869211) on Saturday March 19, 2005 @09:38PM (#11988070)
    Faulty chips can be used to generate "true" random numbers.
  • by taniwha (70410) on Saturday March 19, 2005 @09:39PM (#11988073) Homepage Journal
    (I'm a sometimes chip architect, so some background.) There are two sorts of tests that go on when you fab chips - functionality (do they do the right thing, are all the gates working, are all the wires connected) and speed (does it go fast enough).

    For most chips - except ones like CPUs, where you can charge a premium - you don't speed bin (it costs lots of money); you pick a speed you think it should go at and toss the rest. Shipping chips that almost work is bad business. Think about it: I make a $5 chip that gets put in a $100 product. If 10% of my chips don't work, my customer loses $100 for every $50 he pays me. I have to get my failure rate down so it's in the noise as far as the customers are concerned, otherwise they'll go to the competition.
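    Spelling out the arithmetic in that example (the $5 chip, $100 product, and 10% escape rate are purely illustrative numbers from the comment):

```python
# Illustrative numbers: $5 chip, $100 product, 10% of shipped chips are bad.
chip_price, product_value, fail_rate = 5.0, 100.0, 0.10

chips_bought = 10
paid_to_vendor = chips_bought * chip_price        # $50 paid for the chips
scrapped_products = chips_bought * fail_rate      # 1 finished product is dead
customer_loss = scrapped_products * product_value # $100 lost per $50 spent
print(f"paid ${paid_to_vendor:.0f}, lost ${customer_loss:.0f}")
```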

    I think the number of applications the original article's talking about, where chip errors are tolerable, is pretty small. Suppose my CPU has a bit error in the LSB of the integer adder: the IRS may not care if my taxes are off by 1c, but the MSB is a different matter ("sir, you appear to owe us $40M"). On the other hand, even an LSB error is a big deal if the value you are dealing with is a memory pointer - it breaks a program just as badly as an MSB error would.
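    A toy fault model makes the LSB/MSB asymmetry concrete (the stuck-at-1 output bit below is a hypothetical illustration, not how any particular real adder fails):

```python
def faulty_add(a, b, stuck_bit, width=32):
    """32-bit adder whose output bit `stuck_bit` is stuck at 1 (toy stuck-at fault)."""
    s = (a + b) & ((1 << width) - 1)
    return s | (1 << stuck_bit)

correct = 1000 + 2344                       # 3344; its LSB happens to be 0
lsb_err = faulty_add(1000, 2344, 0) - correct
msb_err = faulty_add(1000, 2344, 31) - correct
print(lsb_err, msb_err)                     # off by 1 vs. off by 2**31
```

    An LSB fault perturbs the numeric result by at most 1; the same fault in the top bit is catastrophic - and if the value is a pointer, either one crashes the program.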

    Finally, a word about "metastability". All chips with more than one clock (video cards are great examples) have to move signals between clock domains. This means signals can be sampled wrongly (well-designed logic should handle this) or, in rare cases, suffer metastability, where unstable logic values get latched into flops (usually these look like a value that swings wildly between 0 and 1 at a frequency much higher than the normal clock). A flop in a metastable state can 'pollute' other flops downstream from it, turning a chip into a gibbering wreck.

    Well-designed logic doesn't do this very often: the flops chosen for crossing clock domains are often special anti-metastability flops, used not for their speed or their size but for their robustness. But the physics of the situation means it's simply not possible to avoid - just possible to make it not happen very often. What you do need to do is figure out how often something will fail and pick an MTBF that is appropriate for your device. I once found myself discussing this issue around a video chip we were designing, and basically what it came down to was comparing the theoretical worst-case failure rate of our chip (chip people tend to be very conservative; it keeps us on the right side of Murphy) with Windows: our chip might fail once a year (and even then there was a pretty good chance you wouldn't notice it), while back then Windows blue-screened every day. Would anyone notice? Nope.
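    The usual synchronizer MTBF model can be sketched like this (the device constants tau and T_w below are made-up illustrative values, not from any real datasheet):

```python
import math

def synchronizer_mtbf(t_resolve, tau, t_w, f_clk, f_data):
    """MTBF = exp(t_resolve / tau) / (f_clk * f_data * T_w).

    t_resolve: time the flop gets to resolve before its output is sampled (s)
    tau, t_w:  device constants (regeneration time constant, metastability window)
    f_clk, f_data: sampling clock and data-transition frequencies (Hz)
    """
    return math.exp(t_resolve / tau) / (f_clk * f_data * t_w)

# One flop gets ~half a 100 MHz cycle to resolve; a second flop adds a full cycle.
one_ff = synchronizer_mtbf(5e-9, 0.2e-9, 0.1e-9, 100e6, 10e6)
two_ff = synchronizer_mtbf(15e-9, 0.2e-9, 0.1e-9, 100e6, 10e6)
print(f"1 flop: {one_ff / 86400:.0f} days, 2 flops: {two_ff / (86400 * 365):.1e} years")
```

    The exponential in the numerator is why an extra synchronizer flop turns a failure every few days into one that effectively never happens - and why you pick an MTBF budget rather than trying to eliminate the physics.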

  • Re:No Thank You. (Score:5, Insightful)

    by js7a (579872) <james AT bovik DOT org> on Saturday March 19, 2005 @09:40PM (#11988078) Homepage Journal
    The problem is that there is essentially no way to write a regression test that checks the operation for every permutation of states. Electrical problems with chip lithography, when they arise, are often dependent on a particular problem of indeterminate rarity.

    If a CPU producer passed a general purpose chip, and it ended up that the defect was responsible for a tort, then they might be liable.

    There ought to be three bins: MIL-SPEC, No Defects, and Defect Detected But Passed Regression Suite. Anyone purchasing from the third bin has to accept liability for unforeseen malfunctions.

  • Re:Nothing new (Score:5, Insightful)

    by spagetti_code (773137) on Saturday March 19, 2005 @09:56PM (#11988161)
    As per the article - it depends on the way it fails.

    I bought a friend an LCD and it had a single pixel fault - bright green, always on, right in the middle. Made the display unusable. The manufacturer pointed to their returns policy allowing 5 dead pixels and would not accept it back.

    If the pixel had been at the corner - no problem.

    Problem with looking for CPU failures is that there are a very large number of ways chips could fail - and for each of these you have to try and ascertain what the impact of the failure is.

    It would be very hard to ascertain the myriad of impacts a single gate failure could have, let alone the combinations of multiple failures.

    I would hate the manufacturers to create a second tier of CPUs at a lower price point - the ways these chips could cause my s/w to fail would be vexing.

  • by argent (18001) <peter@slashdot.2 ... com minus physic> on Saturday March 19, 2005 @10:11PM (#11988236) Homepage Journal
    but anything like a graphic chip is going to be too complex to handle.

    Depends...

    Graphics chips these days have multiple pipelines, and are shipped in variants with different numbers of pipelines. If you can build a board that lets you use (say) any two pipelines out of a 4-pipeline chip, then you can use more of the defective chips. Similarly, if you're making MP3 chips and their FM radio or LCD subsystems fail, you sell them to Apple to put in the iPod Shuffle...

    The thing is, defective chips are already sorted into bins like this. Processors are binned by clock speed... buy a low-speed CPU and it could well have come from the same run as its higher-speed cousin. Memory has mechanisms to allow for a certain number of bad cells. It wouldn't surprise me at all if some 2-pipeline GPUs are 4-pipeline versions that failed the 3rd or 4th pipeline.

    I don't know how much headroom is left.
  • by Sycraft-fu (314770) on Saturday March 19, 2005 @10:17PM (#11988266)
    For most applications, the specific resistance isn't all that picky. 5% is often good enough. Also, it's often not even the absolute value that's important, but the relative value. Say you have a device with 3 channels, each with a 1k resistor. It doesn't matter that the resistors are exactly 1k; it matters that they are all the same value, and somewhere around 1k.
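    A quick sketch of why the ratio, not the absolute value, is what matters in something like a resistive voltage divider:

```python
def divider_out(v_in, r1, r2):
    """Output of a resistive divider: depends only on the ratio r2 / (r1 + r2)."""
    return v_in * r2 / (r1 + r2)

print(divider_out(12.0, 1000, 1000))   # 6.0 V with nominal 1k resistors
print(divider_out(12.0, 1050, 1050))   # still 6.0 V: both 5% high, ratio unchanged
print(divider_out(12.0, 1000, 1050))   # mismatch is what actually shifts the output
```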

    However, that's not true of the digital world. It is important that my processor gets the right answer to a calculation every time, all the time. It is important that the data stored in RAM is always accurate. If any of these fail, well, it can fuck things up and you can't predict what. Maybe it's the least significant bit of a sample in an audio file and I never know. Maybe it's a bit in the address of a jump in a driver interrupt and it brings the whole system crashing down.

    So while I'm not really worried if all the resistors in my power supply are precisely to spec (who cares if it produces 11.5v instead of 12v?), I am VERY concerned that my CPU might ever give me anything but a completely accurate and predictable result.

    Also, it can make a difference in the analogue domain too. The military is picky for a reason. If a TV fails, no big deal. If an F16 fails, that's a big deal. But you'll find milspec parts in use at a more mundane level too. I built a headphone amp using all 1% (or better) milspec resistors. Why? Well, they sound better. The design (metal film instead of carbon) has better audio characteristics, their resistance changes less with temperature, and the closer matched they are, the closer the outputs of the amp's channels are.
  • by amRadioHed (463061) on Saturday March 19, 2005 @11:52PM (#11988714)
    Chaotic is not really the opposite of deterministic, at least not to mathematicians. In math, chaotic refers to complex systems where a tiny change in the beginning state results in a huge change in the end state. In fact, that is the same as in a computer system. Complex systems studied by mathematicians are unpredictable only because it is impossible to have perfect knowledge of the state of a complex system, not because they are non-deterministic.
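    The textbook illustration is the logistic map: completely deterministic, yet a perturbation at the twelfth decimal place swamps the trajectory within a few dozen steps. A minimal sketch:

```python
def logistic_orbit(x, n, r=4.0):
    """Iterate the chaotic logistic map x -> r*x*(1-x), returning the trajectory."""
    orbit = []
    for _ in range(n):
        x = r * x * (1 - x)
        orbit.append(x)
    return orbit

a = logistic_orbit(0.3, 60)
b = logistic_orbit(0.3 + 1e-12, 60)            # perturb the 12th decimal place
spread = max(abs(p - q) for p, q in zip(a[40:], b[40:]))
print(f"max divergence after step 40: {spread:.3f}")
```

    Both runs are fully deterministic (rerun them and you get identical numbers), but the two trajectories are unrecognizably different once the initial error has doubled a few dozen times.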
  • by Animats (122034) on Sunday March 20, 2005 @12:27AM (#11988843) Homepage
    Yield data is hard to come by, but here's a business school case study with some hard numbers. [ucla.edu] Pentium 4 yields were around 60%, and DRAM yields were around 90%, in 2002. Pentium 4 yields are probably well above that point now, since that technology has matured. Note the comment in that paper that in DRAM fabs, at initial startup, yields may be as low as 5-10%, but rise to 90-98% once the fab is running properly. So that's where you put your effort, not into finding ways to use the rejects.
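    A back-of-the-envelope sketch of why yield is where you put the effort (the wafer cost and die count are made-up round numbers, not from the case study):

```python
def cost_per_good_die(wafer_cost, dies_per_wafer, yield_frac):
    """Every wafer costs the same to process; only the good dies pay for it."""
    return wafer_cost / (dies_per_wafer * yield_frac)

print(f"60% yield: ${cost_per_good_die(5000, 200, 0.60):.2f} per good die")
print(f"90% yield: ${cost_per_good_die(5000, 200, 0.90):.2f} per good die")
```

    Raising yield from 60% to 90% cuts the cost of every good die by a third - a far bigger lever than salvaging a few of the rejects.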

    There have been moments in DRAM history when devices were made that were configured in some way during final test to work around bad spots. IBM did it for a while in the 1980s, I think. But with 90+% yields, it's not worth the added switching you need on chip to allow that. You could, in theory, use heavy ECC to tolerate a substantial defect rate. That's how CD-ROMs work, after all. But it's not necessary yet.
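    The idea behind such ECC can be shown with the smallest single-error-correcting code, Hamming(7,4) - real memory ECC uses wider codes, but the principle is the same:

```python
# Hamming(7,4): 4 data bits plus 3 parity bits; any single flipped bit
# (e.g. one bad memory cell) can be located and corrected.
def encode(d):
    """d: list of 4 bits -> 7-bit codeword [p1, p2, d0, p3, d1, d2, d3]."""
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def correct(c):
    """c: 7-bit codeword with at most one flipped bit -> the 4 data bits."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3    # 1-based position of the bad bit, 0 if clean
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
word = encode(data)
word[4] ^= 1                           # simulate one bad cell
assert correct(word) == data           # single error located and fixed
```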

    For a while, there was a market for DRAM with bad spots for use in telephone answering machines.

    This is an idea that resurfaces periodically in semiconductor history, but historically the yields have always come up to acceptable levels.

  • Re:Nothing new (Score:4, Insightful)

    by snilloc (470200) <jlcollins@@@hotmail...com> on Sunday March 20, 2005 @12:30AM (#11988857) Homepage
    One consumer will complain about the stuck pixel on their new laptop, immediately after complaining about the price of the laptop. You can't have both. As quality approaches "perfect", cost increases exponentially, for any product.

    It would be nice to get one or the other though. Both flawed AND expensive is a real drag.

  • by Ancient_Hacker (751168) on Sunday March 20, 2005 @05:11AM (#11989722)
    What is this guy a professor of? As others have noted, this isn't very likely to work in practice. It's not even good enough for an answering machine if it compresses the audio. Any good compression method is likely to be tripped up by even one bad bit - after all, the goal of compression is to make every bit count! In the case of CPUs, it doesn't seem likely that a random stuck bit is going to be innocuous. The quoted example of an LSB stuck on an adder is very contrived: the arithmetic adder is probably less than 1% of a CPU's real estate. And again, even an LSB error is going to be unacceptable if any compressed or encrypted data goes through the adder, which is extremely likely these days. And let's not forget programs like compilers and linkers, which use the adder to calculate things like addresses. Off by a bit isn't going to cut it for a very large range of applications. And this guy got $1M to research this hare-brained idea? Sheesh.
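    That fragility is easy to demonstrate: flip a single bit in the middle of a compressed stream and decompression either fails outright or yields corrupted data. A sketch using zlib:

```python
import zlib

original = b'compressed data makes every bit count ' * 50
packed = zlib.compress(original)

corrupted = bytearray(packed)
corrupted[len(packed) // 2] ^= 0x01    # flip one bit in the middle of the stream

try:
    result = zlib.decompress(bytes(corrupted))
    outcome = "decoded, but wrong" if result != original else "unharmed"
except zlib.error:
    outcome = "decompression failed outright"
print(outcome)
```

    Because every bit in the compressed stream carries meaning (and the checksum catches what the decoder doesn't), a single bad bit is never harmless - which is exactly the problem with running compressed or encrypted data through a flaky adder.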

"If value corrupts then absolute value corrupts absolutely."

Working...