Faulty Chips Might Just be 'Good Enough' 342

Posted by timothy
from the next-week-the-soda-ring-hammock dept.
Ritalin16 writes "According to a Wired.com article, 'Consumer electronics could be a whole lot cheaper if chip manufacturers stopped throwing out all their defective chips, according to a researcher at the University of Southern California. Chip manufacturing is currently very wasteful. Between 20 percent and 50 percent of a manufacturer's total production is tossed or recycled because the chips contain minor imperfections. Defects in just one of the millions of tiny gates on a processor can doom the entire chip. But USC professor Melvin Breuer believes the imperfections are often too small for humans to even notice, especially when the chips are to be used in video and sound applications.' But just in case you do end up with a dead chip, here is a guide to making a CPU keychain."

  • by Anonymous Coward on Saturday March 19, 2005 @08:44PM (#11987744)
    If ever a story was appropriate for Slashdot.
    • by Anonymous Coward
      Good Enough - That's my motto. Don't tell my girlfriend.
  • by justins (80659) on Saturday March 19, 2005 @08:45PM (#11987754) Homepage Journal
    Don't throw away those "almost perfect" CPUs! Give them to needy people in the third world!

    So they can remark them and sell them back to us...
  • by no parity (448151) on Saturday March 19, 2005 @08:45PM (#11987755)
    And for a long time so. "Audio RAM" is the euphemism.
    • by gl4ss (559668) on Saturday March 19, 2005 @09:41PM (#11988089) Homepage Journal
      and in counterfeit ram ;)

      don't forget that.

      but the real reason for disposal i think is that throwing chips away that early saves the manufacturers money, like, it's much cheaper to throw away one chip than to throw away a tv that doesn't work well enough to be sold.

      however.. what would be the good solution? maybe build the chips redundantly so that it wouldn't matter if one gate didn't work?
    • Jeez, tell me about it. I just got new parts for a computer recently, and the DIMM they sent me in a "tested" barebones consisting of motherboard, CPU, RAM, and case was incredibly bad. It was a 1 gig stick of Kingston value RAM, PC3200. Samsung TCCC, too, so the fact that it was faulty is a damn shame (TCCD would have been nicer though). I had to run the stupid thing at DDR266 speeds (133 MHz) with ridiculously high timings (something like 3-6-7-15) just to install WinXP. Good thing I was able to RMA tha
  • by leshert (40509) on Saturday March 19, 2005 @08:45PM (#11987757) Homepage
    If I remember correctly, digital answering machines use "reject" RAM chips that aren't suitable for data storage, because minor dropped bits in a recorded message aren't discernible.
    • My guess is the few things defective chips are used in, like digital answering machines, don't come anywhere close to using up the supply out there. So most defective chips are still getting trashed/recycled.
    • i486 SX vs DX? (Score:5, Interesting)

      by Mac Mini Enthusiast (869183) <mac.mini.enthusiast@nOsPAm.gmail.com> on Saturday March 19, 2005 @09:22PM (#11987969) Homepage
      Wasn't that the difference between the 486 SX and 486 DX, regarding the math coprocessor? Actually, I've heard two versions of the story. One is that the SX had the math coprocessor intentionally crippled by Intel, but sold for a cheaper price for larger volume sales.

      The other version was that the coprocessor had the highest failure rate in chip fabrication. So on these chips with a failed coprocessor, the coprocessor was turned off, but the rest of the chip was still usable.

      I vaguely remember this whole practice was described in a computer book my friend was reading, because I remember a joke the author told about computer salesmen. Unfortunately I only remember the joke, not the useful info from that book. (This joke comes from the days of small computer shops)
      Q : What's the difference between a computer salesman and a car salesman?
      A : The car salesman knows when he's ripping you off.

      • Re:i486 SX vs DX? (Score:5, Informative)

        by bulliver (774837) <bulliver@@@gmail...com> on Saturday March 19, 2005 @09:43PM (#11988097) Homepage

        I remembered reading something like that so I dug out an old book of mine, "Upgrading and Repairing PCs" by Scott Mueller (2000):

        The 486SX chip is more a marketing quirk than new technology. Early versions of the 486SX chip actually were DX chips that showed defects in the math-coprocessor section. Instead of being scrapped, the chips were packaged with the FPU section disabled and sold as SX chips.
      • Re:i486 SX vs DX? (Score:4, Interesting)

        by erice (13380) on Saturday March 19, 2005 @09:53PM (#11988148) Homepage
        Actually, there's no difference. If the supply of 486's with defective FPU's exceeded the demand for 486SX's, then all 486SX's shipped would have disabled defective FPU's. If the supply of 486's with defective FPU's was less than the demand for 486SX's, then Intel would disable the FPU's on perfectly functional 486's and sell them as 486SX.
        Manufacturers do the same trick with speed grades. That's the principal reason why CPU's can often be overclocked beyond their rated maximum.

        A more interesting thing about the 486SX/487SX is that the 487SX was, in fact, a complete 486. When plugged into the FPU socket, it disabled the 486SX entirely.

        Intel claimed that the disabled FPU in the 486SX was only a temporary thing. Eventually, there would be a unique die for the 486SX and it wouldn't have an FPU at all. I kind of doubt this ever happened. The 486SX wasn't very popular.
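The supply-and-demand argument in this comment can be written out as a small sketch. The function name and the die counts are hypothetical and chosen only for illustration; nothing here reflects actual Intel production numbers.

```python
def sx_sources(defective_fpu_dies: int, sx_demand: int) -> dict:
    """Split 486SX shipments between dies with a bad FPU and good dies with the FPU disabled."""
    # Dies with a defective FPU are used first, since they cannot be sold as DX parts.
    from_defective = min(defective_fpu_dies, sx_demand)
    # Any remaining demand is met by disabling the FPU on fully working dies.
    from_good = sx_demand - from_defective
    # Defective dies beyond the demand for SX parts have no buyer.
    unsold_defective = defective_fpu_dies - from_defective
    return {
        "from_defective": from_defective,
        "from_good": from_good,
        "unsold_defective": unsold_defective,
    }

print(sx_sources(defective_fpu_dies=1000, sx_demand=600))
# → {'from_defective': 600, 'from_good': 0, 'unsold_defective': 400}
print(sx_sources(defective_fpu_dies=300, sx_demand=600))
# → {'from_defective': 300, 'from_good': 300, 'unsold_defective': 0}
```

Either way the buyer cannot tell which kind of die is inside the package, which is the sense in which there is "no difference".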
        • Re:i486 SX vs DX? (Score:4, Interesting)

          by Talez (468021) on Saturday March 19, 2005 @10:24PM (#11988294)
          Eventually, there would be a unique die for the 486SX and it wouldn't have an FPU at all. I kind of doubt this ever happened.

          It did. By late 1991 the 486SX die was completely different with the co-processor removed.
        • Intel claimed that the disabled FPU in the 486SX was only a temporary thing. Eventually, there would be a unique die for the 486SX and it wouldn't have an FPU at all. I kind of doubt this ever happened. The 486SX wasn't very popular.

          If they could save die area, you can bet that they would have removed the FPU. The cost of any silicon chip is roughly proportional to the cube of the die area. (You can easily figure this out. Just consider that the cost per wafer and the number of defects per wafer are al

    • If I remember correctly, digital answering machines use "reject" RAM chips that aren't suitable for data storage, because minor dropped bits in a recorded message aren't discernible.

      Correction...

      Digital answering machines use "reject" RAM chips that are not suitable for anything, because random dropped bits in a recorded message render it indiscernible.

      At least that's been MY experience with them. Dropped bits always occur whenever anyone relays important information like a phone number, time, pla
  • Old? (Score:2, Informative)

    by Borgschulze (842056)
    Didn't I read this a few days ago... seems old... Like I already read it...
  • No Thank You. (Score:5, Insightful)

    by Anonymous Coward on Saturday March 19, 2005 @08:49PM (#11987781)
    I'd rather my chip works as advertised.
    • Re:No Thank You. (Score:5, Insightful)

      by js7a (579872) <james@bovikBOHR.org minus physicist> on Saturday March 19, 2005 @09:40PM (#11988078) Homepage Journal
      The problem is that there is essentially no way to write a regression test that checks the operation for any permutation of states. Electrical problems with chip lithography, when they arise, are often dependent on a particular problem of indeterminate rarity.

      If a CPU producer passed a general purpose chip, and it ended up that the defect was responsible for a tort, then they might be liable.

      There ought to be three bins: MIL-SPEC, No Defects, and Defect Detected But Passed Regression Suite. Anyone purchasing from the third bin has to accept liability for unforeseen malfunctions.
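To put a rough number on why a test cannot cover every permutation of states, here is a back-of-the-envelope sketch. The 64 bits of state and the rate of one billion test vectors per second are illustrative assumptions, not figures from the comment; a real processor holds far more state than that.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_exhaust(state_bits: int, vectors_per_second: float) -> float:
    """Years needed to apply every possible pattern of `state_bits` bits exactly once."""
    return (2 ** state_bits) / vectors_per_second / SECONDS_PER_YEAR

# Assumed: only 64 bits of state, tested at one billion patterns per second.
print(round(years_to_exhaust(64, 1e9)))  # → 585
```

Every extra bit of state doubles the figure, so a regression suite can only ever sample the space.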

  • by Anonymous Coward on Saturday March 19, 2005 @08:49PM (#11987783)
    Actually, the supercomputers used for warfare and conflict analysis at the Pentagon and the CIA use these rejected chips.

    In addition, they are used in the so-called "Star Wars" missile defense system prototype.

    Although these chips don't actually work, the results are often good enough for their purposes.
  • I'm not so sure... (Score:5, Insightful)

    by Bigthecat (678093) on Saturday March 19, 2005 @08:52PM (#11987797)
    Considering the amount of defective products that make it into our hands already after this 'quality assurance' for various reasons, I'm not sure adding more that already have a defect, however minor, is such a brilliant move.

    It may seem that there's a basic linear line between over-the-top quality control and cost and more economical quality control and cost, however one has to think that if it turns out that these chips are more likely to have defects in them and in fact do in the future, how long will costs remain low? The chip will still be useless and will have to be replaced, added to that the cost of making the returns from the customer/store and then the possible customer dissatisfaction with the company's quality which could result in a lost sale in the future. Will it actually be cheaper in the long term?

    • by mjh49746 (807327) on Saturday March 19, 2005 @09:20PM (#11987957)
      I'll have to agree. I've just RMA'd a DVD burner a day after I got it back from the last RMA. Not to mention having to RMA a stick of RAM not three months ago. QA seems to be a really sad joke, these days.

      We've already got enough bad components floating around. We surely don't need any more.

      • Totally not unusual. We get bad batches of various computer parts every week. Often, it is generic RAM. Lately, LG has switched manufacturers, and their DVD burners have gone to shit. When we get a bad batch, 90% of it gets RMA'ed. I'm just glad that most of it is used in-house instead of customers getting it and returning it.
  • by BillsPetMonkey (654200) on Saturday March 19, 2005 @08:53PM (#11987804)
    Then why not have analogue processors instead of digital processors. Seriously - they're much faster than digital switches.

    The only reason for moving to digital switches was accuracy - the cost of the first digital bitflipper processors was far more expensive than valve technology was in 1950s and 1960s. And that really was the only reason for changing to digital processors.
    • Not quite (Score:5, Interesting)

      by beldraen (94534) <chad DOT montplaisir AT gmail DOT com> on Saturday March 19, 2005 @09:15PM (#11987935)
      While I agree that analog processors probably hold some promise, there is one large issue with them: heat. A major reason why processors get hot in the first place is that after each cycle the state is returned to a neutral position, which usually means grounding the gates to discharge them. Much of this wasted energy is converted to heat. Analog processors can really be thought of as digital with multiple states, instead of two. This means that while more work can be done, there are larger values of charge to dissipate.

      What has always piqued my curiosity, because it seemingly has not been worked on, is "reversible" chips. There are essentially two sets for every mechanism and the system toggles back and forth. The discharge of the old system is used to drive the new mechanism; thus, a lot of wasted discharge is conserved for reuse. Reversible chips are reported to generate far, far less heat. I have heard that Intel and others know about this, but sticking with current designs is simply a better immediate investment because consumers are happy paying for the current line of toasters.
      • Re:Not quite (Score:2, Interesting)

        While I agree that analog processors probably hold some promise, there is one large issue with them: heat.

        Yes and no, depends how you're operating the transistors. For example, ECL (Emitter-Coupled Logic) runs quite fast and doesn't saturate the transistors, in contrast to what TTL does. By not saturating they're able to switch states quite quickly, but they dissipate power like crazy. As of 7 years ago you could easily find ECL lines (For example this AND/NAND chip [onsemi.com] can work at least to 3 GHz. This is

      • Re:Not quite (Score:2, Informative)

        by Formica (775485)
        There is a fair amount of research going on for this; it's known as adiabatic logic [google.com].

        Here's a short paper on how it's clocked:
        Charge Recycling Clocking for Adiabatic Style Logic [berkeley.edu]

        Formica
      • I have heard of reversible computing from a guy that would stand to know a lot about it, a physics, math and CS major. The biggest problem is that it probably doubles the number of transistors and slows the chip down, because signals have to traverse longer distances to get past the extra transistors.

        It would probably be cheaper to use Pentium M or the lowest power Turion style chips than to switch to reversible logic. For the most part, PM and Turion chips are very close in performance with their desktop co
  • Nothing new (Score:5, Insightful)

    by Rosco P. Coltrane (209368) on Saturday March 19, 2005 @08:54PM (#11987817)
    LCD manufacturers routinely put defective screens on the market, on the premise that a dead pixel here or there "won't be noticed". Too bad, because consumers do notice and do tend to return the product equipped with the dodgy screen, only to be told that it's "normal".

    In short: computers suck...
    • Re:Nothing new (Score:5, Insightful)

      by spagetti_code (773137) on Saturday March 19, 2005 @09:56PM (#11988161)
      As per the article - it depends on the way it fails.

      I bought a friend an LCD and it had a single pixel fault - bright green, always on, right in the middle. Made the display unusable. Manufacturer pointed to their returns policy of 5 dead pixels and would not accept it back.

      If the pixel had been at the corner - no problem.

      Problem with looking for CPU failures is that there are a very large number of ways chips could fail - and for each of these you have to try and ascertain what the impact of the failure is.

      It would be very hard to ascertain the myriad of impacts a single gate failure could have, let alone the combinations of multiple failures.

      I would hate the manufacturers to create a second tier of CPUs at a lower price point - the ways these chips could cause my s/w to fail would be vexing.

    • One consumer will complain about the stuck pixel on their new laptop, immediately after complaining about the price of the laptop. You can't have both. As quality approaches "perfect", cost increases exponentially, for any product.

      Chips can be tested while still on their wafer, along with the other 70 or so dies. Bad dies do not need to continue through the manufacturing process. LCD displays have to be mostly assembled before they can be tested. There's a much larger cost associated per unit for LCDs
  • Caveats (Score:2, Informative)

    by karvind (833059)
    Testing ICs is an exponentially hard problem these days. One-third of the cost is devoted to it. Thus it may be a good idea to test the chip for only the applications it is needed (in some restrictive environments) and if it passes, it can still be deployed. It will ease some of the economic hammer on the manufacturing these days.

    Xilinx offer EasyPath [xilinx.com] option by testing for a customer-specific application. Customers use EasyPath customer specific FPGAs to achieve lower unit costs for volume production once

  • by SA Stevens (862201) on Saturday March 19, 2005 @08:54PM (#11987820)
    The CPU vendors are already doing a 'sort and grade' operation, when they label processors. Have been for years. When the yield from the fab is lower-grade, the dies get packaged and labelled as lower-speed parts.

    Then the Overclockers come in and ramp the speed back up, and claim 'the faster chips are a ripoff' and complain that 'Windows is always crashing.'
    • Then the Overclockers come in and ramp the speed back up, and claim 'the faster chips are a ripoff' and complain that 'Windows is always crashing.'

      Perhaps it's because Windows also tends to crash on normal machines too? I mean, you never hear *nix overclockers complain that their OS crashes all the time do you?
      • "you never hear *nix overclockers complain that their OS crashes all the time do you?"

        Yes? Although more normally it's things like gcc throwing tonnes of SEGV's and the odd bit of filesystem corruption. Maybe Windows just has more assert()s.
    • by taniwha (70410) on Saturday March 19, 2005 @09:39PM (#11988073) Homepage Journal
      (I'm a sometimes chip architect, so some background) - there's two sorts of tests that go on when you fab chips - functionality (do they do the right thing, are all the gates working, are all the wires connected) and speed (does it go fast enough).

      For most chips, except ones like CPUs where you can charge a premium you don't speed bin (it costs lots of money), you pick a speed you think it should go at and toss the rest. Shipping chips that almost work is bad business - think about it, I make a $5 chip it gets put in a $100 product, if 10% of my chips don't work my customer loses $100 for every $50 he pays me, I have to get my failure rate down so it's in the noise as far as the customers are concerned, otherwise they'll go to the competition.
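The arithmetic in that paragraph can be checked with a short sketch. It assumes, as the comment implies, that a bad chip is only discovered after it has been built into the finished product, so the whole product cost is lost; the function name is hypothetical.

```python
def vendor_revenue_vs_customer_loss(chip_price, product_cost, failure_rate, chips):
    """Return (money paid to the chip vendor, money the customer loses on dead products)."""
    paid_to_vendor = chip_price * chips
    # Assumption: every failed chip scraps one complete product.
    lost_on_products = product_cost * failure_rate * chips
    return paid_to_vendor, lost_on_products

paid, lost = vendor_revenue_vs_customer_loss(
    chip_price=5, product_cost=100, failure_rate=0.10, chips=10
)
print(paid, round(lost, 2))  # → 50 100.0
```

For every $50 of chips bought, the customer eats $100 of scrapped product, which matches the figures quoted above.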

      I think that the number of applications the original article's talking about where chip errors are tolerable is pretty small. Suppose my CPU has a bit error in the LSB of the integer adder: the IRS may not care if my taxes are off by 1c, but the MSB is a different matter ("sir you appear to owe us 40M$"). On the other hand an LSB error is a big deal if the value you are dealing with is a memory pointer, and breaks a program just as badly as if it is the MSB.

      Finally a word about "metastability" - all chips with more than one clock (video cards are great examples) have to move signals between clock domains - this means that signals can be sampled wrongly (well designed logic should handle this) or in rare cases suffer metastability where the result causes unstable logic values to be latched into flops (usually these look like a value that swings wildly between 0 and 1 at a freq much higher than the normal clock); a flop in a metastable state can 'pollute' other flops downstream from it, turning a chip into a gibbering wreck. Now well designed logic doesn't do this very often, the flops chosen for crossing clock domains are often special anti-metastability flops used not for their speed or their size but their robustness - but the physics of the situation means that it's simply not possible to avoid - just possible to make it not happen very often. What you do need to do is figure out how often something will fail and pick a MTBF that is appropriate for your device ... I once found myself discussing this issue around a video chip we were designing and basically what it came down to was comparing the theoretical worst case failure rate (chip people tend to be very conservative, keeps us on the right side of Murphy) of our chip with Windows - our chip might fail once a year (and even then there was a pretty good chance you wouldn't notice it) while back then Windows blue-screened every day - would anyone notice? nope

  • by Anonymous Coward on Saturday March 19, 2005 @08:56PM (#11987832)
    Micron started a group over 15 years ago that tests RAM chips that fail testing at any stage of production.

    When I worked there it was called the "Partials Division". This group invented the "audio ram" market. They have a wide ranging sorting and grading process. It is called "SpecTek" I believe now. I sometimes see low end memory modules with SpecTek Ram.
    12 years ago, I was a production technician in a Surface Mount Assembly division that shared a building with Partials. We used to assemble memory modules and even video cards that used "PC grade" chips from the partials group. Everyone said they were good enough, but personally I have always steered clear of them.
    The last year I was at Micron, we had a lot of discussions with NEC, Intel and some Russian Fabs to provide the same services to them. We tested a couple million chips from these companies in tests. Never did hear what the end result was.
  • by seanadams.com (463190) * on Saturday March 19, 2005 @08:58PM (#11987842) Homepage
    Many of the chips fail inspection prior to going into the package, and then some more fail functional test after that. Probably more than half the price of a chip is the factory itself and the R&D work which is amortized over so many zillions of parts, and much of the rest is all the handling, packaging, shipping, and middlemen. I'd guess less than 10% is per-part materials and labor.

    Therefore throwing away a $2 chip during production doesn't cost $2. It's only worth $2 by the time the customer pays for it.

    Sure you could sell the defects at some discount, but it's only worth the trouble for some high volume part like RAM where defects are easily useable, and definitely NOT a part where the impact of some particular defect in the end user's application could be really hard to characterize (like a CPU).
  • the FUTURE (Score:5, Interesting)

    by k4_pacific (736911) <`moc.oohay' `ta' `cificap_4k'> on Saturday March 19, 2005 @08:58PM (#11987844) Homepage Journal
    In the FUTURE, single core processors will be dual core processors where one side didn't pass quality control. Someone will eventually figure out how to hack the chip to use both halves anyways, and the market will be flooded with cheap dual core chips that don't always work. Remember, you read it here first.
    • Re:the FUTURE (Score:5, Interesting)

      by ltbarcly (398259) on Saturday March 19, 2005 @09:20PM (#11987961)
      Probably. But only for one revision, then they'll stop it. This has been going on forever. The 486sx was identical to the dx early on, except the FPU was disabled. I have never heard of a hack to get around this. Video chips are the same story, a radeon 9500 IS a 9700, with half the pixel paths disabled usually due to defect. You can get around this in software even.

      Here is where you can make out like a bandit. Buy up a bunch of the revision which is hackable. Then, hack the ones you can and sell them as such. Then wait until supplies run out, and sell the ones where the hack failed on ebay. People will be on the lookout for the hackable version, and will pay a premium to get it from you. Oh, don't mention that you already tried it and it didn't work. They get exactly what they paid for, so this isn't dishonest in the least.

      Actually, this happened to me. I wanted the Radeon 9500 with the ram in an L configuration, because you can soft-upgrade it to a 9700 most of the time. I bought one on ebay since there were no more on newegg. I specifically asked the guy "L shaped ram" he says yes. I get it and everything seems fine. UNTIL I lift off the heatsink. There, instead of a thermal pad or tape, is silver thermal compound. Clearly he had lifted the heatsink, and then put it back on when the hack failed. At least he was nice enough not to leave the hosed heat-tape on there. I ended up with a good upgrade for about what the newer revision would have cost anyway.

      Now, in the next revision they just update the manufacturing to make it impossible to do the hack, because it is a nightmare for them to support all the half busted products that have been 'fixed' (even if they just say no, receiving and testing those products for the hack, and even phone support, costs like a bastard), and it cuts into the sales of the top tier products, where they make the highest margin. For chip companies this is as easy as dinging the faulty side of the chip before they assemble it completely, or putting some sort of "fuse" on the silicon itself, which they then burn out if that side is faulty. There is no way to take apart a chip to work directly on the silicon, and if there is and someone actually does it it will be a "Prove you can" since the equipment will be in the millions. (I can imagine a physics grad student with access to the machinery if they are doing superconductor or quantum computing research)
    • Re:the FUTURE (Score:3, Interesting)

      by Sycraft-fu (314770)
      I have a feeling to prevent that, the companies will burn off the second processor. Not hard to burn off some critical traces so it can never be activated.
    • Why wouldn't they just make the cores separately? Do dual core chips have to be printed on the same die?
      • The general idea of dual core CPUs is that both cores are usually on the same die.

        Separating them will probably not be easy, much less repackaging them.
  • by Transcendent (204992) on Saturday March 19, 2005 @08:59PM (#11987849)
    I can see it for RAM, but for processors, I don't think so.

    Though, you would probably have to make sure that certain important data for an audio or video clip are stored in *good* memory. Or else you could run into problems where a clip doesn't know where to end.

    But, what are the odds that a null terminator gets messed up in meao90efghijklmnopqrstuvwxyz{|}~ÇüéâäàåçêëèïîìÄÅÉæ ÆôöòûùÿÖÜ£¥áíóúñÑß±÷ !"#$%&'()*+,-./0Welcome to BankOne Online banking service! Your updated credit card number is 41
    <<ERROR: Unexpected EOF >>
  • Not a good idea (Score:5, Interesting)

    by IversenX (713302) on Saturday March 19, 2005 @08:59PM (#11987851) Homepage
    There is a reason for throwing out those chips! Maybe it's true that _most_ human ears won't notice that the least significant bit has been flipped in an über-noisy phone recording for a digital answering machine, but what if it was the most significant? That would make an audible "pop".
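A minimal sketch of the difference between those two cases, assuming 16-bit signed PCM samples (the comment does not say what format an answering machine would use, and `flip_bit` is a hypothetical helper):

```python
def flip_bit(sample: int, bit: int, width: int = 16) -> int:
    """Flip one bit of a signed two's-complement sample and return the signed result."""
    mask = (1 << width) - 1
    raw = (sample & mask) ^ (1 << bit)   # work on the unsigned bit pattern
    if raw >= 1 << (width - 1):          # reinterpret the pattern as a signed value
        raw -= 1 << width
    return raw

quiet_sample = 1000
print(flip_bit(quiet_sample, 0))    # → 1001   (least significant bit: inaudible)
print(flip_bit(quiet_sample, 15))   # → -31768 (most significant bit: near full-scale spike)
```

The same single-bit fault changes the sample by 1 part in 65536 in one position and by half the full range in the other, which is why where the defect sits matters more than how many there are.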

    Ok, so maybe for non-critical equipment in the "use-and-throwaway" category. But this will not bring us cheaper hardware, just less functional hardware. Those chips are _literally_ going nowhere slow.

    If you've ever had to debug something that turned out to be flaky hardware, you KNOW it's a PITA. If anything, awareness should be increased when it comes to the really cheap brands. They aren't always very stable, but people sometimes go for the cheapest RAM anyway, and then complain to ME when it doesn't work. There actually is some connection between what you pay, and what you get. Argh.

    I'm done rambling now, thanks for waiting..
  • Faulty Chips (Score:3, Informative)

    by p0rnking (255997) on Saturday March 19, 2005 @08:59PM (#11987854) Homepage
    I'm sure I read something, a long long time ago, that mentioned that Celerons were "faulty" versions of the Pentiums (and a comparison was made that the Durons were made as Durons, and weren't chips that were taken out of the garbage bins)
    • Re:Faulty Chips (Score:4, Informative)

      by Anonymous Coward on Saturday March 19, 2005 @09:12PM (#11987918)
      A Celeron was a Pentium III with a part of bad cache. The half with the bad section was marked as such, which is why the Celeron always had half the cache of a P3. They also ran them at much slower bus/core speeds. I'm not sure what the newer "P4" Celerons are though. Probably the same deal.
    • The AMD XP to MP 2100 [livepublishing.co.uk] mod.

      Now, in reality Celerons have a lower cache, lower bus speed and overall lower clockspeed. As I remember, because of this the core doesn't have to pass as high a standard as the current Pentium offering.

      I'm sure there are others who would offer better knowledge on this.

  • by mobiux (118006) on Saturday March 19, 2005 @08:59PM (#11987855)
    Usually their LE and SE models have certain branches and pipelines already disabled. Usually these disabled pipelines are damaged in some way.

  • The manufacturing process errors that cause parts to be rejected vary greatly from part to part -- they don't all fail in the same ways. Additionally, some defects are acceptable for some applications and not others. It would require a great deal of time and effort to identify the exact nature and extent of each defect found in each part, and then to match that particular part to an application that will tolerate its fault. While it is conceivably possible, it would be very difficult to implement this so
  • by G4from128k (686170) on Saturday March 19, 2005 @09:06PM (#11987890)
    Apart from some hard-wired devices (simple sound clip recorders) or downclocked low-end devices, I don't see how defective chips can be used. The article suggests that the occasional error is OK for audio and video, but how do you ensure that the faulty chip never has to handle code, memory pointers, configuration files, hashes, passwords, encrypted data, or compressed data. I suspect that modern-day audio and video datastreams are becoming more fragile as they carry more metadata, highly compressed data, DRM, software, etc.

    Something tells me that the manufacturers that use semi-defective chips are going to lose all their savings on product returns, warranty costs, and technical support. Given the low cost of most consumer electronics chips and the high cost of service labor, I doubt they will want the hassles of unreliable products.
  • by writermike (57327) on Saturday March 19, 2005 @09:06PM (#11987892)
    A bit error here, A bit error there. Pretty soon you're talking about real crashes.
  • by TheSHAD0W (258774) on Saturday March 19, 2005 @09:11PM (#11987915) Homepage
    ...They'd already be doing it.

    Please remember that this is the same industry that came up with the 80486SX when they were having lousy yields on 80486DX chips. If these processors had any utility, trust me, they'd find a way to make money off 'em.
    • If these processors had any utility, trust me, they'd find a way to make money off 'em.

      Here's what you do: take three defective chips, glue them together so that they all run in parallel, and for each output pin, the pin's state is determined by "majority rule" of the three corresponding pins on the defective chips.

      If it works for the space shuttle, it can work for your Radio Shack junk electronics.... the only inconvenient detail would be making this hack cheap/easy enough to be worthwhile...

      • Wouldn't work well, because once a failure occurred you'd have to stop everything, then get the bad chip to the right working state before proceeding further, which would be a rather difficult task, especially if the fault caused a particular state to be impossible.
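The "majority rule" voting step described two comments up is simple to sketch on its own. This is only the vote, written for integers whose bits stand in for output pins; it does not address the resynchronization problem raised in the reply, and `majority` is a name chosen here for illustration.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit takes the value at least two inputs agree on."""
    return (a & b) | (a & c) | (b & c)

good = 0b1010
faulty = 0b0110            # one chip gets two of the four pins wrong
print(bin(majority(good, good, faulty)))   # → 0b1010
```

The vote only masks a fault when the other two chips agree on the right answer; two chips that fail the same way on the same pin outvote the good one.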
  • Apparently the overclockersclub stressed their server too much. Now I'll never know how to make a keychain out of CPUs...
  • Intel, AMD, and other chip manufacturers must make a premium for their high-end chips to make a profit. They discontinue a speed grade once it hits a certain price point because it's not profitable to sell them that cheaply. I seriously doubt they're going to want to release even lower-margin, bargain chips that further undercut their more profitable high-end chips.
  • At least Intel didn't tell us that the Pentium was "good enough" with its floating point division error (even though it actually was 99.999% of the time).
  • Stories (Score:3, Interesting)

    by MagicDude (727944) on Saturday March 19, 2005 @09:21PM (#11987965)
    Reminds me of a story I heard from my high school physics teacher. He had a friend in the military doing electronics. One big part of his job was to measure resistors, because military specifications required that devices have a very strict tolerance. They wouldn't use anything that was more than 1% outside of spec, and they would simply throw out the rest of the resistors they bought. So my teacher's friend would take all these resistors, whose resistances he had accurately measured, and sell them to the local Radio Shack. They liked being able to buy resistors that were within 2-3% of the indicated resistance (I'm not an electrician, but I believe 5% or so is considered an acceptable tolerance for general applications?), and they got them cheap. The guy made some money on an investment of zero, since as far as the military was concerned, he was simply selling trash. Couldn't something like this be done with chips? Isn't there some market for chips that are 99.9% good?
    • Re:Stories (Score:5, Insightful)

      by bbrack (842686) on Saturday March 19, 2005 @09:32PM (#11988029)
      If you go and buy a handful of 5% resistors, you will find ~0 that are within 2% of their value - if you buy 2%, none w/in 1%, etc...

      Manufacturers are VERY aware they can charge a larger premium for better parts
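The binning effect described above can be simulated roughly as follows. The assumptions are mine: every part is measured and sold at the tightest grade it meets, and the raw parts follow a Gaussian spread with a made-up 2.5% sigma.

```python
import random


def bin_resistors(nominal: float, count: int, seed: int = 0) -> dict:
    """Sort simulated resistors into the tightest tolerance grade they meet."""
    rng = random.Random(seed)
    bins = {"1%": [], "2%": [], "5%": [], "reject": []}
    for _ in range(count):
        # Assumed process spread; real distributions will differ.
        value = rng.gauss(nominal, nominal * 0.025)
        error = abs(value - nominal) / nominal
        if error <= 0.01:
            bins["1%"].append(value)
        elif error <= 0.02:
            bins["2%"].append(value)
        elif error <= 0.05:
            bins["5%"].append(value)
        else:
            bins["reject"].append(value)
    return bins


NOMINAL = 1000.0
bins = bin_resistors(NOMINAL, 10_000)

# Parts sold as 5% that happen to be within 2% of nominal.
within_2pct = [v for v in bins["5%"] if abs(v - NOMINAL) / NOMINAL <= 0.02]
```

Under these assumptions `within_2pct` is always empty: anything that good was already pulled out and sold at a higher price, so the 5% bin has a hole in the middle of its distribution.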
    • by Sycraft-fu (314770) on Saturday March 19, 2005 @10:17PM (#11988266)
      For most applications, the specific resistance isn't all that picky. 5% is often good enough. Also, it's often not the absolute value that's important, but the relative value. You have a device with 3 channels, each with a 1k resistor. It doesn't matter that the resistors are exactly 1k; it matters that they are all the same value, and somewhere around 1k.

      However, that's not true of the digital world. It is important that my processor gets the right answer to a calculation every time, all the time. It is important that the data stored in RAM is always accurate. If any of these fail, well, it can fuck things up and you can't predict what. Maybe it's the least significant bit of a sample in an audio file and I never know. Maybe it's a bit in the address of a jump in a driver interrupt and it brings the whole system crashing down.

      So while I'm not really worried whether all the resistors in my power supply are precisely to spec (who cares if it produces 11.5V instead of 12V?), I am VERY concerned that my CPU might ever give me anything but a completely accurate and predictable result.

      Also, it can make a difference in the analogue domain too. The military is picky for a reason. If a TV fails, no big deal. If an F16 fails, that's a big deal. However, on a more mundane level you'll find milspec parts in use. I built a headphone amp using all 1% (or better) milspec resistors. Why? Well, they sound better. The design (metal film instead of carbon) has better audio characteristics, their resistance changes less with temperature, and the closer matched they are, the closer the output of the channels of the amp are.
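To put numbers on the audio-sample versus jump-address contrast in the comment above, here is a small sketch. The sample value, the address, and the choice of which bit flips are all invented for illustration.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with a single bit inverted."""
    return value ^ (1 << bit)


# A 16-bit audio sample with its least significant bit flipped.
sample = 12345
bad_sample = flip_bit(sample, 0)
sample_error = abs(bad_sample - sample)      # off by one step
relative_error = sample_error / 32768        # fraction of full scale

# A hypothetical jump target with one address bit flipped.
jump_target = 0x00401000
bad_target = flip_bit(jump_target, 20)
address_error = abs(bad_target - jump_target)  # bytes away from the intended code
```

The same one-bit fault moves the audio sample by one step out of 32768, but sends the jump a megabyte away from where it was supposed to land.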
    • 5% is usually good enough, but many commercial operations use 1% resistors quite a bit. We do, mostly due to the fact that we need to guarantee performance across the industrial temperature range, and a 1% resistor once you account for temperature, drift over life, and a couple other things winds up being strikingly close to a 5% resistor's nominal.
  • Really. Do we need more defective products being sold under the pretense that it's OK?

    Love the updated notice?
    This story has been updated to note that Melvin Breuer's research was supported by the chip industry.
    Slashdot, you care to update yours to reflect this minor detail or do you just like playing along?
  • by david.given (6740) <dg AT cowlark DOT com> on Saturday March 19, 2005 @09:27PM (#11988002) Homepage Journal
    And anybody who actually knew anything about computers would know this. TFA doesn't mention what this guy is a professor of --- I bet it's not electronics.

    Basically, the problem is this. With mechanical and analogue devices, most of the time you know that if you change the inputs a small amount, the outputs will change a small amount.

    But digital devices are chaotic. Change one bit in the input, and the output is likely to be radically different. One bit in the wrong place on a Windows system can make the difference between Counterstrike and a BSOD.

    You can use substandard devices for some applications; dodgy RAM, for example, can be used to store audio on, and it would work just as well for video framebuffers. But you could never put anything programmatic on it; that has to be perfect.

    (IIRC, they do recycle faulty wafers. One of the ways is to scrape the doped layer off and turn them into solar cells. I don't know if they can use them again for ICs, though.)

    • Digital devices are the exact opposite of chaotic: they are deterministic. The rest of your post is correct, but that's the real reason why a tiny error somewhere in the billions of RAM bits can be picked up, propagated, and used to corrupt the rest of the system.
      • Chaotic is not really the opposite of deterministic. At least not to mathematicians. In math, chaotic refers to complex systems where a tiny change in the beginning state results in a huge change in the end state. In fact, that is the same as in a computer system. Complex systems studied by mathematicians are unpredictable only because it is impossible to have perfect knowledge of the state of a complex system, not because they are non-deterministic.
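The point that a system can be deterministic and chaotic at once is easy to see with the logistic map, a standard textbook example. This is not something from the thread; the starting value and the size of the nudge are arbitrary.

```python
def logistic_orbit(x0: float, steps: int, r: float = 4.0) -> list:
    """Iterate x -> r * x * (1 - x), a fully deterministic rule."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


a = logistic_orbit(0.2, 100)
b = logistic_orbit(0.2, 100)          # same start: identical orbit
c = logistic_orbit(0.2 + 1e-10, 100)  # start nudged by one part in ten billion

first_gap = abs(a[1] - c[1])
max_gap = max(abs(x - y) for x, y in zip(a, c))
```

Runs `a` and `b` match exactly, which is the determinism. Runs `a` and `c` start out indistinguishable and end up nowhere near each other, which is the chaos.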
    • by blanktek (177640) on Saturday March 19, 2005 @10:09PM (#11988225)
      Actually he is a professor of electrical engineering systems. http://poisson.usc.edu/Breuer.html [usc.edu] But I think there is a lot of misunderstanding here about what is trying to be done. And it doesn't have to do with killing your RAM and your Counterstrike game.


      There is another article here with some extra details. http://www.isa.org/PrinterTemplate.cfm?Template=/ContentManagement/HTMLDisplay.cfm&ContentID=42102&FuseFlag=1 [isa.org] I suppose what he is doing is trying to devise NEW methods to allow chips to work properly if they have errors. That is why he is getting the big grant money. For example, in data transmission, if you miss a bit it can be filled in with parity checking. I am of course guessing that it could be done this way. But the point is that it is not some conspiracy to trick you into buying crappy videocards. Firms know very well that the market will prevent that or they don't get to produce.
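As a rough sketch of "filling in" lost data with parity: a single XOR parity word can rebuild one missing word, provided you already know which one is missing. Parity alone only tells you that something is wrong, not where. This is a guess at the flavor of the idea, not Breuer's actual method, and the data values are made up.

```python
from functools import reduce


def xor_parity(words: list) -> int:
    """XOR of all words; stored or sent alongside the data."""
    return reduce(lambda x, y: x ^ y, words, 0)


def recover_missing(words_with_gap: list, parity: int) -> list:
    """Rebuild the single entry marked None using the parity word."""
    missing_index = words_with_gap.index(None)
    known = [w for w in words_with_gap if w is not None]
    restored = list(words_with_gap)
    restored[missing_index] = xor_parity(known) ^ parity
    return restored


data = [0x3C, 0xA5, 0x0F, 0x81]
parity = xor_parity(data)

received = [0x3C, None, 0x0F, 0x81]   # second word lost in transit
repaired = recover_missing(received, parity)
```

`repaired` comes back equal to `data`. Losing two words at once, or not knowing which word is bad, needs a stronger code than this.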

      • I suppose what he is doing is trying to devise NEW methods to allow chips to work properly if they have errors.

        Ah, right --- that makes a difference. (Do you remember the days when Wired had articles with actual technical content? <nostalge/>)

        Hmm... I wonder if eventually we'll get processors with custom microcode to reroute around faulty subsystems?

  • by Glowing Fish (155236) on Saturday March 19, 2005 @09:33PM (#11988033) Homepage
    If you look at what the "big ticket" items are in the US economy, electronics and medicine are up at the top of the list.
    And the reason for this is, as you get closer to perfection, it takes more and more of an economic cost, in terms of money or resources or time or effort. For a computer or a medicine to go from 90 percent to 99 percent utility means a ten fold increase in price.
    That's why the constant quest to have "perfect" electronics and medicine is driving up the prices of these things to the point where normal people can't afford them. If we could accept that we didn't always need new, perfect, shiny medicines and electronics, it would put them in a sane price range.
  • How well did that work with the P54 floating point rounding error?

    People may not notice the problem, but if they ever find out it's there, they'll want it fixed. Better to throw out a chip in the fab than replace the product in the market.
  • Good Use (Score:2, Insightful)

    by d1g1t4l (869211)
    Faulty Chips can be used to generate "true" random numbers.
  • ...is to _test_ them.

    A chip is no good for ANYthing unless you know exactly what is wrong with it. It might not work AT ALL in a specific "audio application."

    By the time you've tested a chip enough to characterize its defects, so that you know they are not going to interfere significantly with the very specific way it is used in a specific application, you've probably added so much cost that it's more expensive than a perfect chip.

    In fact, you've gone away from the notion of "interchangeable part
  • I think the only answer is improvements in both recycling and manufacturing techniques, because this has to be costly when you can't deliver on an order and your competitor does. But how wasteful is it to just toss 'em? They're going to end up in a landfill within 10 years anyway. If they're sold to consumers, there's a strong probability that a whole computer containing the defective chip will end up taking up space in a landfill, rather than just the chip.

    It seems to me that the cost and energy going int
  • Fuck, no. (Score:2, Interesting)

    by imadork (226897)
    They test chips for a reason, folks. All 10 million of those transistors need to be working properly in order for the chip to work. Otherwise, it would be like a car that had two of its wires crossed: sure it might be in a nonessential system, but then again, what if it isn't?

    And all manufacturing processes fail from time to time, microchip manufacturing is no exception. In a lot of 1000 chips, you might get 1 or 2 where the silicon wafer wasn't right to begin with, or one of the layers was a millionth of

  • Other news (Score:4, Funny)

    by Frankie70 (803801) on Saturday March 19, 2005 @10:59PM (#11988469)
    In other news, flights would be much cheaper if plane manufacturers stopped Quality Control.
  • ...Sinclair Spectrums appeared to have twice as many memory chips as they needed because Sinclair bought chips where half of the chip was faulty as they were a lot cheaper and then sorted them by bank.
  • I don't like the idea of more flawed chips entering the market, but this story caused an interesting idea to pop into my head.

    Let's say a CPU runs great, but fails on a coupla instructions. Why not just compile for it sans those instructions? For this to make any sense, there would have to be plenty of similarly flawed chips to work with, though.
  • Anyone here old enough to remember the 386SX? The idea of not throwing away chips with minor imperfections is clearly older than the writer of that article.
  • by Animats (122034) on Sunday March 20, 2005 @12:27AM (#11988843) Homepage
    Yield data is hard to come by, but here's a business school case study with some hard numbers. [ucla.edu] Pentium 4 yields were around 60%, and DRAM yields were around 90%, in 2002. Pentium 4 yields are probably well above that point now, since that technology has matured. Note the comment in that paper that in DRAM fabs, at initial startup, yields may be as low as 5-10%, but rise to 90-98% once the fab is running properly. So that's where you put your effort, not into finding ways to use the rejects.

    There have been moments in DRAM history when devices were made that were configured in some way during final test to work around bad spots. IBM did it for a while in the 1980s, I think. But with 90+% yields, it's not worth the added switching you need on chip to allow that. You could, in theory, use heavy ECC to tolerate a substantial defect rate. That's how CD-ROMs work, after all. But it's not necessary yet.
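For a feel of how ECC tolerates a bad bit, here is a minimal Hamming(7,4) sketch, which corrects any single flipped bit in a 7-bit codeword. It is a deliberately simple stand-in: CDs use a much heavier Reed-Solomon scheme, and nothing here reflects what any DRAM vendor actually shipped.

```python
def hamming74_encode(data_bits: list) -> list:
    """Encode 4 data bits as 7 bits laid out p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword: list) -> list:
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    bad_position = s1 + 2 * s2 + 4 * s3   # 0 means no error detected
    if bad_position:
        c[bad_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(word)

# Simulate one defective cell flipping the bit at position 5.
damaged = list(stored)
damaged[4] ^= 1
recovered = hamming74_decode(damaged)
```

`recovered` matches `word` despite the bad cell. The price is 7 stored bits for every 4 useful ones, which is the kind of overhead that stops being worth it once yields reach 90% or more.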

    For a while, there was a market for DRAM with bad spots for use in telephone answering machines.

    This is an idea that resurfaces periodically in semiconductor history, but historically, the yields have always come up to acceptable levels.

  • Sinclair did this (Score:5, Interesting)

    by Spacejock (727523) on Sunday March 20, 2005 @12:36AM (#11988879) Homepage
    Sir Clive Sinclair used defective RAM in the ZX Spectrum way back in 1982. They were chips with only one bank working, but the computers were wired to only use that one bank.

    Old Computers Museum [old-computers.com]

    quote: "To keep the prices down Sinclair used faulty 64K chips (internally 2 X 32K). All the chips in the 32K bank of RAM had to have the same half of the 64K chips working. A link was fitted on the pcb in order to choose the first half or the second half."
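In software terms, the link on the board amounts to a fixed offset into whichever half of the chip still works. The sketch below is only a model of that idea: the function name, the flat addressing, and the sizes in cells are assumptions, not the Spectrum's real circuitry.

```python
HALF_SIZE = 32 * 1024   # usable cells per chip, per the quote above


def physical_address(logical: int, use_upper_half: bool) -> int:
    """Map a 32K logical address into the working half of a 64K chip.

    use_upper_half plays the role of the link fitted on the PCB.
    """
    if not 0 <= logical < HALF_SIZE:
        raise ValueError("logical address outside the usable 32K")
    return logical + (HALF_SIZE if use_upper_half else 0)


# Board fitted with chips whose lower half is dead.
first_cell = physical_address(0, use_upper_half=True)
last_cell = physical_address(HALF_SIZE - 1, use_upper_half=True)
```

Because one link serves the whole bank, every chip on the board needs the same good half, which is why the chips had to be sorted before assembly.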

    Remember, many of the best ideas have already been used.

  • by Ancient_Hacker (751168) on Sunday March 20, 2005 @05:11AM (#11989722)
    What is this guy a professor of? As others have noted, this isn't very likely to work in practice. It's not even good enough for an answering machine if it compresses the audio. Any good compression method is likely to be tripped up by even one bad bit. After all, the goal of compression is to make every bit count! In the case of CPUs, it doesn't seem likely that a random stuck bit is going to be innocuous. The quoted example of an LSB stuck on an adder is very contrived. The arithmetic adder is probably less than 1% of a CPU's real estate. And again, even an LSB error is going to be unacceptable if any compressed or encrypted data goes through the adder, which is extremely likely these days. And let's not forget programs like compilers and linkers, which use the adder to calculate things like addresses. Off by a bit isn't going to cut it for a very large range of applications. And this guy got $1M to research this hare-brained idea? Sheesh.
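The claim about compression is easy to poke at with Python's standard `zlib` module. The sketch below flips each bit of a compressed buffer in turn and tallies what the decompressor does. The sample text is made up, and zlib stands in for whatever codec an answering machine would really use.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def classify_flips(original: bytes):
    """Flip every bit of the compressed stream once and tally the results."""
    packed = zlib.compress(original)
    total = len(packed) * 8
    counts = {"error": 0, "wrong_output": 0, "unchanged": 0}
    for i in range(total):
        try:
            result = zlib.decompress(flip_bit(packed, i))
        except zlib.error:
            counts["error"] += 1          # decompressor rejected the stream
        else:
            key = "unchanged" if result == original else "wrong_output"
            counts[key] += 1
    return counts, total


SAMPLE = b"Faulty chips might be good enough for raw audio, " \
         b"but compressed data makes every bit count. " * 4
counts, total = classify_flips(SAMPLE)
print(counts, "out of", total, "single-bit flips")
```

The exact tallies depend on the input, so none are quoted here. At minimum, every flip in the two header bytes is rejected outright, because the zlib header carries its own check value.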