Do Overclocked CPUs Need a "Burn In" Period?

SinnopS asks: "I have been dealing with computers for a few years now, yet there is one thing that I can never find the correct answer to, and that is whether or not an overclocked CPU needs "burn in" time. In my experience, the Duron that I have was very unstable for the first few days, then worked perfectly. However, some people are saying that burning in isn't real and is a thing of the past. Can someone shed some light on this subject for me?"
  • From my experience with OCing, I'd say you definitely want a burn in. "Burned in" systems are always more stable in my experience.

    I've been away from the OC world for the last year or so, but I don't see that much would have changed in that time. I don't see anything in chip design making burn ins obsolete or useless. And even if you didn't really need a burn in, what harm could it do?

    The possible benefits certainly outweigh any possible downsides.

  • by j_d ( 26865 )
    When I was in that line of work, all the PCs I built for my clients got 48 hours of burn-in testing, running various system-test software. More than once, that testing turned up problems that otherwise wouldn't have shown up as anything recognizable (more like "darn PC keeps rebooting" type problems).
  • Even back in the old PII/Celeron days: when I overclocked my PII 266 to 350 I had to use 2.9V instead of 2.8 and had some weird things, freezes, etc.; after a few days it worked fine. Now I'm running it at 300 (100x3) at 2.7V and it works :)
    --
  • I personally don't think a burn-in can make a CPU/chipset more stable. It will certainly help spot elusive glitches that might otherwise not show up. For example, I've had software report minor math inaccuracies on a sickly overclocked CPU that seemed fine otherwise. It didn't crash, but it was giving the FPU a hard time, and that kind of problem is the most daunting, since it will cause incorrect calculations (for example in spreadsheets or other floating-point data) and can go unnoticed for a long time until someone double-checks.

    Other times, you might be right on the fine dotted line of heat tolerance. If your cooling is just barely adequate, the CPU might develop problems over time even though it seems to be running just fine. Eventually you notice instability and you find yourself dropping the speed a notch or two. Here again a burn-in test will accelerate this process and you'll see the possible causes of failure in time to resolve them (like using a better heatsink+fan).

    Bottom line: burn-in tests should be performed routinely on overclocked CPUs to ensure they are running error-free and to spot possible chip deterioration before it gets too severe. (A simple consistency check like the sketch below can catch the kind of silent FPU error described above.)
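
    A minimal sketch of that kind of self-check (my own illustration, not from the poster; the workload and iteration counts are arbitrary): repeat the same floating-point computation many times and verify that every pass produces a bit-identical result. A marginal overclock can fail this without ever crashing.

```python
# Hedged sketch of an FPU consistency check. A healthy CPU should return
# bit-identical results on every pass; silent mismatches suggest instability.
import math

def fpu_pass(n=200_000):
    total = 0.0
    for i in range(1, n):
        total += math.sin(i) * math.sqrt(i) / (i + 0.5)
    return total

reference = fpu_pass()
for run in range(100):
    if fpu_pass() != reference:
        print(f"FPU mismatch on run {run}: possible instability")
        break
else:
    print("All runs matched the reference value")
```
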
  • At first glance (from the title of the post) I thought you were going to ask if software should be burned in, and then I would have to slap you... but you didn't.

    Some good burn-in software: continuously compile your Linux kernel; that will keep your CPU pretty busy and will produce errors if something doesn't work (see the sketch below for one way to loop it).
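
    One hedged way to loop that workload (the kernel tree path, job count, and iteration count below are assumptions; point it at your own source tree):

```python
# Hedged sketch: repeatedly "make clean && make" a kernel tree as a burn-in
# workload. A compile failure under sustained load hints the overclock is
# marginal. Paths and counts are illustrative assumptions.
import subprocess
import sys

KERNEL_TREE = "/usr/src/linux"   # hypothetical location of your kernel source
ITERATIONS = 20

for i in range(ITERATIONS):
    for cmd in (["make", "clean"], ["make", "-j2"]):
        result = subprocess.run(cmd, cwd=KERNEL_TREE)
        if result.returncode != 0:
            sys.exit(f"Compile failed on iteration {i}: {' '.join(cmd)}")
print("All compile iterations completed without error")
```
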
  • O.k.
    So, where do I find some good burn-in software?

    Thanks,
    Jason Pollock
  • Heh. Thanks, I'll give it a try.
  • This type of processor failure (where inaccurate results are generated) is much more common than you might want to believe with overclocked processors, overclocked chipsets and overclocked memory.

    Intel wrote off a quarter of a billion dollars replacing 60-100MHz Pentium processors a few years back because the FDIV function of the FPU would, in certain situations, give erroneous results.

    Of course, not all tasks take advantage of or require that level of precision. Who cares if your railgun is not 100% accurate at 300m? A bunch of Quake heads, that's who! Are these Quake heads likely candidates for overclocking their processor/chipset/memory? You betcha!

    Oh well, my 2... -Joe

  • by ansible ( 9585 ) on Monday October 23, 2000 @10:53AM (#683109) Journal

    The purpose of burn-in testing is to find parts that would otherwise fail in the customer's hands. It has no effect on stability other than eliminating faulty parts.

    Most electronic (or at least semiconductor) components follow the "bathtub curve" for failures. This means that components are likely to fail very early after manufacture or at the end of their operational life (*), with a long stable period in between (a toy sketch of that curve is below). You want to catch the ones that will fail early before your customer complains about it.

    From what I understand of semiconductor electronics, running the system for a while will not help it stabilize or anything else.

    (*) For older semiconductor parts, the lifetime was 15-20 years. But for stuff produced now, it's about 5. Constant operation causes the gates to "wear out"; the fancy phrase is "electromigration". The metal in the junctions gets eroded away by the current, weakening the gate. High current and especially high temperatures accelerate this process.
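
    A toy sketch of the bathtub curve the parent describes (my own numbers, purely illustrative): the failure rate is high right after manufacture, flat through the useful life, and climbs again at wear-out, which is why burn-in targets the early part.

```python
# Toy bathtub failure-rate model: decaying infant-mortality term + small
# constant term + growing wear-out term. All constants are arbitrary.
import math

def bathtub_hazard(t_years):
    infant  = 0.20 * math.exp(-t_years / 0.25)   # early failures die off quickly
    random_ = 0.01                               # flat floor during useful life
    wearout = 0.002 * math.exp(t_years / 3.0)    # climbs toward end of life
    return infant + random_ + wearout

for t in (0.01, 0.5, 2.0, 5.0, 10.0, 15.0):
    print(f"{t:>5} yr: failure rate ~ {bathtub_hazard(t):.3f} per year")
```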

  • So I dunno. It has been said: "Electrons are timid little things, and notional; you have to show them who is in charge." I don't know enough to offer any rational explanation of it; for all I can tell, the grooves the bits travel in had to get re-worn... Ya know, like a well-banked curve on a back road: when they change the pavement, the ruts soon drift to a new spot depending on the new max speed for the curve.

    I don't know either, but the reason I've heard for burn-in is that "the dopants need to stabilize". Whatever that means.

  • I dunno. Not all processor failures cause halt conditions. Some failures just produce inaccurate results.

    **NEWS FLASH**
    SETI (Search for ExtraTerrestrial Intelligence) reports that signs of intelligent life have been found in space! Full details to follow shortly.

    **NEWS FLASH** SETI (Search for ExtraTerrestrial Intelligence) reports that their earlier report of finding signs of intelligent life in space is inaccurate. Apparently, the inaccuracy is due to an overclocked processor.
    SETI officials made the following statement: "Ironically, participation in the SETI@HOME program does not seem to require terrestrial intelligence."

    -Joe

  • The original idea of "burning in" a system was to catch infant mortality: if a system is going to fail, it will probably fail during the first few days, especially if it is heavily loaded. While this is useful for determining if an overclocked system is stable, it doesn't increase overclocking potential.

    However, "melt-in" does work. Modern thermal compounds soften when they are heated, resulting in better thermal contact between processor and heatsink. With a good heatsink/fan combination, however, the thermal compound rarely gets warm enough for this to occur. Consequently, running at high voltage, with cpu-intensive code, which generates more heat than would normally be generated, may be necessary.

    An alternative which works equally well is to stop the fan for a few minutes. With less air, the heatsink will heat up sufficiently for the thermal compound to change state.
  • Sounds good, but that is only at "steady state" operation. Don't forget about the effects of thermal cycling, where the circuit heats up and cools down when you turn the computer on and off. This causes the metal bonding wires, and other parts to expand and contract. Some people theorize that this may eventually cause metal fatigue, especially if you're running hot.

    There are actually many more things going on inside, but those are the most significant effects occurring on the chip. Much of it is heat related, as heat is the most significant factor in determining the life and operation of the chip. If your environmental temperature goes up (hot day), the effects of thermal cycling, or even steady-state thermal effects, will be much more pronounced. Sometimes your chip's temperature will even exceed the design's maximum temperature, which will bring on undefined (unknown, usually lockup) behavior.

    Having said all of that, I think it is a good idea to stress test the chip, if only to find out whether or not that one chip will work when overclocked. Remember, the manufacturer specified a speed rating for a reason: they expect the chip to perform as advertised at that speed. Any faster, and you're really on your own. Doesn't really hurt to test it out anyway.
  • I'd agree with everyone else and say that you're wagging the dog, but for the fact that several of the systems I've pushed past their rated limits have taken a few days to become stable.

    One in particular (2x PPro-180 256k running 233) illustrates the effect. It'd seem fine for the first half hour or so, but I kept getting really odd problems after that. Temps were stable and well within limits, the RAM etc. tested out elsewhere, so I left it running multiple memtest processes on 512MB for about a month, with no configuration changes, just rebooting it when it died. The first few days it crashed, had processes vanish, etc. every few hours, but after a week or so it'd go longer periods without problems.

    It ran the last 12 days or so error-free, and has been nearly rock solid (sandstone?) for a year or more now as an everyday workstation. Policy reboots keep me from bragging about uptime scores, but it's never off. Any problems it's had since could as easily be blamed on multiple causes, and weren't severe enough to nail down.

    So I dunno. It has been said: "Electrons are timid little things, and notional; you have to show them who is in charge." I don't know enough to offer any rational explanation of it; for all I can tell, the grooves the bits travel in had to get re-worn... Ya know, like a well-banked curve on a back road: when they change the pavement, the ruts soon drift to a new spot depending on the new max speed for the curve.

  • From my observations, burning in simply refers to the fact that chips seem to be able to run at higher stable speeds over time. I have seen this with quite a few of the 60 or so C366@550's I sold and a few personal chips including a P3 and a classic Athlon (750 taken to 1080MHz water-cooled)..

    I've not seen or done any proper experimentation to see what kind of things make it work better/faster, but as it seems to depend a lot on the particular chip anyway, you'd need a pretty large sample set to get any kind of usable information, and the rules might completely change for different types of cores..

    You can speed up the process by locking the CPU at 100% load for a while. I've heard that people think it's mild electromigration (not sure how this would INCREASE stability, though, but I'm no electronics engineer). Some people say higher voltage improves it but from my observations it's not REQUIRED to see the effect. If your chip is more than a month or so old I don't think you'll push the limits much higher.. the effects mostly happened in the first 48 hours or so (of continuous 100% load) for me.

    Then there's the whole thing about cooking your CPU deliberately. I've heard of 3 separate instances now where a Peltier cooler has failed and the chip has been raised to some incredibly high temperature.. instead of dying, it actually became BETTER at overclocking once it had cooled and the cooler was fixed. I do not recommend trying this, though..

    So, it may all be related to heat somehow.. the cooking, cpu loading and overvolting all are heat-increasing activities - which may suggest electromigration (or a zillion other things).

    Regardless, this kind of thing almost certainly has an effect on the longevity of your chip. If it's the classic 10 years down to 5 then yeah, who cares, but if it shortens it down to a year or so (not inconceivable for "cooked" chips) it may be more of a worry..

    Anyway, IMHO, burn-in seems to be a normal part of the chip's lifecycle. When you first get a chip it sometimes doesn't overclock very well at all; in fact I had a couple that I'm pretty sure I damaged by trying too high too soon (616@C366).. they became unstable even at 550 after that, when they'd been fine for an hour or so before. I called this "shocking" the chip. After a while they seem to have been run in a little.. [shrug] .. I offer no explanations, only observations.

    Ramble ramble..

    BTW, this has come up a few times in our own forums, you might want to check out this thread [overclockers.com.au] for a recent debate on the topic.. not that you won't get enough opinions on Slashdot. :)

    Agg
    Overclockers Australia - http://www.overclockers.com.au [overclockers.com.au]

    Some good burn-in software... continuously compile your Linux kernel; that will keep your CPU pretty busy, and it will produce errors if something doesn't work.

    That, and go to SETI@Home's website.

    Their Top 100 list [berkeley.edu] suggests that maybe Sun, Silicon Graphics, Compaq, Intel, IBM, Apple and HP all use SETI at Home for burn-in...

    If the CPU screws up, it's unlikely to create a data unit that gets passed back to SETI and screws up their project; it's more likely that the computer will just spit out a kernel panic or stop responding or something. If the system is otherwise stable, that should be all the error message you need.

    BTW, burn-in is used by builders to avoid shipping D.O.A. systems to customers. As far as I've ever known in all my years with electronics, an extended run won't make any physical changes to the ICs that would make them more reliable. If anything, the heat of being overclocked and run 24/7 is likely to upset the metallurgy of solder joints and the doping of semiconductors more than anything else...

    Bottom line is that you're *proving* that it's reliable, not *making* it reliable.

  • Here's what I can remember from my semiconductor classes back in college. (Note: I'm no longer in the Comp. Engr. field, so I'm a little rusty.)

    The speed that a processor can run at depends on how fast a signal can pass through a stage in the pipeline. The "speed" of that signal, going from high to low or low to high, is based on the equivalent capacitance of the circuit. At the .18 and .25 micron levels, the capacitance comes primarily from the wires, not the transistors. That wire capacitance is based on many factors, like how close it is to other wires, good substrate connections, etc. None of these factors should change as the processor runs longer, which means the capacitance shouldn't change, and your relative stability shouldn't either (a toy RC-delay calculation is sketched after this comment).

    Now that's from a textbook. From personal experience, here's what might be happening: the thermal compound (between CPU and heatsink) warms up and becomes more fluid. This fills all the tiny cracks and imperfections, which gives better heatsink contact and stability. How's that sound???
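
    A toy RC-delay calculation to go with the textbook part of the comment above (the resistance, capacitance, and stage count are made-up but era-plausible numbers, not measurements):

```python
# Toy illustration: per-stage delay scales with the RC time constant of the
# node being driven, and the clock period must cover the slowest path.
R_ohms = 2000.0      # assumed effective driver + wire resistance
C_farads = 100e-15   # assumed load capacitance (100 fF)

tau = R_ohms * C_farads          # RC time constant
t_delay = 0.69 * tau             # roughly the time to swing past the 50% point
print(f"per-stage delay ~ {t_delay * 1e12:.0f} ps")

# With, say, 10 such stages between latches, the minimum clock period is about
# 10 * t_delay; neither R nor C changes just because the chip has been running
# a while, which is the poster's point about stability not improving.
stages = 10
min_period = stages * t_delay
print(f"max clock ~ {1.0 / min_period / 1e6:.0f} MHz")
```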

  • Yes, with a caveat:

    Have we established that the "burn-in" does anything to the CPU, and not the thermal grease?

    Suppose you've got a few small bubbles in your layer of goop, or the heatsink isn't *quite* perfectly attached. What does burn-in do? Heats stuff, lets it slosh around, and subjects it to several hours of fan vibration.

    End result is a somewhat better-mated heatsink/chip interface and greater stability at marginal (read: overclocked) conditions.

    If my theory is correct, this effect will vary more as a function of die/cap size (e.g. old-sk00l Pentiums, K6-2 and K6-3 chips will show different results than new-sk00l "wow, look at that flip-chip!" designs).

    If not, hey, I'm just blowin' smoke. Problem with that "theory" is, of course, it's essentially untestable, since so many other things have changed besides the "big honkin' mating area" and "little dinky mating area" variables.

    FWIW, I do my thermal testing at high voltages to give myself an idea of the worst-case heat generation during normal fan operation. After a day or two of CPU "burn-in", I then overclock and back off the voltage until it's unstable. I then bump the voltage up by 0.05V or thereabouts and reconfirm stability.

    I don't view the burnin as a way to improve the performance of an overclocked CPU, but I do view it as a great way to test stability.

  • This is a great piece I found on the Ars Technica forums some time ago, by "sabine urfer" (whoever that was). I cannot vouch for it other than that the suggestions seem reasonable, and that I have seen many claims on overclocking boards from people who think that burning in a CPU helped its stability.

    begin repost:

    If you overclock your cpu, you are basically running it outside its specification. It may run, but that's not guaranteed. To help it run, you might try burn-in and some other measures.

    How does the cpu basically work?

    A cpu functions with just two different signal states: "high" and "low" (sometimes referred to as "1" and "0" or signal "on" and "off").

    To detect the difference between "high" and "low", the cpu uses some kind of reference voltage. If the voltage of the signal is higher than the reference voltage, the signal is detected as "high"; if it is lower, it is detected as "low". The time the cpu has to detect the "high" or "low" state is limited by the cpu clock. If the clock is set to a higher speed, the time for detection, the "decision cycle", becomes shorter.

    The signal itself is not at all digital. It's analog. Which means it doesn't jump from "low" to "high" and back in no time; it gradually rises from the lower level to the higher and back. This is caused by parasitic capacitances and resistances. They are called "parasitic" because you would rather they weren't there, but given the current semiconductor manufacturing process, you can't avoid them.

    In the design of the cpu, one tries to keep the parasitics as low as possible. Sometimes you run into a trade-off: if you make the resistance lower (for example, wider metal lines have less resistance), you might increase the capacitance (wider metal lines have more capacitance). Which means you will always end up with the parasitics in one way or another.

    The capacitances "suck" away the rising voltage until they are charged; the resistances make matters worse by "resisting" the current flow that tries to charge the capacitances. Temperature makes matters worse too, because heat further increases the resistances in the cpu.

    The transistors in the cpu also have some "internal" resistances. If you look at the transfer characteristics, you will see a non-linear behaviour of the current versus the voltage. If you (or the signal, for that matter) increase the voltage, the current will start at zero and rise exponentially from some point on (the so-called subthreshold swing); at the so-called "threshold" it will become (almost) linearly dependent on the voltage and then start to saturate at a certain current level.

    The driving force in the cpu is its supply voltage. Setting it to lower values would result in "slacker" subthreshold swings and lower saturation current; setting it to higher values would steepen the subthreshold swing and increase the saturation current, but since you are dissipating more power, you would also generate more heat in return.

    If the combined parasitic and built-in effects limit the signal to the point where, in the given "decision cycle", the cpu can't detect the change from "high" to "low" or vice versa, the cpu will fail. Sometimes it will fail only once in a while, or only at specific instructions; it can even fail unnoticed, because the internal error correction will step in. That's what makes figuring out whether the system is stable or not so difficult.

    How to overcome the problem of the failing cpu.

    As stated before, the cpu can fail for some of the following reasons:

    Clock cycles too short. Temperature too high. Voltage too low. Transistors switch too slow.

    Obviously, switching to a lower clock speed is not so desirable when trying to squeeze out the last MHz of performance out of your cpu.

    For fighting too-high temperatures there are several methods; I don't want to discuss them in depth here. Just some basic hints: use the fattest heatsink, throw sufficient air at it, and make sure that heatsink and cpu have good thermal contact.

    The operating voltage was easy to fiddle with in the old days; then motherboards without any voltage setting became popular with the P-II machines. Nowadays, there are again some boards with voltage tweaking capabilities. For increased overclockability, you can, very carefully, try bumping the core voltage up in 50...100 mV steps. You have to be careful not to exceed the point where it becomes counterproductive because of the additional heat generation. The power dissipation of a cpu goes up nonlinearly with the supply voltage, which means a 5% increase of voltage could lead to a 10% increase of power dissipation, and a 10% increase of voltage could result in 30% more power dissipation (a rough calculation is sketched below). If the voltage is too high, you can reach the point of breakdown in the transistors; this would shorten the life of your transistors significantly, maybe even to zero.
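
    A rough back-of-the-envelope version of those percentages, using the standard dynamic-power relation P ~ C * V^2 * f with leakage ignored (my own sketch, not the author's):

```python
# Dynamic power scales with the square of the supply voltage (times frequency).
def relative_power(voltage_increase_pct, freq_increase_pct=0.0):
    v = 1.0 + voltage_increase_pct / 100.0
    f = 1.0 + freq_increase_pct / 100.0
    return v * v * f

for dv in (5, 10):
    extra = (relative_power(dv) - 1.0) * 100.0
    print(f"+{dv}% core voltage -> roughly +{extra:.0f}% dynamic power")

# The ~30% figure is about what you get once a clock bump comes along too:
extra = (relative_power(10, 10) - 1.0) * 100.0
print(f"+10% voltage and +10% clock -> roughly +{extra:.0f}% power")
```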

    If you are wondering where the burn-in comes into effect, it is in the "transistors switch too slow" point.

    The transistors can be made to switch faster with modifications to the semiconductor manufacturing process. This would include scaled-down channel lengths or gate oxide thicknesses, optimizations in contacts and wiring, etc., all of which you have no influence over.

    One effect in the actual use of the transistors is hot-electron degradation of the gate oxide. Hot-electron degradation occurs when electrons are accelerated to energy levels which allow them to cross the barrier of the gate oxide. The electrons then either cross the gate oxide completely or get stuck within it. A stuck electron incorporates a negative charge into the gate oxide.

    This degradation starts as soon as the transistor is used and will eventually lead to its failure. Usually, cpus are designed to last almost forever.

    If you can live without that (who wants to use a lame 500 in 20 years anyway?), you can actually make use of the degradation for your overclocking.

    The fun part of this kind of degradation is that, with regard to speed, it makes 50% of the transistors in your cpu a bit worse, but the other 50% get much better.

    This is because there are two different flavours of transistors in the CMOS process, NMOS and the complementary PMOS. If your gate oxide has negative charge incorporated in it, the NMOS would get a slacker subthreshold swing; the PMOS swing, on the contrary, would become steeper. Thus, the PMOS usually being the speed-limiting factor, the cpu as a whole, which consists of NMOS and PMOS transistors, would be able to run faster.

    However, the physical effects are not yet understood completely. And that applies not only to me...

    Anyways, you can speed up this degradation process with the burn-in.

    During the burn-in you try to get as many hot electrons incorporated in the gate oxide as possible. The hot-electron effect is sensitive to voltage and temperature: the higher the voltage, the stronger the effect; the higher the temperature, the weaker the effect. Thus, you would run your cpu at minimum clock rate, maximum voltage and minimum temperature (remember, voltage and temperature are dependent on each other). The time needed to incorporate a sufficient number of electrons varies widely. It depends on the specific cpu and what you expect out of it. Due to manufacturing variances, some cpus may be more susceptible to burn-in than others from a different production run. It may even be different with chips from the same wafer.

    What to do during the burn-in

    Since not every instruction or data will use the whole cpu, you will need to stress your cpu with a wide variety of tasks during the burn-in. If you just let it sit there and idle, only the parts needed for the halt instruction would be stressed...

    You can use several programs to stress your cpu. Usually, high cpu usage is desired. Programs that can do that would be prime95, rc5des, setiathome; looped demos of 3dmark99, quake, unreal; endless recompiling of code, etc.

    Best, use all of them.

    How long to burn-in and what's next

    After a couple of hours or weeks, depending on what your cpu is capable of, you could try the machine at the desired overclocked speed, with lower voltage. If you're lucky, it'll run smoothly. You can test the stability with the same programs you used to burn it in.

    If it still hiccups, you may either need further burn-in, or you need to reevaluate other aspects of your machine (cooling, voltage, clockspeed etc).

    Simply put, if a couple of weeks of burn-in didn't help, a couple of months probably won't either.

    If problems persist, you can either go hardcore and try some funny stuff like submerging your computer in mineral oil or get a can of liquid nitrogen to pour over your cpu, or you may have to face the hard truth of overclocking:

    Nothing is guaranteed in overclocking.

    For a comprehensive list of overclocking successes and corresponding voltages, cooling and production dates, visit www.overclockers.com.

  • Mersenne prime calculations are used by Cray and others to test their supercomputers.

    An implementation of a Mersenne prime calculator is available at http://www.mersenne.org/prime.htm (a toy version of the underlying test is sketched below).

    By running mprime, I found out that my CPU fan was not working! The processor was getting hot and mprime was reporting errors; I then opened the case and found that my CPU fan was dead.
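
    For the curious, the test mprime is built around is the Lucas-Lehmer test. A naive toy version is sketched below (my own illustration; the real program uses heavily optimized FFT arithmetic and reports errors when results don't check out, which is what caught the dead fan above):

```python
# Toy Lucas-Lehmer primality test for Mersenne numbers 2**p - 1 (p an odd prime).
# Illustration only; mprime/Prime95 do this with heavily optimized big-number math.
def lucas_lehmer(p):
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

for p in (3, 5, 7, 11, 13, 17, 19, 23, 31):
    verdict = "prime" if lucas_lehmer(p) else "composite"
    print(f"2^{p} - 1 is {verdict}")
```
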
  • I am fully aware of the root cause of the FDIV debacle.

    I was using the most commonly known example of microprocessors yielding inaccurate results to illustrate how important this can be.

    And the sad thing is that inaccurate results due to operation outside of specifications go unnoticed by most.

    -Joe

  • I seem to remember reading somewhere (MaximumPC?) that Pentium-class processors have error-correction circuits that correct firmware coding in the CPU package itself, so it is possible that somehow the CPU can adjust to its operation. In general, concerning semiconductors, burn-in "pre-stresses" the pathways of the device. I have built electronic gear that failed (didn't perform as expected) without a burn-in.
