Power Cycling and Electromigration, and their Effects on Component Failure
Posted 11-23-2007 at 11:21 PM by TheMatt
Those of you who have had a component die have probably wondered what caused it to go bad. Things like hard disks and fans have moving parts, so mechanical failures make sense. Simple friction and heat cause things like head actuators and disk heads to fail after their lifespan is up. Motherboards have capacitors which fail. Conventional electrolytic capacitors, especially when in CPU voltage regulators, are under a lot of stress and can fail or even explode. But what about the true solid state items in your computer, the blobs of silicon, the resistors, and the traces on the PCB? How is it possible that those can fail?
The main cause of components failing within days of being installed is power cycling: failure caused by rapid expansion and contraction as temperatures change. Many people know that turning a system on and off frequently can shorten its life, but the real reason is often not understood. Consider a light bulb. Incandescent bulbs most often burn out when they are first turned on, because the thermal shock of the rapidly rising temperature breaks the filament. Likewise, when a system is powered on, it is not so much an electrical shock that degrades it. Instead, components go from ambient room temperature to as much as 50 degrees Celsius in a matter of minutes, and the temperature rises sharply even in the first couple of seconds. This creates stress because the components expand rapidly. The opposite happens when a system is turned off. This tends to cause component failure only during the first few power cycles, though, and those cycles are known as the burn-in period. It is a common problem with low-quality power supplies, which undergo little or no burn-in testing. Components like processors (CPUs) go through a lot of testing in the labs as Intel and AMD clock them higher and higher to see how fast they can run while remaining stable. As a result, processors rarely fail from power cycling. This is also why a system can never run too cool: the cooler the average operating temperature, the less a component is affected by power cycling and thermal expansion. As long as companies do thorough burn-in testing before selling their products, power cycling should not affect the end user nearly as much.
The cause of component failure long into a component's lifetime is discussed less often but exists nevertheless. Electromigration is the degradation of electrical conductors through the intense flow of electrons. Going back to the light bulb example, electromigration can be compared to the wear of the filament over time. It has become a concern with modern high-power processors, notably the Prescott Pentium 4. Today's conductors and semiconductors carry a lot of current, and this can actually degrade the very thin interconnects in ICs. When electrons flow through conductors at very high rates, they interact and collide with imperfections in the interconnects. This raises the resistance of the interconnect as it is used and can eventually lead to open circuits or, in some cases, short circuits. As the electron flow increases, more electrons are scattered, and as a result there is more friction, resistance, and heat. This only worsens over time. Unfortunately, there is no real way to prevent or reduce the effects of electromigration on a component once it is manufactured, except to limit its use. A good way to do this with processors (CPUs) is to use Intel's SpeedStep or AMD's PowerNow! technologies, which slow down the processor according to its load. Reducing power consumption in a processor is the key to reducing the effects of electromigration, because at a given voltage lower power consumption means lower current flow (P = E * I). One way overclocking can damage processors is through electromigration, because overclocked processors run at higher frequencies and sometimes higher voltages, resulting in higher power dissipation (P = C * F * E², where C is capacitance, F is clock frequency, and E is voltage). Doubling the clock speed of a processor roughly doubles its power consumption, and doubling the voltage roughly quadruples it.
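To put rough numbers on the scaling relationship above, here is a minimal Python sketch. It assumes the switched capacitance C stays constant and looks only at dynamic power (leakage is ignored), so the results are ratios relative to stock settings rather than real wattages. The function names and the sample overclock figures are made up for illustration.

```python
def relative_dynamic_power(freq_scale: float, voltage_scale: float) -> float:
    """Ratio of new power to stock power for P = C * F * E^2.

    Assumes C is unchanged, so it cancels out of the ratio.
    """
    return freq_scale * voltage_scale ** 2


def relative_current(freq_scale: float, voltage_scale: float) -> float:
    """Ratio of new current to stock current, using P = E * I (so I = P / E)."""
    return relative_dynamic_power(freq_scale, voltage_scale) / voltage_scale


if __name__ == "__main__":
    # Hypothetical scenarios: (label, frequency multiplier, voltage multiplier)
    scenarios = [
        ("double the clock, stock voltage", 2.0, 1.0),
        ("stock clock, double the voltage", 1.0, 2.0),
        ("20% overclock with 10% more voltage", 1.2, 1.1),
        ("throttled to half clock at 80% voltage", 0.5, 0.8),
    ]
    for label, f, v in scenarios:
        power = relative_dynamic_power(f, v)
        current = relative_current(f, v)
        print(f"{label}: power x{power:.2f}, current x{current:.2f}")
```

The last scenario is the kind of reduction that load-based throttling aims for: lowering both clock and voltage at idle cuts the current through the interconnects by more than the clock reduction alone would.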
Fortunately, processors at stock speeds rarely die, and technology is moving fast, so even modern processors run at full load will last long enough that they will be replaced before they die. Electromigration is a problem of the future, however, especially as processors draw more and more power.
At this time my recommendation to almost all computer users is to turn the computer on in the morning, leave it on through the day, and turn it off at night. Enable power-saving features on all components, including the monitor, to reduce electromigration effects, and do not turn off the computer for things like lunch breaks, to avoid power cycling problems.
Total Comments 8
Comments
-
There are several articles on each topic at the IEEE website although you need a membership to view the full article. For example...
http://ieeexplore.ieee.org/xpl/freea...isnumber=17030
Posted 11-23-2007 at 11:22 PM by TheMatt
Updated 11-24-2007 at 12:06 AM by TheMatt -
Posted 11-23-2007 at 11:38 PM by TheMatt
Updated 11-24-2007 at 12:08 AM by TheMatt -
Ever thought of becoming a processor engineer? Taking an interest in reading this sort of thing is what led me to become an EE very early in my career.
Posted 11-28-2007 at 05:05 PM by Kalim
-
If I overcome my laziness and do go to grad school for my master's.
I have always been interested in more than just building computers; however, that is actually one of my favorite things to do. I will probably go for my A+ certification before I start applying to colleges so I can get a good job while in college, and once I get my bachelor's in Computer Science Engineering or Computer Hardware Engineering (with an Electronics Engineering minor) I will decide where to go.
Posted 11-29-2007 at 08:54 PM by TheMatt
Updated 11-29-2007 at 09:14 PM by TheMatt -
Matt, my friend's dad says the A+ is overrated. I took comp maint last year (junior in HS) to get the A+ but didn't bother with the test itself. My friend's dad works for some CIA department in Dallas (he's not allowed to tell) as the computer systems manager, and didn't go to college or get the A+ cert.
Posted 12-02-2007 at 06:14 PM by magnethead
-
Another thing it will do, though, is separate me from everyone else when I send in my college application. Even the first amateur radio license is extremely easy to get (I saw an eight-year-old take the test and pass), but colleges like people who take tests outside of school because it shows dedication.
The main thing with the A+ is the price.
Posted 12-02-2007 at 07:48 PM by TheMatt
-
Posted 07-19-2008 at 04:21 AM by stressfreesoul
-
Posted 07-28-2008 at 02:50 PM by ebackhus