Status
Not open for further replies.
1 - 13 of 13 Posts

·
Registered
Joined
·
30 Posts
Discussion Starter #1
System freezes. Right from its initial delivery from IBM, my Intellistation M Pro type 6849 would freeze once roughly every 3 months. It was a complete freeze requiring cycling power down and back up to recover. There were no warnings - no unusual behaviors, no "blue screen of death", no unusual entries in the event log.

This continued for about 3 years then rapidly became much worse. Eventually it began to freeze during the boot process, even during the first phase of the boot process - before Win2000 was in control. I removed all nonessential adapters - still failed. I ran diagnostic programs - DOS mode, no Windows running - when I could get that far, and it would eventually freeze in this mode proving that it was not a Windows problem.

IBM replaced the system board and CPU about 4 months ago. This corrected the problem, sort of. It's back to its original behavior, hanging about every six to eight weeks.

Has anyone else had a similar experience? Any suggestions other than dumping IBM? Thanks in advance.
 

·
Registered
Joined
·
906 Posts
This has become sort of a standard reply to many posts on this forum, because computers have gotten so darn finicky about power, so here goes: what about the power supply?

Is this a workstation, personal computer or a server? I get the impression by your post that this machine is running 24/7 and after 3 months of running in this mode it freezes.

?
 

·
Registered
Joined
·
30 Posts
Discussion Starter #3
Barry_R said:
This has become sort of a standard reply to many posts on this forum, because computers have gotten so darn finicky about power, so here goes: what about the power supply?

Is this a workstation, personal computer or a server? I get the impression by your post that this machine is running 24/7 and after 3 months of running in this mode it freezes.

?
Barry:
I guess I'd consider it a workstation. I do run it 24/7 but reboot each weekend.

I doubt that it's a power supply problem because the extreme period (when it would run for only an hour or less before freezing and often would freeze during the boot sequence) was "cured" by replacement of the system board. (Replacement of the CPU alone, which was tried first, did not change the symptoms.) I put "cured" in quotes because the machine was merely restored to its former condition before the problem escalated to its extreme state. And recall that this former condition was one where the machine would run for ten to fourteen weeks before a freeze.

Any further ideas??? And thanks.
 

·
Registered
Joined
·
906 Posts
Ok I am not going to give up on power just yet.

The failure of the main board after 3 years may be unrelated. However, it is possible that the same thing that caused the occasional freezing also led to its somewhat early demise.

The occasional freezing may be a random event that could be a characteristic of the system or a power issue of some kind. If it is the nature of the system to error every so often then there isn't much you can do about it. But you may be able to do something to clean up the power. Is the machine plugged into a UPS? If not you might want to try that.

The other thought I had is related to the RAM, which is something you never mentioned. If we consider the early demise of the main board as unrelated, then maybe the behavior is repeating with the new board because the same memory is still being used. Come to think of it, so is the power supply. :grin:
 

·
Registered
Joined
·
30 Posts
Discussion Starter #7
Barry - Thanks for sticking with me on this.

Yes, I am plugged into a UPS. That makes it very unlikely that dirty power from outside is a contributor.

My past experience is that memory and power supply problems generally manifest themselves in randomly variable ways - this is what makes such problems hard to diagnose. My problem, on the other hand, exhibits precisely the same symptom at every occurrence. I have had the freeze occur while I was actively using the system - what I would first notice is that the cursor was failing to respond to mouse movement and, next, that the system was not responding to ANY stimulus from me nor displaying any activity whatsoever. It didn't (doesn't) even respond to pressing the power-off button on the case - until the button is held for 5 seconds or so, presumably triggering a primitive hardware function not involving any CPU or other logic.

Otherwise, the display remains frozen in the state existing at the time of the freeze.

Years ago I worked for Burroughs (now UNISYS) developing diagnostic software for their main-frame computer. The system had a control flip-flop ("toggle") named "run" and its state was ANDed with the output of the system clock-pulse generator. The output of the AND gate was then fed to the clock bus. If you reset the run toggle to "false", no clock pulses were fed to the clock bus and all synchronous operations instantly ceased.
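As a rough illustration of that gating, here is a toy model in Python. It is only a sketch of the idea described above, not the actual Burroughs logic; the names `gated_clock`, `clock`, and `run`, and the tick at which the toggle is reset, are invented for the example.

```python
def gated_clock(clock_pulses, run_states):
    """AND each clock pulse with the state of the run toggle at that tick."""
    return [pulse and run for pulse, run in zip(clock_pulses, run_states)]


# Free-running clock-pulse generator: one pulse on every tick.
clock = [True] * 8

# Hypothetical run toggle: reset to False after tick 5 and never set again.
run = [True] * 5 + [False] * 3

# What actually reaches the clock bus.
bus = gated_clock(clock, run)

# A synchronous counter only advances on pulses that reach the bus,
# so it stops counting the moment the run toggle is reset.
counter = sum(bus)

print(bus)
print(counter)
```

The generator keeps producing pulses the whole time; only the bus goes quiet, which is why everything synchronous stops at once.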

My system is behaving as if the P4 chip set has such a run toggle (I don't know if it actually does or not) and something is occasionally resetting it thus freezing all synchronous operations. I have an LED display capable of either digital or analog operation being driven by an analog video adapter. Presumably the video memory is in the adapter and communication between the adapter and the display is asynchronous. Thus, ceasing synchronous operations would not affect the display.

The serious phase of my problem, beginning last December, exhibited exactly the same symptoms, only more frequent. Freezes began to occur every couple of days, then once a day, then after 15 minutes to a couple of hours. I found that if I left power off for a day, it would then USUALLY run for two or three hours before freezing. This made me suspect a thermal problem, but I have - I believe - pretty much ruled that out. The fans are all operating and I have checked interior temp by inserting a lab thermometer through the front grille into what is a hot spot - the narrow space between two hard disks in adjacent bays. Not only is this temp fine, but a thermal problem would not disappear instantly upon powering down - in fact, temperature would probably RISE for a short while after turning power off. When the freeze occurs, I power off, wait only long enough for the disks to spin down and stop, then power back up and the system then runs flawlessly for weeks.

Whatever the problem is, it's probably due to something which is operating marginally and the marginal nature gradually deteriorates over time. On the one hand, it would seem to be a design problem because my original system and the new board and CPU both exhibited this problem right from the beginning. But on the other hand, I have so far heard of no other users experiencing the same problem.

Thanks again, Barry, and I remain open to any other ideas or suggestions you might have.
 

·
Registered
Joined
·
906 Posts
A power management event, in the BIOS or the OS, is the only thing I can think of at the moment. You said it has done this while you were actively using the system, but I would give it a shot anyway. Turn the power management features off for a while.
 

·
Registered
Joined
·
18,118 Posts
How about the source of the power? If you have "dirty" juice going in then of course you'll have problems, even with new parts. You may want to consider having a certified electrician come and do a line noise test.
 

·
Registered
Joined
·
30 Posts
Discussion Starter #10
To Barry: I don't believe any power management function is in effect. Also, I'm unaware of any power-level change when a freeze occurs - disks continue to spin and all power and other indicator LEDs remain lit and in their normal state. I will double check tomorrow however.

To E Backhus: Hi, & thanks for the response. My line power is being filtered through an APC brand UPS and I have no reason to doubt that it is doing its job. It has on a very few occasions switched momentarily to battery in response to a spike or brief voltage drop and it periodically switches as part of a self-test. The computer has never reacted in any way to these events other than to record in the event log the notice sent from the UPS. Also, I never had any similar problem during the years I had a different PC; the problem began with installation of the IBM Intellistation.

Any other suggestions? What I would really like to find is someone who could tell me if my hypothesis about a "run" toggle is correct, or what other single component could effectively stop the system clock. I'd also like to know where depressing the power-off button is sensed and handled - in firmware somewhere, perhaps, or in the BIOS? If so, this would seem to confirm my hypothesis that something has stopped the system clock.
 

·
Registered
Joined
·
906 Posts
You never mentioned anything about your RAM. I brought it up before, but I believe it is the one constant, other than the PSU, in your current system. If your BIOS has RAM timing controls, take a look at those settings; relaxing the timings might help.

I use Prime95 to test my system stability when I do overclocking. Some use Memtest86 to test their RAM. If the machine is unstable, it will eventually fail the Prime95 torture test.

Both of these are readily found with a Google search.
 

·
Registered
Joined
·
30 Posts
Discussion Starter #12
To Barry & EBackHus

You may be interested in what I've found. I've not yet tested this, but I'm almost certain the following is the problem.

The following link points to a fascinating article about thermal problems of fast CPUs.

http://www.hardwareanalysis.com/content/article/1278.1/

I had ruled out a thermal problem because the air temp inside my case was OK and also because I could restart immediately after shutting down. But this article points out that the actual temperature of the CPU die can change VERY fast, as fast as 50 degrees per second (both up AND down). So case interior temp is no guide to die temp.

There is on-die circuitry to measure temp almost instantly and, if a certain threshold is exceeded anywhere on the die, the CPU is instantly turned off. (I assume this is done by setting a signal called, guess what? STPCLK.)

When the CPU is stopped, temp drops immediately and is probably already well below the safe threshold by the time I shut power off; so obviously it restarts OK.
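To make that timing argument concrete, here is a toy model in Python. It is only a sketch under stated assumptions: the 50 degrees per second rate is the figure quoted from the article, but the starting temperature (60), the trip threshold (100), the 100 ms step, and the function name `simulate_thermal_trip` are all made up for illustration and are not taken from any Intel or IBM documentation.

```python
def simulate_thermal_trip(start_deg, trip_deg, rate_deg_per_s, step_ms, steps):
    """Toy model: the die heats at a fixed rate until it reaches the trip
    threshold, the CPU is then stopped (latched), and the die cools at the
    same rate back down to its starting temperature."""
    delta_deg = rate_deg_per_s * step_ms // 1000  # integer degrees per step
    temp_deg = start_deg
    running = True
    history = []
    for step in range(1, steps + 1):
        if running:
            temp_deg += delta_deg
            if temp_deg >= trip_deg:
                running = False  # stays stopped until power is cycled
        else:
            temp_deg = max(start_deg, temp_deg - delta_deg)
        history.append((step * step_ms, temp_deg, running))
    return history


# All temperatures here are assumed values; only the rate comes from the article.
history = simulate_thermal_trip(
    start_deg=60, trip_deg=100, rate_deg_per_s=50, step_ms=100, steps=20
)

# First simulated instant at which the CPU is stopped.
trip_time_ms = next(t for t, _, running in history if not running)

for time_ms, temp_deg, running in history:
    print(time_ms, temp_deg, "running" if running else "stopped")
```

With these assumed numbers the die trips and is back at its starting temperature in under two seconds, far less time than it takes to hold the power button and wait for the disks to spin down.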

Finally, while searching the Net for info on the subject of heatsinks, I ran across an IBM article on heatsink problems with P4 processors in IBM systems. It seems that it is VERY important to see that the thermal paste applied between the CPU and the heatsink forms a film which is thin and uniform, without voids. I remember that when the IBM tech replaced the CPU in a first attempt to cure my problem, the old paste had hardened and cracked off in bits. I doubt that he cleaned the surfaces thoroughly before applying the new paste - Web sites about thermal pastes emphasize the importance of THOROUGHLY cleaning both surfaces before applying the new paste. Also, special pastes are available which contain a colloidal form of metallic silver to provide VERY good thermal conductance.
 

·
Registered
Joined
·
906 Posts
Ah yes, there is the heat issue. What's in the case is not always indicative of what is in the CPU, although there is a relationship. Maybe you got a hot chip and your heatsink just can't cut it.

I have a Thermalright XP-90 heatsink with a 92mm Zalman fan. I use Arctic Silver 5 as a thermal paste. I used Goof Off to clean the CPU and heatsink, then applied the AS5 according to the manufacturer's instructions. My reported CPU temps are about case temp when idle and 100 to 102F under full load.
 