Computer Hung/Unresponsive
(Pre-Windows Server 2008)
Description: A hang is typically defined as a condition where a machine is non-responsive over the network and/or at the console. It usually manifests as an inability to log on at the console or to a session, or as a session that stops responding to input or network traffic. This is not to be confused with a crash or bugcheck, which indicates a software or kernel fault. This document is specific to instances where a machine hangs or becomes unresponsive during normal use. It does not cover the following symptoms, which are addressed elsewhere:
Server hang during boot
Server hang after CTRL-ALT-DEL
Server hang at Applying Computer Settings
Server hang at Shutdown
This document applies to:
Windows 2000 Service Pack 4 with Update Rollup Package 1 (Mainstream support ended 6/30/2005)
Windows Server 2003 RTM (Mainstream support ended 3/30/2007)
Windows Server 2003 Service Pack 1 (Mainstream support ended 4/14/2009)
Windows Server 2003 Service Pack 2 (Mainstream support ends 7/13/2010)
Note: http://support.microsoft.com/gp/lifeselect
Scoping the Issue: Define the type of hang:
1. Is the console hung or is it an issue with network connectivity?
2. Does Ctrl-Alt-Delete bring up the Windows Security dialog?
3. Can you toggle Caps Lock or Num Lock? If you can't, it could be a hardware or driver problem.
4. Can you move the mouse?
5. Is there a KVM in use?
6. When did the issue start occurring? DDMMYYYY, HH:MM:SS
7. What changed?
8. How long has the server been in production?
9. How often does the issue occur?
10. Under what conditions does the issue occur?
11. What else is going on when the issue occurs?
12. Does it happen at a particular time of day (users logging in, scheduled tasks, backups, etc.)?
13. Is there anything you can do to make the problem occur (repro steps)?
14. Can you ping the server by IP address, NetBIOS name, or fully qualified domain name?
15. Can you open network shares? Can users connect to file shares on the hung machine? Are there any errors?
16. Are you able to log on at the physical console? If so, are there any errors?
17. Are you able to log on via Remote Desktop (RDP client)? Are there any errors?
If this is a terminal server, are you observing this behavior from a session or at the console?
18. Are you able to open Computer Management remotely? Are there any errors?
19. What do you do to recover from the hang?
20. How long have you waited before rebooting the server?
21. What have you tried to do to fix the problem?
22. If it’s not completely hung and we can get to Task Manager, check resources:
CPU time - is there a specific process pegging the CPU?
If so, and it is a third-party process, what happens if we end it?
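The network-facing questions in the checklist (name resolution, file shares, Remote Desktop) can be partly answered from another machine. The following is a minimal sketch, not a diagnostic tool: the function names are hypothetical, the ports are the standard TCP ports for SMB (445), NetBIOS session service (139), and RDP (3389), and the summary wording is an illustration rather than an official classification.

```python
import socket

# Standard TCP ports for the services named in the scoping checklist.
PORTS = {
    "file shares (SMB)": 445,
    "NetBIOS session": 139,
    "Remote Desktop": 3389,
}


def resolves(host):
    """Return True if the name (NetBIOS, FQDN, or IP string) resolves."""
    try:
        socket.gethostbyname(host)
        return True
    except OSError:
        return False


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe(host):
    """Run every probe against one host and return {probe name: answered}."""
    results = {"name resolution": resolves(host)}
    for label, port in PORTS.items():
        results[label] = port_open(host, port)
    return results


def summarize(results):
    """Turn probe results into a one-line note for the case record."""
    failed = sorted(name for name, ok in results.items() if not ok)
    if not failed:
        return "all probes answered"
    if len(failed) == len(results):
        return "no probe answered"
    return "partial: no answer from " + ", ".join(failed)
```

Calling `summarize(probe("SERVER01"))` (the host name is a placeholder) gives a short note to record next to the time of the hang. An open TCP port only shows that the network stack accepted a connection; it does not prove that the service behind it is still responding.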
Data Gathering: One of the most useful tools in diagnosing system hangs is Performance Monitor (Perfmon) logging. Perfmon allows the user to gather performance counters for various objects relating to system health, such as: Memory, Network Interface, Physical Disk, Processor, Process, etc.
In all instances, collect:
1. MPS Reports PFE version
Microsoft Premier Services Reporting Utility (PFE version)
2. Perfmon logs should include the timeframe when the problem is happening on the system.
You can create the log parameters manually, or by using the Performance Monitor Wizard.
You should capture the logs remotely from another computer.
a. Set up a remote binary circular performance log that captures all core OS counters:
· Cache
· Logical disk
· Memory
· NBT Connections
· Network interface
· Objects
· Paging File
· Physical disk
· Process
· Processor
· Redirector
· Server
· Server Work Queues
· System
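As an illustration, the object list above can be turned into remote counter paths of the form `\\SERVER\Object\*` or `\\SERVER\Object(*)\*`. This is a sketch under assumptions: the object names are the usual English Perfmon names (for example `LogicalDisk` and `NBT Connection`), the instance flags reflect which objects normally expose multiple instances, and `counter_paths` is a hypothetical helper name.

```python
# Core OS performance objects from the list above, written with their
# assumed English Perfmon names. True marks objects that normally have
# multiple instances and therefore need an "(*)" instance wildcard.
CORE_OBJECTS = [
    ("Cache", False),
    ("LogicalDisk", True),
    ("Memory", False),
    ("NBT Connection", True),
    ("Network Interface", True),
    ("Objects", False),
    ("Paging File", True),
    ("PhysicalDisk", True),
    ("Process", True),
    ("Processor", True),
    ("Redirector", False),
    ("Server", False),
    ("Server Work Queues", True),
    ("System", False),
]


def counter_paths(server):
    """Build one wildcard counter path per core object for a remote server."""
    paths = []
    for name, has_instances in CORE_OBJECTS:
        instance = "(*)" if has_instances else ""
        paths.append("\\\\{0}\\{1}{2}\\*".format(server, name, instance))
    return paths


if __name__ == "__main__":
    # Print one path per line, e.g. to save as a counter list file.
    print("\n".join(counter_paths("SERVER01")))
```

A file with one path per line is the kind of input that command-line tools such as `logman` (with its `-cf` option on Windows Server 2003) can read; check `logman /?` on the collecting machine before relying on that, since Windows 2000 does not include it.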
The Perfmon capture interval is determined by the length of time it takes the server to go from a normal state, to a problem state.
Please gather two concurrent Perfmon logs:
b. Short interval: use a 5-second capture interval.
If the average time to issue is: | The capture interval should be:
Hourly | 5 seconds
And
c. Long interval
Please use the table below to set the capture interval.
If the average time to issue is: | The capture interval should be:
Daily | 160 seconds
3 days | 360 seconds
1 week | 800 seconds
2 weeks | 1600 seconds
3 weeks | 2400 seconds
Monthly | 7200 seconds
d. In Windows 2000, a common problem when collecting Perfmon logs remotely is that, by default, the Performance Logs and Alerts service runs under the local computer's "System" account, which has no rights on the remote machine. For steps on how to give a network account permissions on the Performance Logs and Alerts service, refer to Microsoft KB Article 240389: Log is not started when you try to start a log with remote counters in System Monitor.
e. In Windows Server 2003, you can simply use the "RunAs" option when setting up the counters.
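The long-interval table in step c can be expressed as a simple lookup. The sketch below makes two assumptions that the table does not state: an average time to issue that falls between two rows uses the next longer row, and "Monthly" means 30 days. The function names are hypothetical.

```python
# (longest average time to issue in hours, capture interval in seconds),
# copied from the long-interval table in step c.
LONG_INTERVALS = [
    (24, 160),        # Daily
    (3 * 24, 360),    # 3 days
    (7 * 24, 800),    # 1 week
    (14 * 24, 1600),  # 2 weeks
    (21 * 24, 2400),  # 3 weeks
    (30 * 24, 7200),  # Monthly (assumed to mean 30 days)
]

# The short-interval log always samples every 5 seconds (step b).
SHORT_INTERVAL_SECONDS = 5


def long_interval_seconds(average_hours_to_issue):
    """Pick the long-log capture interval for a given average time to issue."""
    if average_hours_to_issue <= 0:
        raise ValueError("average time to issue must be positive")
    for max_hours, seconds in LONG_INTERVALS:
        # Assumption: values between rows round up to the next longer row.
        if average_hours_to_issue <= max_hours:
            return seconds
    # Assumption: anything longer than a month uses the monthly interval.
    return LONG_INTERVALS[-1][1]


def samples_per_cycle(average_hours_to_issue, interval_seconds):
    """Number of samples collected between two occurrences of the issue."""
    return int(average_hours_to_issue * 3600 // interval_seconds)
```

For a server that hangs roughly once a day, `long_interval_seconds(24)` returns 160, and `samples_per_cycle(24, 160)` shows that the long log holds 540 samples per cycle. The sample count is a rough guide when choosing the maximum size of the circular log.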
3. Set up the server to generate a complete memory dump per KB 972110.
Proactively, do the following:
--------------------------------------
- Check with the OEM vendor for any known issues with their hardware or updates.
- Update the BIOS
- Update the drivers and firmware from the OEM server hardware vendor website.
- Update the remote management software (for example, iLO/DAC)
- Update the HBA driver and firmware
- Update the Storage driver and firmware
- Verify that software drivers are up to date. This includes antivirus, quota management software, remote management software, etc.
- Verify that Windows security and reliability updates are up to date.
Troubleshooting / Resolution:
1. In the System event log, look for Event ID 2019 and Event ID 2020, which indicate that the nonpaged pool and the paged pool, respectively, are exhausted.
2. In Perfmon, check whether any process shows a handle count (Process --> NameofProcess --> Handle Count) larger than 15,000.
Note: On domain controllers, it is normal for LSASS.exe to show a value up to 50,000.
Note: On Exchange servers, it is normal for Store.exe to show a value up to 65,000.
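The handle-count check in step 2 can be applied to a Perfmon log that has been exported to CSV. The sketch below is an illustration with hypothetical function names. It assumes the export has a header row of counter paths such as `\\SERVER\Process(lsass)\Handle Count`, and it applies the LSASS and Store limits unconditionally, even though the notes above tie them to domain controllers and Exchange servers.

```python
import csv
import io
import re

# Threshold from step 2, plus the higher limits from the two notes.
DEFAULT_LIMIT = 15_000
EXCEPTIONS = {"lsass": 50_000, "store": 65_000}

# Matches header cells such as \\SERVER\Process(lsass)\Handle Count.
HANDLE_COLUMN = re.compile(
    r"\\Process\((?P<name>[^)]+)\)\\Handle Count$", re.IGNORECASE
)


def peak_handles(csv_text):
    """Return {process instance: highest Handle Count seen in the log}."""
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    columns = {}
    for index, title in enumerate(header):
        match = HANDLE_COLUMN.search(title)
        # Skip the _Total instance, which sums every process.
        if match and match.group("name") != "_Total":
            columns[index] = match.group("name")
    peaks = {}
    for row in rows:
        for index, name in columns.items():
            try:
                value = float(row[index])
            except (ValueError, IndexError):
                continue  # blank or missing sample
            peaks[name] = max(peaks.get(name, 0.0), value)
    return peaks


def over_limit(peaks):
    """Return the processes whose peak exceeds the applicable limit."""
    flagged = {}
    for name, peak in peaks.items():
        limit = EXCEPTIONS.get(name.lower(), DEFAULT_LIMIT)
        if peak > limit:
            flagged[name] = peak
    return flagged
```

Reading the exported file and passing its text to `peak_handles`, then to `over_limit`, gives a short list of processes to examine more closely. A process that crosses the threshold once is less telling than one whose count climbs steadily across the log, so the raw trend is still worth reviewing in Perfmon.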
Additional Resources:
972110 How to generate a kernel dump file or a complete memory dump file in Windows Server 2003
http://support.microsoft.com/?id=972110
177415 How to use Memory Pool Monitor (Poolmon.exe) to troubleshoot kernel mode memory leaks
http://support.microsoft.com/kb/177415
PoolMon Examples
http://msdn.microsoft.com/en-us/library/ms792885.aspx
Poolmon Overview
http://technet.microsoft.com/en-us/library/cc737099(WS.10).aspx
164933 How to allow Poolmon.exe to run by setting GlobalFlag value
http://support.microsoft.com/kb/164933
Using PoolMon to Find a Kernel-Mode Memory Leak
http://msdn.microsoft.com/en-us/library/cc267829.aspx
246758 How to Monitor Performance of a Remote Computer Without Logging on to It
http://support.microsoft.com/?id=246758
969639 Error message when you try to access the Performance Monitor (Perfmon.exe) on a remote computer: "Access Is Denied"
http://support.microsoft.com/?id=969639
888989 A Performance Monitor counter for the Physical Disk performance object may not be displayed in Windows 2000
http://support.microsoft.com/?id=888989
248993 PRB: Performance Object Is Not Displayed in Performance Monitor
http://support.microsoft.com/?id=248993