First published on TechNet on Aug 01, 2016
This blog post is brought to you by eighteen-year veteran Microsoft Premier Field Engineer David Morgan.
Goal of this Post
Over the years, my customers have asked what they should do first when they get a trouble ticket for a misbehaving Windows failover cluster. A few fairly simple steps, taken first, can provide a host of benefits during the troubleshooting process, such as:
- Faster problem resolution
- A successful and faster root cause analysis
- Faster service response times from vendor support personnel
- Data about the event and the surrounding environment helpful in post mortems that can help prevent the same, and other, problems in the future
- And more
This particular post isn't about doing actual troubleshooting. Here I'm only going to go into the primary steps one should take before undertaking in-depth troubleshooting activities. Actual troubleshooting scenarios and details will follow in future posts where you'll see why having captured these resources in the beginning can make your IT life a bit better.
Summary
- Immediately Capture all Cluster Logs
- Write a Very Detailed Description of the Problem
- Capture Microsoft Cluster Diagnostics Outputs
- Create a Cluster Validation Report
Detail
- The most important task – immediately gather the cluster logs from all nodes.
If this is not done within roughly 72 hours (the exact window varies), the data logged about your problem event will be overwritten when the log wraps. In almost all cases, if the cluster log does not cover the time of the event, a reliable root cause cannot be determined.
- At this time, consider setting the cluster log level higher to gain more insight into the issue if it recurs.
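The two log steps above can be scripted. The sketch below is not from the original post: it assumes the FailoverClusters PowerShell module is installed on the machine where it runs, and it assumes the `Get-ClusterLog` cmdlet (with `-Destination` and `-UseLocalTime`) and the `Set-ClusterLog` cmdlet (with `-Level`) are the right tools for the job. The helper names and the destination folder are hypothetical, and the 0 to 5 level range reflects my understanding of the cmdlet rather than anything stated here.

```python
import subprocess
from typing import List


def _ps_quote(value: str) -> str:
    """Quote a value as a PowerShell single-quoted string literal."""
    return "'" + value.replace("'", "''") + "'"


def build_get_cluster_log(destination: str, use_local_time: bool = True) -> List[str]:
    """Build a command line that gathers the cluster log from every node.

    Assumption: Get-ClusterLog copies one log per node into `destination`.
    """
    script = "Get-ClusterLog -Destination " + _ps_quote(destination)
    if use_local_time:
        script += " -UseLocalTime"
    return ["powershell.exe", "-NoProfile", "-Command", script]


def build_set_cluster_log_level(level: int) -> List[str]:
    """Build a command line that changes the cluster log verbosity.

    Assumption: valid levels run from 0 to 5, with higher meaning more detail.
    """
    if not 0 <= level <= 5:
        raise ValueError("cluster log level is assumed to be between 0 and 5")
    script = "Set-ClusterLog -Level " + str(level)
    return ["powershell.exe", "-NoProfile", "-Command", script]


def run_command(command: List[str]) -> None:
    """Run a built command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(command, check=True)
```

On a cluster node you might call `run_command(build_get_cluster_log(r"C:\Temp\ClusterLogs"))` first, so the existing logs are safe, and only then `run_command(build_set_cluster_log_level(5))`. Building the command separately from running it makes it easy to review exactly what will be executed before touching a production cluster.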
- As soon as possible collect the following diagnostic results from the cluster.
- You will need to log in using a Microsoft account such as live.com, outlook.com, hotmail.com, etc.
- Once you are logged in, enter "Failover Cluster" in the search field.
- Your search results should include a link to the Windows FailoverCluster Diagnostic.
- Click the Windows FailoverCluster Diagnostic link and choose Create.
- Next, either choose Download and save the file to a location of your choice, or choose Run.
- After the download completes, choose:
- Run now on this PC if the desktop you are on is one of the cluster nodes.
- Save to run later on another PC if the desktop you are on is not a cluster node.
- After executing the diagnostics package, you will be taken to a screen allowing you to select which nodes you wish to collect information from.
- It is best to have diagnostics for all the nodes in the cluster. However, there may be reasons for you to choose only a subset and run the diagnostics tool more than once with different nodes in the collection.
- The primary reason is that the tool will compress no more than 2GB of collected data, and with very large clusters it is easy to reach or surpass this threshold. When you run the tool against a large number of nodes, collection is finished once the screen titled "Review the diagnostic results before you send the item" appears. Before choosing Next and compressing the data, check the temporary location that holds the captured files and determine their total size. If it is greater than 2GB, copy all the files to another location first, because the files in the temporary location are deleted when the tool fails on the size limit.
The temporary file location is:
- %WINDIR%\TEMP\SDIAG_{GUID} (where GUID represents a diagnostic execution)
- Next, choose a location to save the diagnostics output.
- A folder named Upload Results will be created that contains a compressed file with a .cab extension. Save the file Results….cab and delete the remaining files in the folder.
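The size check recommended before compressing the data can be automated. This is a minimal sketch, not part of the diagnostic tool: the function names are hypothetical, and it assumes "2GB" means 2 GiB (2 × 1024³ bytes), which the tool may or may not use as its exact cutoff.

```python
import os
import shutil

# Assumption: the tool's 2GB limit is treated here as 2 GiB.
SIZE_LIMIT_BYTES = 2 * 1024 ** 3


def directory_size(path: str) -> int:
    """Return the total size in bytes of all files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def preserve_if_too_large(source: str, backup: str,
                          limit: int = SIZE_LIMIT_BYTES) -> bool:
    """Copy `source` to `backup` when it is larger than `limit`.

    Returns True if a copy was made. `backup` must not exist yet,
    because shutil.copytree creates it.
    """
    if directory_size(source) <= limit:
        return False
    shutil.copytree(source, backup)
    return True
```

To use it, point `source` at the SDIAG_{GUID} folder under `os.path.expandvars(r"%WINDIR%\TEMP")` while the review screen is still open, and pick a `backup` folder on a volume with enough free space.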
- If you run into other issues, this FAQ is extensive:
- 2598970 Information about the Microsoft Automated Troubleshooting Services and Support Diagnostic Platform
- Capture a Failover Cluster Validation Report
Finally, store all these files in case you later work with a Microsoft support engineer. You'll be amazed at how much faster your support call can go if you already have this data collected and ready to upload to your support vendor.
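One simple way to keep the collected files together is to bundle them into a single timestamped archive. The sketch below is only an illustration: the function name, the `cluster-incident` label, and the example file names are hypothetical, and it assumes you pass it paths to individual files rather than folders.

```python
import os
import zipfile
from datetime import datetime


def bundle_artifacts(paths, archive_dir, label="cluster-incident", now=None):
    """Zip the given files into `archive_dir` and return the archive path.

    Files are stored under their base names, so two inputs with the same
    file name would collide; rename them first if that applies to you.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    archive_path = os.path.join(archive_dir, label + "-" + stamp + ".zip")
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=os.path.basename(path))
    return archive_path
```

For example, passing the per-node cluster logs, the diagnostics .cab file, and the validation report produces one archive that is ready to upload if you open a support case.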