Ah, there’s nothing like the stop-everything, our-company-has-come-to-a-complete-halt emergency call we sometimes get where the domain controllers have slowed to a figurative crawl, with nearly all other business likewise emulating a glacier owing to logon failures, application failures and the like.
If you’ve had that happen to one of your domain controllers, then you are nodding your head right now and feeling some relief that you are reading about it rather than experiencing it at this very moment.
The question for this post is: what do you do when that waking nightmare happens (other than figuring out where to hide so your boss can’t find you)?
Well, you use my favorite tool and the guest of honor for this post: Server Performance Advisor, otherwise known as SPA.
Think of SPA as a distilled and concentrated version of the Perfmon data you might review in this scenario. Answers to your questions are boiled down to what you need to know; things that are not relevant to Active Directory performance aren’t gathered, collated or mentioned. SPA may not tell you the cause of the problem in every case, but it will tell you where to look to find that cause.
So now that I’ve covered the generalities of SPA, let’s delve into the specifics. Well, not all of them, but an overview and the highlights that will be most useful to you.
SPA’s AD data collector consists of sections called Performance Advice, Active Directory, Application Tables, CPU, Network, Disk, Memory, Tuning Parameters, and General Information.
Before you reach all of the hard data in those sections, though, SPA gives you a summary at the top of the report. It’ll look something like this:
[Image: example of the Summary section at the top of an SPA report]
Performance Advice is pretty self-explanatory and is one of the big benefits of SPA over other performance data tools. It’s a synopsis of the more common bottlenecks, along with an assessment of whether each one is a problem in your case. Very helpful. It looks at CPU, Network, Memory and Disk I/O and gives a percentage of overall utilization, its judgment on whether the performance seen is idle, normal or a problem, and a short detail sentence that may tell you more.
The Active Directory portion gives good collated data and some hard numbers on AD-specific counters. These are most useful if you already know what that domain controller’s baseline performance counters are; in other words, what the normal numbers would be for that domain controller based on its role and the services it provides day to day. Generally speaking, though, SPA is most often run after a sudden problem has occurred, and at that point it is not the tool for establishing a baseline.
The good collated data includes a listing of the clients whose LDAP searches used the most CPU. Client names are resolved to FQDNs, and a separate area gives the results of those searches.
AD has indices for fast searches, and those indices can get hammered sometimes. The Application Tables section gives data on how those indices are used. If you have an application issuing inefficient queries (for example, ones that traverse too many entries to return a result), this information can help you refine those queries. It can also suggest that you need to index something new, or that you need to examine and perhaps fix your database using ntdsutil.exe.
The CPU portion gives a good snapshot of the busiest processes running on the server during data gathering. Typically this would show LSASS.EXE as the busiest on a domain controller, but not always, particularly when the domain controller has multiple jobs (a file server, or an application server of some kind, perhaps). Generally speaking, having a domain controller be just a domain controller is a good thing.
Note: If Idle has the highest CPU percentage, you may want to make sure you gathered data while the problem was actually occurring.
The Network section is one of the most commonly useful ones. Among other things, it summarizes inbound and outbound TCP and UDP client traffic by computer. It also shows which processes on the local server were associated with that traffic. Good stuff, which can provide a “smoking gun” for some issues. The remaining data in the Network section is also useful, but we have to draw the line somewhere or this becomes less of a blog post and more of a training course.
The Disk and Memory sections provide very useful data, more so if you have a baseline for that system to tell you what is out of the ordinary for it.
SPA is a free download from our site, and installs as a new program group. Here’s where you can get it (install does not require a reboot):
A few other things to note about SPA:
· It requires Windows Server 2003 to run.
· As I stated above, the middle of a problem is the worst time to establish a baseline.
· The duration of the data collection can be adjusted to suit your issue; the default is 300 seconds (5 minutes). Keep in mind that if you gather data for a great deal longer than the problem lasts, you run the risk of averaging out the data and making it useless for troubleshooting.
· Just as there are ADAM performance counters, SPA has an ADAM data collector.
· The latest version (above) includes an executable that can kick off a collection from the command line, and it can be run remotely via PsExec or a similar tool.
· Windows Server 2008 Perfmon will include SPA-like data collectors… or so I hear.
· SPA will not necessarily be the only thing you do, but it’s a great starting place to figure out the problem.
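To make the averaging point about collection duration concrete, here is a minimal sketch of the arithmetic. The numbers are purely hypothetical assumptions (a domain controller that normally sits at about 10% CPU spikes to 95% for a 5-minute problem), and it uses a simple arithmetic mean; it is not a model of how SPA itself computes its figures.

```python
# Hypothetical numbers for illustration only; not taken from a real SPA report.
BASELINE_CPU = 10.0   # assumed normal CPU load, percent
PROBLEM_CPU = 95.0    # assumed CPU load during the incident, percent
PROBLEM_MINUTES = 5   # assumed length of the incident, minutes


def average_cpu(collection_minutes: int) -> float:
    """Simple mean CPU over a collection window that fully contains the incident."""
    if collection_minutes < PROBLEM_MINUTES:
        raise ValueError("in this sketch the window must cover the whole incident")
    normal_minutes = collection_minutes - PROBLEM_MINUTES
    total = PROBLEM_CPU * PROBLEM_MINUTES + BASELINE_CPU * normal_minutes
    return total / collection_minutes


for minutes in (5, 15, 60):
    print(f"{minutes:>3}-minute collection: average CPU {average_cpu(minutes):.1f}%")
```

With these assumed numbers, a 5-minute collection reports 95.0%, a 15-minute collection 38.3%, and a 60-minute collection only 17.1%. That last figure looks like a lightly loaded server even though the problem was there the whole time.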
See? A day at the SPA can really take the edge off a stop-everything, our-company-has-come-to-a-complete-halt emergency kind of day. Very relaxing indeed.