| Counter | Explanation |
|---|---|
| ATQ Estimated Queue Delay | How long a request has to wait in the queue |
| ATQ Outstanding Queued Requests | Current number of requests in the queue |
| ATQ Request Latency | Time it takes to process a request |
| ATQ Threads LDAP | The number of threads used by the LDAP server, as determined by LDAP policy |
| ATQ Threads Other | Threads used by other components, in this case the KDC |
| ATQ Threads Total | All threads currently allocated |
More details on the counters
ATQ Threads Total
This counter tracks the total number of threads, that is, the sum of the ATQ Threads LDAP and ATQ Threads Other counters. The maximum number of threads that a given DC can apply to incoming workloads is found by multiplying MaxPoolThreads by the number of logical CPU cores. MaxPoolThreads defaults to a value of 4 in LDAP Policy and should not be modified without understanding the implications.
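As a quick illustration of that arithmetic, here is a minimal Python sketch. It assumes the default MaxPoolThreads value of 4 and uses `os.cpu_count()` as a stand-in for the number of logical CPU cores on the machine where it runs; the function names are hypothetical and not part of any Windows tool.

```python
import os

DEFAULT_MAX_POOL_THREADS = 4  # LDAP Policy default mentioned in the text


def max_atq_threads(logical_cores: int, max_pool_threads: int = DEFAULT_MAX_POOL_THREADS) -> int:
    """Upper bound on ATQ threads: MaxPoolThreads x logical CPU cores."""
    if logical_cores < 1 or max_pool_threads < 1:
        raise ValueError("core count and MaxPoolThreads must be positive")
    return max_pool_threads * logical_cores


def atq_threads_total(ldap_threads: int, other_threads: int) -> int:
    """ATQ Threads Total is the sum of ATQ Threads LDAP and ATQ Threads Other."""
    return ldap_threads + other_threads


if __name__ == "__main__":
    # Assumption: this runs on the DC itself, so cpu_count() reflects its logical cores.
    cores = os.cpu_count() or 1
    limit = max_atq_threads(cores)
    print(f"{cores} logical cores x {DEFAULT_MAX_POOL_THREADS} = {limit} ATQ threads max")
```

For example, a DC with 8 logical cores and the default policy would top out at 32 ATQ threads, shared between LDAP and the KDC.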
When viewing performance logs from a performance-challenged DC:
| External factors scenario | Symptom and Cause | Resolution |
|---|---|---|
| Scenario 1 | LDAP, Kerberos and DC locator responses are slow or time out. | The problem is described in KB article 2668820. Install the corrective fixes and policy documented in KB 2922852. |
| Scenario 2 | | To determine whether your clients are using secure LDAP (LDAPS), check the counter "LDAP New SSL Connections/sec". |
For scenario 2, depending on the details, there are a few approaches to remove the bottleneck:
LDAP_OPT_SSPI_FLAGS (0x92): Sets or retrieves a ULONG value giving the flags to pass to the SSPI InitializeSecurityContext function.
In System.DirectoryServices.Protocols:
The SspiFlag property specifies the flags to pass to the Security Support Provider Interface (SSPI) InitializeSecurityContext function. For more information about the InitializeSecurityContext function, see the InitializeSecurityContext function topic in the MSDN library.
From InitializeSecurityContext:
"Schannel must not attempt to supply credentials for the client automatically."
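A hedged sketch of the flag arithmetic involved: the value passed through LDAP_OPT_SSPI_FLAGS (or SspiFlag) is a bitmask of ISC_REQ_* values, so a client that wants to stop Schannel from supplying credentials automatically would OR in the ISC_REQ_USE_SUPPLIED_CREDS bit (0x80 in sspi.h), the flag whose documentation contains the sentence quoted above. The helper names below are hypothetical. Actually applying the value on Windows means calling ldap_set_option from Wldap32 or setting the SspiFlag property, which this sketch deliberately does not do.

```python
LDAP_OPT_SSPI_FLAGS = 0x92          # LDAP session option ID from the text
ISC_REQ_USE_SUPPLIED_CREDS = 0x80   # sspi.h: Schannel must not supply client credentials automatically


def add_use_supplied_creds(current_flags: int) -> int:
    """Return the flag word with ISC_REQ_USE_SUPPLIED_CREDS set, preserving all other bits."""
    return (current_flags | ISC_REQ_USE_SUPPLIED_CREDS) & 0xFFFFFFFF  # keep it a 32-bit ULONG


def uses_supplied_creds(flags: int) -> bool:
    """True if the flag word already tells Schannel not to pick client credentials itself."""
    return bool(flags & ISC_REQ_USE_SUPPLIED_CREDS)


if __name__ == "__main__":
    before = 0x0
    after = add_use_supplied_creds(before)
    print(f"option 0x{LDAP_OPT_SSPI_FLAGS:X}: flags 0x{before:X} -> 0x{after:X}")
```

Using a bitwise OR rather than plain assignment keeps any flags the application already sets (for example confidentiality) intact.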
ATQ Threads Other
You can also have external dependencies generating requests that hit the Kerberos Key Distribution Center (KDC).
One common operation is getting the list of global and universal groups from a DC that is not a Global Catalog (GC).
A second external, and potentially intermittent, root cause occurs when the Kerberos Forest Search Order (KFSO) feature has been enabled on Windows Server 2008 R2 and later KDCs to search trusted forests for SPNs that cannot be located in the local forest.
The worst-case scenario occurs when the KDC searches both local and trusted forests for an SPN that can't be found, either because the SPN does not exist or because the search focused on an incorrect SPN.
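To see why the not-found case is the expensive one, the following Python sketch models the search order described above: the local forest first, then each trusted forest in its configured order. The forest names, the sample SPN data and the `spn_exists` callback are hypothetical stand-ins for the KDC's real directory lookups; the point is only that a missing or mistyped SPN forces a query against every forest in the list.

```python
from typing import Callable, Dict, Optional, Sequence, Set, Tuple

# Hypothetical sample data: which SPNs are registered in which forest.
SAMPLE_SPNS: Dict[str, Set[str]] = {
    "contoso.com": {"HTTP/web.contoso.com"},
    "fabrikam.com": {"MSSQLSvc/db.fabrikam.com"},
    "tailspintoys.com": set(),
}


def sample_spn_exists(forest: str, spn: str) -> bool:
    """Stand-in for a directory lookup of `spn` in `forest`."""
    return spn in SAMPLE_SPNS.get(forest, set())


def kfso_lookup(
    spn: str,
    local_forest: str,
    trusted_forests: Sequence[str],
    spn_exists: Callable[[str, str], bool],
) -> Tuple[Optional[str], int]:
    """Search the local forest first, then each trusted forest in the configured order.

    Returns (forest where the SPN was found, or None; number of forests queried).
    """
    queried = 0
    for forest in [local_forest, *trusted_forests]:
        queried += 1  # for trusted forests, each query implies a call to a remote DC
        if spn_exists(forest, spn):
            return forest, queried
    # Worst case: SPN missing or mistyped, so every forest in the list was searched.
    return None, queried


if __name__ == "__main__":
    order = ["fabrikam.com", "tailspintoys.com"]
    for spn in ("HTTP/web.contoso.com", "MSSQLSvc/db.fabrikam.com", "HTTP/typo.contoso.com"):
        print(spn, kfso_lookup(spn, "contoso.com", order, sample_spn_exists))
```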
Memory dumps from in-state KDCs will reveal a number of threads working on Kerberos Service Ticket Requests along with pending RPC calls to remote domain controllers.
ProcDump, triggered by performance counters, could also be used to identify the condition if the spikes last long enough to start and capture the related traffic.
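The "spikes must last long enough" condition can be illustrated with a small sustained-threshold check. This Python sketch is not ProcDump's implementation; the threshold, the required number of consecutive samples and the sample values are assumptions chosen purely for illustration.

```python
from typing import Iterable, Optional


def first_sustained_breach(samples: Iterable[float], threshold: float, consecutive: int) -> Optional[int]:
    """Return the index of the sample at which `consecutive` samples in a row have
    exceeded `threshold`, or None if no spike lasts that long."""
    if consecutive < 1:
        raise ValueError("consecutive must be at least 1")
    run = 0
    for i, value in enumerate(samples):
        run = run + 1 if value > threshold else 0  # reset the run on any sample at or below threshold
        if run >= consecutive:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical readings of a queue counter, one per sample interval.
    queued = [0, 3, 12, 0, 15, 18, 22, 9]
    print(first_sustained_breach(queued, threshold=10, consecutive=3))
```

A short spike (index 2 above) resets the run, so only the longer burst starting at index 4 would trigger a capture.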
More information on KFSO, including performance counters to monitor when using this feature, can be found on TechNet.
ATQ Queues and ATQ Request Latency
The ATQ queue and latency counters provide statistics on how requests are being processed. Since the types of requests differ, the average processing time is typically not meaningful: an expensive LDAP query that takes minutes to execute can be masked by hundreds of fast LDAP queries or KDC requests.
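A small numeric sketch of that masking effect, using made-up latencies rather than real counter data: one query that takes two minutes hidden among 999 requests that each take 2 ms.

```python
from statistics import mean


def latency_summary(latencies_ms):
    """Return (mean, max) so the outlier is visible next to the average."""
    return mean(latencies_ms), max(latencies_ms)


# Hypothetical workload: 999 fast requests plus one two-minute LDAP query.
workload = [2.0] * 999 + [120_000.0]
avg, worst = latency_summary(workload)
print(f"average {avg:.1f} ms, worst {worst:.0f} ms")  # prints: average 122.0 ms, worst 120000 ms
```

An average of about 122 ms looks unremarkable even though one request took 1,000 times longer.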
The main use of these counters is to monitor the wait time in queue and the number of requests in the queue. Any non-zero values indicate that the DC has run out of threads.
Note that Performance Monitor reads each counter only at the moment the sample is taken. This is quite a problem when you use a long sample interval: a counter for the current queue length, such as "ATQ Outstanding Queued Requests", may not reliably show the actual degree of server overload.
To work around this sampling problem, you have to take other counters into consideration to validate the value. If ATQ Estimated Queue Delay shows an actual wait time, there must have been requests sitting in the queue at some point during the last sample interval, even if the queue length reads zero. The load and processing delay was just not bad enough to leave at least one request in the queue at the sample time stamp.
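The cross-check described above can be written down as a small decision function. This is a hedged Python sketch: the `AtqSample` container and the returned labels are hypothetical, and it assumes you already have the two counter values for one sample (for example, exported from a data collector set).

```python
from dataclasses import dataclass


@dataclass
class AtqSample:
    """One sample of the two ATQ queue counters (hypothetical container)."""
    estimated_queue_delay: float      # ATQ Estimated Queue Delay, in the counter's own units
    outstanding_queued_requests: int  # ATQ Outstanding Queued Requests at the sample time stamp


def assess(sample: AtqSample) -> str:
    """Interpret one sample by cross-checking queue length against wait time."""
    if sample.outstanding_queued_requests > 0:
        # Requests were waiting at the instant the sample was taken.
        return "threads exhausted at sample time"
    if sample.estimated_queue_delay > 0:
        # The queue read empty at the time stamp, but requests still had to wait,
        # so the queue must have been non-empty earlier in the interval.
        return "threads exhausted during interval"
    return "no queuing observed"


if __name__ == "__main__":
    for s in (AtqSample(0, 0), AtqSample(250, 0), AtqSample(250, 7)):
        print(s, "->", assess(s))
```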
What about other thread pools?
LSASS has a number of other worker threads, for example to process IPsec handshakes. Then of course there is the land of RPC server threads for the various RPC servers. Describing all the RPC servers would take up a number of additional blog entries. You can see them listed as "load generators" in the data collector set results.
A lot of details on LSASS ATQ performance counters, I know. But geeks love the details.
Cheers,
Herbert