Note: a major update to this post (newer links, tools etc.) was completed on 2/5/2020.
Some of relatively common and difficult issues we see in support are related to client connectivity to Exchange. There are several variations that we classify as connectivity (related to server performance or otherwise). They can include:
There are many factors that can contribute to these symptoms, and each one can lead down a completely different troubleshooting path. In this post we will focus on methods and tools to troubleshoot these. There are too many potential underlying causes to cover them all but hopefully this will serve as a good starting point for troubleshooting.
This post is not meant to be a post that you necessarily read from start to finish, but rather serve as a guidance for when you need to troubleshoot hard to find issues related to client connectivity. It can be a bit overwhelming, sure, but depending on the depth that you need to get into – it might be all worth it!
In every case, it's best to spend some time understanding what the user is experiencing. Understanding the full client experience and effect can help determine the logging and troubleshooting path. When you define the users issue, you should also run the, Office 365 Support and Recovery Assistant (aka SaRA) on the client to have a good view of configuration that can impact their experience. To obtain the configuration information, select “Advanced Diagnostics” click “Next” and chose “Outlook” and follow the prompts to obtain the report.
Note: If you are an Office 365 customer with users having mailboxes in Exchange Online, we highly recommend using Office 365 Support and Recovery Assistant to troubleshoot Outlook connectivity and other common issues to Office 365 since this post is for On-Premises.
Here are some items to consider on the client side when troubleshooting these issues.
The first step for the Exchange server is to verify that the preferred architecture is in place by reviewing the related version documentation.
Several items such as setting power management to “High Performance” and verify the OS isn't turning off power to the NIC. Making sure the server is running updated and supported .NET version, and so on are extremely important:
Exchange TCP KeepAliveTime should be set to 30 minutes or no less than 15 minutes. If there's no entry in the registry for KeepAliveTime then the value is 2 hours. This value, if not set correctly, can affect both connectivity and performance. You must make sure that the load balancer and any other devices in the path from client to Exchange be set correctly. More information on load balancer configuration in the Load Balancing Configuration section below. The goal is to set Exchange with the lowest value so that client sessions when ended, are ended by the Exchange and not by a device.
Path: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\ Value name: KeepAliveTime Value Type: 1800000 (30 minutes, in milliseconds) (Decimal)
Please note that for 3rd party load balancer configuration, you should always refer to product documentation / guidance. The following are some general best practices and things we see misconfigured:
Clients > Firewall > Load Balancer > Exchange
In this example, if you have a firewall in the path from client to Exchange, we are referencing the firewall “idle” time out and not the persistence time out. This value should be greater than the load balancer and the load balancer time out should be greater than Exchange. Note that it is not recommended to go below 15 minutes for Keep Alive on Exchange or TCP idle timeout on the load balancer.
Firewall time out = 40 minutes
LB TCP Idle time out = 35 minutes
Exchange Keep Alive = 30 minutes
If the load balancer supports it, the preferred option is to configure it to use “Least Connections” with “Slow start” during typical operation.
With the "least connections" method, be mindful it's possible for a server to become overloaded and unresponsive during a server outage or during patching maintenance. In the context of Exchange performance, authentication is an expensive operation.
If you have completed the previous steps and you are still experiencing issues, the following data is necessary:
For testing, a DC can be excluded by using the Set-Exchangeserver –StaticExcludedDomainControllers parameter as shown in this section, however, troubleshooting Global Catalog access should also be done as soon as your testing is completed. Statically excluding a GC takes effect immediately and will be viewable on the next 2080 Event ID with all zero values. Some additional resources on the subject:
The PDC column should show 0 and not be used by Exchange
Also check for Event ID 2070 or 2095. A 2070 event occurs when Exchange tries to query a DC and fails, and it cannot contact/communicate with a DC. If this event occurs during the time when clients have issues (or frequently), then it should be investigated. If you only see this event occasionally over several days, it could be a result of the DC being restarted during maintenance. The same is true with event 2095; infrequent isn't a concern but continued logging of this event could be a sign of a problem.
Always ensure MaxConcurrentApi bottlenecks are not present in the environment. To avoid this problem now or in future review the following information: You are intermittently prompted for credentials or experience time-outs when you connect to Authenti...
LDAP latencies can impact server and client performance. When you work with client connectivity issues, the LDAP counter can help point delays with communications to the DC’s. These are under the MSExchange ADAccess Domain Controllers(*) LDAP Read Time and LDAP Search Time and is recommended that the average be within 50ms and spikes no greater than 100ms. More information is located here.
Some logging and tools to help troubleshoot issues are listed below.
A great option for collection of logs if you need to submit them to support teams or just review for yourself, is the “Exchange Log Collector Script”. This script allows you to collect a wide range of logs that is by default enabled on the server that is needed to investigate an issue. You can customize what logs the script needs to collect by using the switch parameters provided within the script. If you don’t know what to collect the switch –AllPossibleLogs is a good way to catch all the default logging. For more information review the Exchange Log Collector Script’s GitHub page.
In Exchange, there are several logs in the logging folder. For Outlook clients one of the first logs to examine are the HTTP Proxy logs on Exchange. The connection walk-through section shows the process that is used to connect to Exchange 2013. This complete process is logged in the HTTP Proxy log. Also, if it is possible, add Hosts file to the client for one specific Exchange to reduce the number of logs.
The RPCHTTP logs on Exchange are located here by default: C:\Program Files\Microsoft\Exchange Server\V15\Logging\HttpProxy\RpcHttp
The MAPI logs are located here by default: C:\Program Files\Microsoft\Exchange Server\V15\Logging\HttpProxy\Mapi
Exchange has HTTP Proxy logs for AutoDiscover that are similar to the logs shown earlier that can be used to determine whether AutoDiscover is failing.
The logs on Exchange are located here by default: C:\Program Files\Microsoft\Exchange Server\V15\Logging\HttpProxy\AutoDiscover
HTTP Error logs are failures that occur with HTTP.SYS before hitting IIS. However, not all errors for connections to web sites and app pools are seen in the httperr log. For example, if ASP.NET threw the error it may not be logged in the HTTP Error log. By default, HTTP error logs are in C:\Windows\System32\LogFiles\HTTPERR. Information on the httperr log and codes can be found here.
IIS logs can be used to review the connection for RPC/HTTP, MAPI/HTTP, EWS, OAB, and AutoDiscover. The full data for the MAPI/HTTP and RPC/HTTP is not always put in the IIS logs. Therefore, there is a possibility that the 200 connection successful may not be seen. This link is useful to identify the IIS codes.
In Exchange IIS logs should contain all user connections on port 443. IIS logs on the Mailbox server should only contain connections from the Exchange server on port 444.
Most HTTP connections are first sent anonymously which results in a 401-challenge response. This response includes the authentication types available in the response header. The client should then try to connect again by using one of these authentication methods. Therefore, a 401-status found inside an IIS log does not necessarily indicate an error.
Note that an anonymous request is expected to show a 401 response. You can identify anonymous requests because the domain\username is not listed in the request.
For Outlook Anywhere sessions, the RCA logs can be used to find when a user has made a connection to their mailbox, or a connection to an alternate mailbox, errors that occur with the connection, and more information. RCA logs are in the logging directory which is located at %ExchangeInstallPath%\Logging\RpcClientAccess. By default, these logs have a maximum size of 10MB and roll over when size limit is reached or at the end of the day (based on GMT), and the server keeps 1GB in the log directory.
ETL logs are in %temp%/Outlook Logging and are named Outlook-#####.ETL. The numbers are randomly generated by the system.
To enable Outlook logging
How to enable Outlook logging in the registry:
Note: xx.0 is a placeholder for your version of Office. 16.0 = Office 2016, Office. 15.0 = Office 2013
You can use Perfmon for issues that you suspect are caused by performance. See Perfwiz. Example command line: \experfwiz.ps1 -server MBXServer -interval 5 -filepath D:\Logs.
Exchange has daily performance logs that captures the majority of what is needed however, for most troubleshooting we will require Perfwiz. These logs are by default located in C:\Program Files\Microsoft\Exchange Server\V16\Logging\Diagnostics\DailyPerformanceLogs
Log Parser Studio is a GUI for Log Parser 2.2. LPS greatly reduces complexity when parsing logs. Additionally, it can parse many kinds of logs including IIS Logs, HTTPErr Logs, Event Logs (both live and EVT/EVTX/CSV), all Exchange protocol logs, any text based logs, CSV logs and ExTRA traces that were converted to CSV logs. LPS can parse many GB of logs concurrently (we have tested with total log sizes of >60GB).
We use this tool to get detailed information about client traffic.
Thanks to David Paulson, Jim Martin, Nino Bilic, Bhalchandra Atre and Rob Whaley for technical review.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.