Load Balancing in Exchange 2013
Published Mar 05 2014 06:00 AM
Microsoft

This article is part 2 in a series that discusses namespace planning, load balancing principles, client connectivity, and certificate planning.

Load Balancing

Unlike previous versions of Exchange, Exchange 2013 no longer requires session affinity at the load balancing layer.

To understand this statement better, and to see how it impacts your designs, we need to look at how the Exchange 2013 Client Access server (CAS) functions. From a protocol perspective, the following happens:

  1. A client resolves the namespace to a load balanced virtual IP address.
  2. The load balancer assigns the session to a CAS member in the load balanced pool.
  3. CAS authenticates the request and performs a service discovery by accessing Active Directory to retrieve the following information:
    1. Mailbox version (for this discussion, we will assume an Exchange 2013 mailbox)
    2. Mailbox location information (e.g., database information, ExternalURL values, etc.)
  4. CAS makes a decision on whether to proxy the request or redirect the request to another CAS infrastructure (within the same forest).
  5. CAS queries an Active Manager instance that is responsible for the database to determine which Mailbox server is hosting the active copy.
  6. CAS proxies the request to the Mailbox server hosting the active copy.

Step 5 is the fundamental change that enables the removal of session affinity at the load balancer. For a given protocol session, CAS now maintains a 1:1 relationship with the Mailbox server hosting the user’s data. In the event that the active database copy is moved to a different Mailbox server, CAS closes the sessions to the previous server and establishes sessions to the new server. This means that all sessions, regardless of their origination point (i.e., CAS members in the load balanced array), end up at the same place: the Mailbox server hosting the active database copy.

This is vastly different from previous releases. In Exchange 2010, if all requests from a specific client did not go to the same endpoint, the user experience was negatively affected.
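To illustrate why the entry point no longer matters, here is a minimal sketch in Python. It is a toy model, not Exchange code: the `ActiveManager` and `ClientAccessServer` classes, the server names, and the dictionaries standing in for Active Directory and Active Manager lookups are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ActiveManager:
    """Toy stand-in for Active Manager: database -> server hosting the active copy."""
    active_copy: dict = field(default_factory=dict)

    def active_server(self, database: str) -> str:
        return self.active_copy[database]


@dataclass
class ClientAccessServer:
    """Toy stand-in for a CAS: finds the mailbox database, then its active copy."""
    name: str
    directory: dict  # mailbox -> database (stand-in for the Active Directory lookup)
    active_manager: ActiveManager

    def proxy_target(self, mailbox: str) -> str:
        database = self.directory[mailbox]
        return self.active_manager.active_server(database)


directory = {"user@contoso.com": "DB01"}
am = ActiveManager({"DB01": "MBX-A"})
cas_pool = [ClientAccessServer(f"CAS-{i}", directory, am) for i in (1, 2, 3)]

# Whichever CAS the load balancer picks, the request lands on the same Mailbox server.
targets = {cas.proxy_target("user@contoso.com") for cas in cas_pool}

# After the active copy moves, every CAS follows it on the next lookup.
am.active_copy["DB01"] = "MBX-B"
targets_after_move = {cas.proxy_target("user@contoso.com") for cas in cas_pool}
```

In this model, `targets` contains a single server no matter how many CAS members are in the pool, which is the property that lets the load balancer skip affinity.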

The protocol used in step 6 depends on the protocol used to connect to CAS. If the client leverages the HTTP protocol, then the protocol used between the Client Access server and Mailbox server is HTTP (secured via SSL using a self-signed certificate). If the protocol leveraged by the client is IMAP or POP, then the protocol used between the Client Access server and Mailbox server is IMAP or POP.

Telephony requests are unique, however. Instead of proxying the request at step 6, CAS will redirect the request to the Mailbox server hosting the active copy of the user’s database, as the telephony devices support redirection and need to establish their SIP and RTP sessions directly with the Unified Messaging components on the Mailbox server.


Figure 1: Exchange 2013 Client Access Protocol Architecture

However, there is a concern with this architectural change. When the load balancer does not use session affinity and operates at layer 4, it has no knowledge of the target URL or request content. All it can use is layer 4 information: the IP address and the protocol/port (TCP 443).

Figure 2: Layer 4 Load Balancing

The load balancer can use a variety of means to select the target server from the load balanced pool, such as round-robin (each inbound connection goes to the next target server in a circular list) or least-connection (each new connection goes to the server that has the fewest established connections at that time).
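The two selection methods can be sketched in a few lines of Python. This is only an illustration of the idea; the server names and connection counts are made up, and real load balancers implement these algorithms in their own way.

```python
from itertools import cycle


def round_robin(servers):
    """Return an iterator that hands out servers in a fixed circular order."""
    return cycle(servers)


def least_connections(open_connections):
    """Return the server with the fewest established connections right now."""
    return min(open_connections, key=open_connections.get)


pool = ["CAS-1", "CAS-2", "CAS-3"]

# Round-robin: each new connection goes to the next server in the circle.
picker = round_robin(pool)
rr_choices = [next(picker) for _ in range(5)]

# Least-connection: each new connection goes to the least busy server.
open_connections = {"CAS-1": 12, "CAS-2": 7, "CAS-3": 9}
lc_choice = least_connections(open_connections)
open_connections[lc_choice] += 1  # the new connection now counts against that server
```

Neither method looks at the request itself, which is why both work at layer 4.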

Health Probe Checking

Unfortunately, this lack of knowledge about the target URL (or the content of the request) introduces complexities around health probes.

Exchange 2013 includes a built-in monitoring solution, known as Managed Availability. Managed Availability includes an offline responder. When the offline responder is invoked, the affected protocol (or server) is removed from service. To ensure that load balancers do not route traffic to a Client Access server that Managed Availability has marked as offline, load balancer health probes must be configured to check <virtualdirectory>/healthcheck.htm (e.g., https://mail.contoso.com/owa/healthcheck.htm). Note that healthcheck.htm does not actually exist within the virtual directories; it is generated in-memory based on the component state of the protocol in question.

If the load balancer health probe receives a 200 status response, then the protocol is up; if the load balancer receives a different status code, then Managed Availability has marked that protocol instance down on the Client Access server. As a result, the load balancer should also consider that endpoint down and remove the Client Access server from the applicable load balancing pool.
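As a rough sketch of the probe logic, the Python below builds the probe URL and applies the 200-means-up rule. The function names are hypothetical, and treating a missing response (timeout or connection failure) as down is an assumption made for this sketch. In practice the probe is configured on the load balancer itself rather than scripted.

```python
import urllib.error
import urllib.request


def probe_url(namespace: str, virtual_directory: str) -> str:
    """Build the probe URL for one virtual directory."""
    return f"https://{namespace}/{virtual_directory}/healthcheck.htm"


def is_healthy(status_code) -> bool:
    """Only a 200 counts as up; any other status, or no response (None), counts as down."""
    return status_code == 200


def probe(url: str, timeout: float = 5.0):
    """Return the HTTP status of the probe, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 4xx or 5xx.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, TLS failure, or timeout.
        return None
```

For example, `is_healthy(probe(probe_url("mail.contoso.com", "owa")))` would report whether the OWA protocol should stay in rotation.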

Administrators can also manually take a protocol offline for maintenance, thereby removing it from the applicable load balancing pool. For example, to take the OWA proxy protocol on a Client Access server out of rotation, you would execute the following command:

Set-ServerComponentState <Client Access Server> -Component OwaProxy -Requester Maintenance -State Inactive

For more information on server component states, see the article Server Component States in Exchange 2013.

What if the load balancer health probe did not monitor healthcheck.htm?

If the load balancer did not utilize the healthcheck.htm in its health probe, then the load balancer would have no knowledge of Managed Availability’s removal of (or adding back) a server from the applicable load balancing pool. The end result is that the load balancer would have one view of the world, while Managed Availability would have another view of the world. In this situation, the load balancer could direct requests to a Client Access server that Managed Availability has marked down, which would result in a negative (or broken) user experience. This is why the recommendation exists to utilize healthcheck.htm in the load balancing health probes.

Namespace and Affinity Scenarios

Now that we understand how health checks are performed, let’s look at four scenarios:

  1. Single Namespace / Layer 4 (No Session Affinity)
  2. Single Namespace / Layer 7 (No Session Affinity)
  3. Single Namespace / Session Affinity
  4. Multiple Namespaces / No Session Affinity

Single Namespace / Layer 4 (No Session Affinity)

In this scenario, a single namespace is deployed for all the HTTP protocol clients (mail.contoso.com). The load balancer is operating at layer 4 and is not maintaining session affinity. The load balancer is also configured to check the health of the target Client Access servers in the load balancing pool; however, because this is a layer 4 solution, the load balancer is configured to check the health of only a single virtual directory (as it cannot distinguish OWA requests from RPC requests). Administrators will have to choose which virtual directory they want to target for the health probe; you will want to choose a virtual directory that is heavily used. For example, if the majority of your users utilize OWA, then targeting the OWA virtual directory in the health probe is appropriate.

Figure 3: Single Namespace with No Session Affinity

As long as the OWA health probe response is healthy, the load balancer will keep the target CAS in the load balancing pool. However, if the OWA health probe fails for any reason, then the load balancer will remove the target CAS from the load balancing pool for all requests associated with that particular namespace. In other words, in this example, health from the perspective of the load balancer is per-server, not per-protocol, for the given namespace. This means that if the health probe fails, all client requests will have to be directed to another server, regardless of protocol.

Figure 4: Single Namespace with No Session Affinity - Health Probe Failure
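The per-server effect can be modeled as follows. This is a simplified Python sketch with made-up server names; the 503 status is only an example of a non-200 response, since the article does not say which code a failing probe returns.

```python
def servers_in_rotation(probe_status):
    """Layer 4, single namespace: one probe result per server decides everything."""
    return [server for server, status in probe_status.items() if status == 200]


# Status returned by the single /owa/healthcheck.htm probe on each server.
owa_probe = {"CAS-1": 200, "CAS-2": 503, "CAS-3": 200}
pool = servers_in_rotation(owa_probe)

# Every protocol shares the same pool, so CAS-2 also stops receiving
# EWS, ActiveSync, and RPC traffic, even if those protocols are healthy there.
protocols = ("owa", "ews", "Microsoft-Server-ActiveSync", "rpc")
eligible = {protocol: pool for protocol in protocols}
```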

Single Namespace / Layer 7 (No Session Affinity)

In this scenario, a single namespace is deployed for all the HTTP protocol clients (mail.contoso.com). The load balancer is configured to utilize layer 7, meaning SSL termination occurs and the load balancer knows the target URL. The load balancer is also configured to check the health of the target Client Access servers in the load balancing pool; in this case, a health probe is configured on each virtual directory.

As long as the OWA health probe response is healthy, the load balancer will keep the target CAS in the OWA load balancing pool. However, if the OWA health probe fails for any reason, then the load balancer will remove the target CAS from the load balancing pool for OWA requests. In other words, in this example, health is per-protocol; this means that if the health probe fails, only the affected client protocol will have to be directed to another server.

Figure 5: Single Namespace with Layer 7 (No Session Affinity) - Health Probe Failure
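A layer 7 load balancer can keep one pool per virtual directory because it sees the request path. The Python below is a minimal sketch of that idea; the function names, server names, and status values are hypothetical.

```python
def virtual_directory(path: str) -> str:
    """Return the first path segment, lower-cased, e.g. '/owa/auth' -> 'owa'."""
    return path.strip("/").split("/", 1)[0].lower()


def eligible_servers(path, probe_status):
    """Layer 7: consult only the pool for the requested virtual directory."""
    vdir = virtual_directory(path)
    return [s for s, status in probe_status[vdir].items() if status == 200]


# One probe result per virtual directory per server.
probe_status = {
    "owa": {"CAS-1": 200, "CAS-2": 503, "CAS-3": 200},
    "ews": {"CAS-1": 200, "CAS-2": 200, "CAS-3": 200},
}

owa_pool = eligible_servers("/owa/", probe_status)
ews_pool = eligible_servers("/EWS/Exchange.asmx", probe_status)
```

Here CAS-2 drops out of the OWA pool only; EWS requests can still be sent to it.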

Single Namespace / Session Affinity

In this scenario, a single namespace is deployed for all the HTTP protocol clients (mail.contoso.com). The load balancer is configured to maintain session affinity (layer 7), meaning SSL termination occurs and the load balancer knows the target URL. The load balancer is also configured to check the health of the target Client Access servers in the load balancing pool; in this case, the health probe is configured on each virtual directory.

As long as the OWA health probe response is healthy, the load balancer will keep the target CAS in the OWA load balancing pool. However, if the OWA health probe fails for any reason, then the load balancer will remove the target CAS from the load balancing pool for OWA requests. In other words, in this example, health is per-protocol; this means that if the health probe fails, only the affected client protocol will have to be directed to another server.

Figure 6: Single Namespace with Session Affinity - Health Probe Failure

Note: With session affinity enabled, the load balancer’s capacity is reduced because processing is spent maintaining the more involved affinity options, such as cookie-based load balancing or Secure Sockets Layer (SSL) session ID. Check with your vendor on the impact session affinity will have on your load balancer’s scalability.

Multiple Namespaces / No Session Affinity

This scenario combines the best of both worlds: it provides per-protocol health checking without requiring complex load balancing logic.

In this scenario, a unique namespace is deployed for each HTTP protocol client; for example:

Figure 7: Multiple Namespaces with No Session Affinity

Note: As seen in the figure above, ECP is provided its own namespace. ECP and OWA are intimately tied together, and thus ECP does not necessarily require its own namespace. However, ECP does have its own application pool, is the endpoint for the Exchange Administration Center, and is used by Outlook clients for certain configuration items. Therefore, you may want to provide a unique namespace for ECP.

The load balancer operates at layer 4 and does not maintain session affinity. It is also configured to check the health of the target Client Access servers in the load balancing pool. In this case, the health probes effectively target each virtual directory, because each virtual directory is published under its own namespace. The load balancer still has no idea which URL is being accessed, but the result is as if it did.

As long as the OWA health probe response is healthy, the load balancer will keep the target CAS in the OWA load balancing pool. However, if the OWA health probe fails for any reason, then the load balancer will remove the target CAS from the load balancing pool for OWA requests. In other words, in this example, health is per-protocol; this means that if the health probe fails, only the affected client protocol will have to be directed to another server.

Figure 8: Multiple Namespaces with No Session Affinity - Health Probe Failure

The downside to this approach is that it introduces additional namespaces and additional VIPs (one per namespace), and it increases the number of subject alternative names on the certificate, which can be costly depending on your certificate provider. However, it does not introduce extra complexity for the end user: the only URL the user needs to know is the OWA URL. ActiveSync, Outlook, and Exchange Web Services clients will utilize Autodiscover to determine the correct URL.
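The following Python sketch shows how one VIP per namespace gives per-protocol health without inspecting the request. Everything in it is hypothetical: the namespaces other than mail.contoso.com, the VIP addresses (taken from a documentation address range), the server names, and the status values.

```python
# Hypothetical namespaces and VIPs; each namespace fronts exactly one virtual directory.
namespaces = {
    "mail.contoso.com": {"vip": "192.0.2.10", "vdir": "owa"},
    "ews.contoso.com": {"vip": "192.0.2.11", "vdir": "ews"},
    "eas.contoso.com": {"vip": "192.0.2.12", "vdir": "Microsoft-Server-ActiveSync"},
}

# One probe result per virtual directory per server.
probe_status = {
    "owa": {"CAS-1": 200, "CAS-2": 503},
    "ews": {"CAS-1": 200, "CAS-2": 200},
    "Microsoft-Server-ActiveSync": {"CAS-1": 200, "CAS-2": 200},
}


def pool_for_vip(vip, status_by_vdir):
    """Layer 4 sees only the VIP, but each VIP has its own probe and its own pool."""
    vdir = next(ns["vdir"] for ns in namespaces.values() if ns["vip"] == vip)
    return [s for s, status in status_by_vdir[vdir].items() if status == 200]


owa_pool = pool_for_vip("192.0.2.10", probe_status)
ews_pool = pool_for_vip("192.0.2.11", probe_status)

# Each namespace is one more name on the certificate and one more VIP to manage.
certificate_names = sorted(namespaces)
```

The trade-off is visible in the data: the routing logic stays trivial, but the `namespaces` table grows with every protocol you publish separately.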

Scenario Summary

The following table identifies the benefits and concerns with each approach:

Single Namespace / Layer 4 (No Session Affinity)
  Benefits:
  • Single namespace
  • Reduced load balancer complexity
  • Session affinity maintained at CAS
  Concerns:
  • Per-server health

Single Namespace / Layer 7 (No Session Affinity)
  Benefits:
  • Single namespace
  • Per-protocol health
  Concerns:
  • SSL offloading which may impact load balancer scalability

Single Namespace / Layer 7 (Session Affinity)
  Benefits:
  • Single namespace
  • Per-protocol health
  Concerns:
  • Session affinity maintained at load balancer
  • Increased load balancer complexity
  • Reduced load balancer scalability

Multiple Namespaces / No Session Affinity
  Benefits:
  • Per-protocol health
  • Session affinity maintained at CAS
  • Users only have to know OWA URL
  Concerns:
  • Multiple namespaces
  • Additional names on certificate
  • Increased rule set on load balancer
  • Multiple VIPs

Conclusion

Exchange 2013 introduces significant flexibility in your namespace and load balancing architecture. With load balancing, the decision ultimately comes down to balancing functionality against simplicity. The simplest solution lacks session affinity management and per-protocol health checking, but lets you deploy a single namespace. At the other end of the spectrum, you can combine session affinity management and per-protocol health checking with a single namespace, but at the cost of increased complexity. Or you can strike a balance between functionality and simplicity and deploy a load balancing solution that doesn’t leverage session affinity but provides per-protocol health checking, at the expense of requiring a unique namespace per protocol.

Ross Smith IV
Principal Program Manager
Office 365 Customer Experience

Updates

  • 3/7/2014: Added layer 7 (no session affinity) scenario.
36 Comments
Not applicable
Nicely explained, Thanks Ross
Not applicable
Good Article, Just want to know Is the MAPI over HTTP which launched in 2013 SP1 need to have special configuration over HLB or is it remains the same as other protocol
Not applicable
@Exchange Queries - If you plan to manage session affinity at the load balancer, then you have to configure the rule to look at the /mapi virtual directory. Otherwise, it is the same as other protocols.

Ross
Not applicable
Thanks for your reply Ross
Not applicable
"To ensure that load balancers do not route traffic to a Client Access server that Managed Availability has marked as offline, load balancer health probes must be configured to check /healthcheck.htm"


I am happy to see this added to E15. We do it today in our E14 deployment; it just took us a lot more effort to implement. :)

Not applicable
Another Great Exchange On-Premises Article from Ross :)
Not applicable
Ross just spews Exchange goodness!! ;)
Not applicable
Great article
There is a missing architecture: a single namespace and no persistance.
We can carry on multiplexing Exchange applications using the URL: /owa /ecp /ews /rpc /microsoft-active-sync etc...

It is very straight forward to deploy with advanced load-balancers.

Baptiste
Not applicable
@Baptiste - that's the first scenario. persistence and affinity are different terms that mean the same thing.

Ross
Not applicable
Great Article!!! Thanks Ross
Not applicable
Hi Ross; thanks for the article.

Actually, @Baptiste is correct. The first scenario you define is "single namespace, no affinity [persistence], L4 load balancing". You state "Since session affinity is not used by the load balancer, this means that the load balancer has no knowledge of the target URL or request content. All the load balancer uses is layer 4 information, the IP address and the protocol/port."

However, @Baptiste's assertion is that the URL *can* be used, independent of the single/multiple namespace decision. This requires L7 (rather than L4) knowledge and SSL decryption (and subsequent re-encryption prior to Exchange Server 2013 SP 1), but is really trivial to configure on advanced load balancing solutions, and is part of the standard configuration and guidance from F5 Networks, for example. [Note: I am an F5 employee.]
Not applicable
Tears in my eyes LOL like the old days in this site, coming to this site & reading a Great Deep Dive Exchange On-Premises Article. Thanks Ross :)
Not applicable
Thank you very much
Not applicable
Hi Ross, thanks for the useful info. I have a question about monitoring /healthcheck.htm. For example, in my testing, I can disable the Information Store and other services to the point that the EWS HealthSet has an AlertValue of Unhealthy.

However, after more than 30 minutes in that state, the EWS.proxy ServerComponentState never transitions to Inactive or even Unhealthy and a 200 status is always returned. This is true of all the services we monitor using /healthcheck.htm via BIG-IP. When that is the only monitor we use, it results in nodes being marked healthy incorrectly and mailbox open errors on clients.

We can work around this by applying a supplemental EAV monitor that logs into the mailbox, marking the node down if either it or /healthcheck.htm fails.

Can you advise if the behavior of Managed Availability w/r/t HealthSet AlertValue/ServerComponentState is expected, or is this a bug? Can MA be configured to improve the responsiveness of ServerComponentState?

If this is expected behavior, it seems like supplemental monitoring should be required for any HLB solution.

Also, here you state, "if the load balancer receives a different status code, then Managed Availability has marked that protocol instance down on the Client Access server". However, if I manually set the ServerComponentState to Inactive, requests to /healthcheck.htm receive *no* response. Could this be a bug? I'm running service pack 1.
thanks
Michael

Not applicable
For the last scenario Multiple Namespaces with No Session Affinity, how does this look from the load balancer perspective? Are you using multiple VIPs on the load balancer?
Not applicable
Visio is just ....Awesomely placed!,informative !!!!
Not applicable
@Dayne - Yes, you are right. I added the scenario.

@the.ShawnMartin - Yes, multiple VIPs, one per namespace.
Not applicable
Simple and easy - very well explained. Are you going to present this in the MEC?
Not applicable
Great article Ross!
I am a little confused about how this works with coexistence with Exchange 2010. Lets say you have everything working to the Exchange 2013 CAS using whatever method you choose. Since we need to move all of the namespaces from existing 2010 CAS pools to the new 2013 CAS pool, I presume the 2013 CAS proxies all the mailbox data retrieval to one specific 2010 CAS since there is no "legacy" namespace to use for a load balanced pool of Exchange 2010 CASs.

If that is accurate, what happens from a client perspective when that single 2010 CAS or the 2010 Mailbox server holding the database (assuming they are deployed on separate servers) goes down/is taken offline? Does the 2013 server pick another 2010 CAS to use and it is 100% seamless to the end user being proxied through the 2013 CAS?

Sorry for dragging 2010 coexistence into this discussion, I just haven't seen a write up as detailed as the one in this article that explains what happens to the connections being proxied to legacy versions of Exchange.
Not applicable
Good Explanation
Not applicable
Ross - Do we need to configure persistence(such as cookie) on the 2013 Exchange OWA VIP to accommodate during the transition from 2010 to 2013? Or will the 2013 servers handle the legacy OWA persistence to 2010 servers through software? Thanks.
Not applicable
Does the statement that "session affinity is no longer required" also apply for requests to MRS Proxy (migrations to Exchange Online)?
Not applicable
@Bob and @ConfusedAboutCoexistence - I will cover this in more detail in the next article. Stay tuned!
Not applicable
What about tcp session timeouts on the load balancer? This was important for Ex2010.
Not applicable
@jpalarchio - Yes, the statement applies even for MRS Proxy (because in the end MRS Proxy is going to CAS which proxies it to the appropriate Mailbox server).

Ross
Not applicable
Thank you Ross for describe nicely.
Not applicable
Very good publish. Thank you Mr. Ross Smith
Not applicable
Ross,

Your explanation of 'single namespace / layer 7 (no session affinity)' & 'single namespace / layer 7 (session affinity)' is actually the same. But what do you mean with Concern: SSL offloading which may impact load balancer scalability?

To offload or not offload SSL traffic from Exchange 2013 CAS is just a choice regarding security and performance impact and possible with both scenarios?

Regards, Martijn

Not applicable
@Ross - great article - Set-ServerComponentState -Component OwaProxy –Requestor Maintenance –Status Inactive

should be

Set-ServerComponentState -Component OwaProxy –Requester Maintenance –State Inactive

Not applicable
Thanks Ross!! for detailed and Simple Explanation... Helps lot!
Not applicable
Hi Ross, I am curious as to the nature of Outlook 2011 clients. I understand that they do not use RPC over HTTPS, but instead use EWS. As we have many MAC clients, we have found that EWS appPools crash and there is a lot of NetLogon issues. In fact, we have had to increase our MaxConcurrentAPI from the default of 10 (for server 2012) to 50. We are now looking to increase to 100 or 150. I am assuming that that outlook 2011 is using NTLM, which is causing the high netlogon issues. We also seem to be finding disconnects and retransmission issues for EWS. We have been on the phone with PSS many times and for long hours over the past two months sending logs and perfmon data back and forth and they cannot determine the cause. In this case, should there be any consideration for session affinity for MAC clients, or are they treated the same as RPC/HTTPS clients? Are there other considerations that we should make, such as BASIC authentication on the EWS sub-directory to reduce the NetLogon issues?
Not applicable
Ross,

In your command "Set-ServerComponentState", your example parameter to put the service into an Inactive state is -status, however I think it should be -state. Not a big deal and anyone can probably figure it out, but you might want to correct that just in case.
Not applicable
An Informative Piece! Kudos! :)
Not applicable
Load Balancing in Exchange 2013
thank you indeed
Last update: Jul 01 2019 04:18 PM