Store Driver Fault Isolation Improvements in Exchange 2010 SP1
Published Apr 11 2011 09:30 AM 81.6K Views

Background

The Exchange Store Driver is a core transport component which lives both on the Mailbox server role (as the mail submission service) and the Hub server role. It is responsible for:

  • Retrieving messages from the mailbox server that have been submitted by end-users and submitting those to the Hub transport role for categorization and routing.
  • Delivering messages to the appropriate mailbox server based on the location of the recipients mailbox.
  • Extensibility platform for both mail submission & delivery. Store Driver currently hosts a number of agents that extend the functionality of Exchange. Examples include such agents as Inbox Rules, Conversations, meeting forward notifications, etc.

Exchange 2010 is currently being utilized in Live@EDU, as well as the upcoming Office 365. As you can probably imagine, the Exchange servers that run in those datacenters are loaded and pushed harder than almost any other Exchange server imaginable. Prior to SP1, there were several problems that were encountered with mail delivery to the Exchange mailbox store. In particular, there was a need to make sure that a handful of recipients did not starve the rest of the mail delivery system.

While many of you may not have noticed this problem, Microsoft has seen many of these types of cases over the years; often isolated to a single event like an inadvertent public folder replication storm.

This was despite the message throttling that was already available. Transport roles have also had functionality to avoid resource starvation known as Back Pressure, but this was not designed to protect the system from messages that were already in the Local Delivery queue.

Changes in SP1

In order to further protect both the Mailbox servers and Hub servers from resource starvation, new thread limits were introduced in SP1:

KeyDescriptionScenarioError in Connectivity Log:
<add key=”RecipientThreadLimit” value=”1”/> Limit beyond which no more threads can be allocated to the recipient for delivery.

Note: If this is increased, you should increase MaxMailboxDeliveryPerMdbConnections as well, so that slow or hung deliveries to a single recipient will not block delivery for the entire MDB.

Flood of messages to a single Mailbox or a performance problem associated with a single mailbox, has minimal impact on delivery to the rest of the Mailboxes in the database. Throttled delivery for recipient <recipient> due to concurrency limit <limit>
<add key=”MaxMailboxDeliveryPerMdbConnections” value=”2”/> The maximum number of concurrent connections to a single “healthy” Mailbox Database.

Database health is determined by the Health Monitor API and recorded in the connectivity logs as a value between -1, 0-100. 100 being healthy.

Connections hang to a single problematic database have minimal impact on delivery of other queued messages Throttled Delivery due to server limit for <server FQDN> with threshold

Note: These keys are not present in the EdgeTransport.exe.config file by default.

Is it possible to have too much protection?

Unfortunately, there are two scenarios after applying SP1 where we are seeing customers with messages backing up in the queue. The temporary error message is:

432 4.3.2 STOREDRV.Deliver; recipient thread limit exceeded

As you can probably guess, the two scenarios are:

  • Journaling
  • Public Folders

In both cases, the deliveries are occurring to a single recipient (or very small number of recipients). This is likely to occur during heavy mail flow. The screen shot below was taken from a lab server while reproducing the issue:

Screenshot: Messages backed up in mailbox delivery queue due to recipient thread limits
Figure 1: Messages backed up in mailbox delivery queue due to recipient thread limits being exceeded (click here for larger screenshot)

You can see a historical history of 4.3.2 events in connectivity logs on your Hub Transport servers (in the \Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\Connectivity\CONNECTLOGxxxxxxxx-x.LOG), like:

#Software: Microsoft Exchange Server
#Version: 14.0.0.0
#Log-type: Transport Connectivity Log
#Date: 2011-01-12T00:00:00.775Z
#Fields: date-time,session,source,Destination,direction,description
 
2011-01-12T00:00:00.775Z,08CD7F1200CBDBD0,MapiSubmission,5f24e416-c380-41b5-bfe0-37b6f1091f49,>,"Failed; HResult: 1140850693; DiagnosticInfo: Stage:LoadItem, SmtpResponse:432-4.3.2 STOREDRV; mailbox server is too busy"
 
2011-01-12T00:00:00.775Z,08CD7F1200CBDBD0,MapiSubmission,5f24e416-c380-41b5-bfe0-37b6f1091f49,-,RegularSubmissions: 0 ShadowSubmissions: 0 Bytes: 0 Recipients: 0 Failures: 1 ReachedLimit: False Idle: False
 
2011-01-12T00:00:05.792Z,08CD7F1200CBDBD1,MapiSubmission,5f24e416-c380-41b5-bfe0-37b6f1091f49,+,Win2k8R2Ex14.dom2k8r2ex14.lab

In some cases, simply leaving the servers alone should cause the queues to slowly drain. In other cases, it may be necessary to take further action because the problem has persisted for a while or because it isn’t part of a one-time event like a Public Folder replication storm.

Alright, I understand. Now how do I fix my situation?

Like every other throttling & performance related feature that has ever been in Exchange, the solution isn’t exactly straight forward. For starters, we’ve seen some level of success simply incrementing both values up one, as follows:

<add key="RecipientThreadLimit" value="2" />
<add key="MaxMailboxDeliveryPerMdbConnections" value="3" />

In fact, there has been en ough success with this that we’re considering changing the defaults in a future rollup or service pack. Of course, when we do, the new defaults will only apply to those who haven’t already modified the values. Although it has not yet been released, KB 2491972 will discuss the change when the time comes.

So if +1 is better, then why not +2 or more?

Well, the problem is that all Exchange servers will ultimately be bound by some hardware resource. You don’t want to introduce thrashing or resource starvation due to a public folder or journaling server event that now impacts all users as well. In addition, there are other limits that control the maximum number of threads that can service these types of deliveries. In short, we are not recommending going above 4 for MaxMailboxDeliveryPerMdbConnections, and RecipientThreadLimit should always be at least one fewer.

In an extreme case, if you are in a very busy environment with dedicated journaling mailbox servers, it may be worth considering having dedicated hub transport servers to go along with that. Of course, they can be virtualized or multi-role boxes, but in order to be isolated, the whole bunch will need to be in a dedicated site. Then, you would be able to increase these limits without risking delivery to other mailboxes.

Of course, this is an extreme example. For the rest of us, it may be as simple as trying the +1 approach while carefully monitoring the hub & mailbox servers.

Scott Landry, Steve Schiemann, Frank Brown and Jason Nelson

7 Comments
Version history
Last update:
‎Jul 01 2019 03:58 PM
Updated by: