Introduction
Over time, we have seen multiple support tickets where the transport database file, “mail.que”, grows unexpectedly, which in turn causes various mail flow issues. In this article, we provide details about the transport database mail.que and help you avoid issues caused by a large database.
Understanding backpressure
Before jumping into the issue and possible solutions, we need to understand how the Exchange Transport service monitors and reacts to system resources such as memory and disk space. This is known as “back pressure”. Back pressure is a system resource monitoring feature of the Microsoft Exchange Transport service that exists on Hub Transport servers (Exchange 2010), Mailbox servers, and Edge Transport servers. Back pressure detects when vital system resources (hard drive space, memory, etc.) are overused, and takes action to prevent the server from becoming completely overwhelmed and unavailable. You can read more about back pressure in this article. The net result is that if disk space utilization on the drive that holds the mail queue database or its log files exceeds a certain threshold, the Transport service will delay or even stop accepting new messages.
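On Exchange 2013 and later, the current back pressure state can be inspected from the Exchange Management Shell. The following is a minimal sketch, assuming a placeholder server name `Mailbox01` and the `ResourceThrottling` diagnostic component of the EdgeTransport process; the exact XML layout may differ between versions.

```powershell
# Placeholder server name (an assumption); replace with your own server.
$server = "Mailbox01"

# Ask the transport process for its resource throttling diagnostics.
[xml]$bp = Get-ExchangeDiagnosticInfo -Server $server -Process EdgeTransport -Component ResourceThrottling

# Each ResourceMeter entry describes one monitored resource and its current pressure level.
$bp.Diagnostics.Components.ResourceThrottling.ResourceTracker.ResourceMeter
```

A resource whose pressure is reported as anything other than low is a candidate for closer inspection, including the drive that holds mail.que.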
What’s stored inside the Mail.que database?
Transport storage capacity is driven by two needs: temporary queuing of inbound and outbound messages (including shadow queuing) and Safety Net (earlier known as the transport dumpster). You can think of the transport storage capacity requirement as the sum of message content on disk in a worst-case scenario. Broadly, your mail.que consists of the following elements:
Also note that, like with other Exchange databases, when email is deleted from Transport Database (mail.que), the database size does not shrink; instead, during online maintenance activity, ‘white space’ is created inside the database for future use.
Why is my transport database growing so large?
The size of the transport database depends on its usage. The database size is directly proportional to the average message size and to the number of messages per day. For example, if an email with a large attachment is sent to all company employees, the transport database will grow more than usual on that day. The following are some of the causes of transport database growth:
So, it’s good practice to compare the following values in your environment to their defaults:
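As a starting point, the current values can be read from the Exchange Management Shell. This is a sketch only: the choice of properties is an assumption based on the settings discussed in this article (Safety Net hold time, shadow message discard interval, message size, and queue expiration), and `Get-TransportService` assumes Exchange 2013 or later.

```powershell
# Organization-wide transport settings that influence how long, and how much, data stays in mail.que.
Get-TransportConfig |
    Format-List SafetyNetHoldTime, ShadowMessageAutoDiscardInterval, MaxReceiveSize, MaxSendSize

# Per-server setting: how long a message may stay queued before it expires.
Get-TransportService |
    Format-Table Name, MessageExpirationTimeout
```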
To find the number of messages sent and received per day, along with the average message size, you can use Log Parser against the message tracking logs. There are also many PowerShell scripts that you can use and adapt to your environment. One example is the GetMessageStatstoFile PowerShell script, which is available in the TechNet Gallery.
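As an alternative to Log Parser (not the method named above), a rough estimate can also be pulled directly with `Get-MessageTrackingLog`. The sketch below assumes that RECEIVE events on the local server are an acceptable proxy for message volume; a message that passes through several components or servers may be counted more than once, so treat the numbers as approximate.

```powershell
# Look at the last 24 hours on the local server; adjust the window as needed.
$start = (Get-Date).AddDays(-1)

# RECEIVE events are used here as a rough proxy for messages entering transport (an assumption).
$events = Get-MessageTrackingLog -Start $start -EventId RECEIVE -ResultSize Unlimited

# Count the events and aggregate the TotalBytes field of each event.
$stats = $events | Measure-Object -Property TotalBytes -Average -Sum

[pscustomobject]@{
    Messages      = $stats.Count
    AverageSizeKB = [math]::Round($stats.Average / 1KB, 1)
    TotalSizeMB   = [math]::Round($stats.Sum / 1MB, 1)
}
```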
How to calculate the required space for your transport database?
To determine the maximum size of the transport database, we can look at the entire messaging system as a unit (all of the servers) and then come up with a per-server value.
Let’s use the 100,000-user sizing example with a 75KB average message size. The size of the transport database, using the simple method, is:
Overall Transport DB Size = 75KB x (100,000 users x 200 messages/day) x (1 + (50% x 2 maximum queue days) + 2 Safety Net hold days) x 2 copies = 11,444GB
In our example scenario, we have 8 DAGs, each containing 16 nodes, and we are designing to handle double node failures in each DAG. This means that in a worst-case failure event we would have 112 servers online (8 x 14), with 2 failed servers in each DAG. We can use this value to determine a per-server transport database size:
Per-server Transport DB Size = 11,444GB / 112 servers = approximately 102GB
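The same arithmetic can be reproduced in PowerShell so that you can plug in your own numbers. This is a sketch that follows the simple formula above term by term; the variable names are illustrative and not part of any Exchange tooling.

```powershell
# Inputs taken from the example in the text.
$avgMessageKB      = 75      # average message size in KB
$users             = 100000
$messagesPerUser   = 200     # messages per user per day
$queuedFraction    = 0.5     # the 50% factor in the formula
$maxQueueDays      = 2
$safetyNetHoldDays = 2
$copies            = 2

$dags         = 8
$nodesPerDag  = 16
$failedPerDag = 2

# Overall size in KB, following the simple formula.
$messagesPerDay = $users * $messagesPerUser
$daysFactor     = 1 + ($queuedFraction * $maxQueueDays) + $safetyNetHoldDays
$totalKB        = $avgMessageKB * $messagesPerDay * $daysFactor * $copies

# Convert KB to GB (1 GB = 1024 * 1024 KB).
$totalGB = $totalKB / (1024 * 1024)

# Servers still online in the worst-case failure scenario.
$serversOnline = $dags * ($nodesPerDag - $failedPerDag)
$perServerGB   = $totalGB / $serversOnline

"Overall size (GB):    " + [math]::Round($totalGB)
"Servers online:       " + $serversOnline
"Per-server size (GB): " + [math]::Round($perServerGB)
```

With the inputs above, this evaluates to roughly 11,444 GB overall, 112 servers online, and about 102 GB per server.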
As mentioned earlier, when email is deleted from the transport database mail.que, the database does not shrink; instead, white space is created in the database for future use. So, if your Safety Net hold time is set to 2 days, you will find that the database initially grows rapidly. After the 2-day hold time has passed, messages start being deleted from the database, which creates white space. From that point onwards, new data is written into the white space, and the database size no longer increases at the same rate. The best way to check the white space of a transport database is the following performance counter:
\MSExchangeTransport Database\Transport Queue Database Internal Free Space (MB)
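To put the counter value in context, it can be compared with the size of the mail.que file on disk. The sketch below makes two assumptions: that the counter path works exactly as written above (the instance naming may differ on your server), and that mail.que is in its default location under the Exchange installation path.

```powershell
# Counter path as given in the text; adjust it if your server exposes it under an instance name.
$counter = '\MSExchangeTransport Database\Transport Queue Database Internal Free Space (MB)'
$freeMB  = (Get-Counter -Counter $counter).CounterSamples[0].CookedValue

# Assumed default location of mail.que; change this if the queue database was moved.
$quePath = Join-Path $env:ExchangeInstallPath 'TransportRoles\data\Queue\mail.que'
$fileMB  = (Get-Item $quePath).Length / 1MB

"mail.que size (MB):       " + [math]::Round($fileMB)
"Internal free space (MB): " + [math]::Round($freeMB)
```

A large file with a large amount of internal free space suggests that the database grew during an earlier peak and is now reusing white space.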
How can I recover from issues related to large transport database?
Unless you are facing mail flow slowness or stoppage, a mail.que of hundreds of GBs is not actually an issue; it is expected when your organization has high mail flow traffic. But if you are facing a back pressure issue because of a large mail.que file, the quickest fix is to move the transport database, the transport database log files, or both to a different drive that has more free space, following this article.
If no other drive is available, you can recreate the database to resolve the immediate problem. To ensure that there is no data loss, we must first make sure that no email is waiting in the queue to be sent out. The steps are mentioned below:
Get-TransportServer | Get-Queue | Where-Object { $_.DeliveryType -eq "ShadowRedundancy" -and $_.NextHopDomain -eq "Servername" }
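Before recreating the database, it can also help to list every queue on the affected server that still holds messages. This is only a quick check under the assumption that `Servername` is a placeholder for your own server; it does not replace the full procedure.

```powershell
# Placeholder server name (an assumption); replace with the server whose queue database is affected.
$server = "Servername"

# List every queue on that server that still holds at least one message.
Get-Queue -Server $server |
    Where-Object { $_.MessageCount -gt 0 } |
    Format-Table Identity, DeliveryType, Status, MessageCount, NextHopDomain
```

If this returns no rows, no messages are currently waiting in the queues on that server.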
As mentioned earlier, the new mail.que initially grows rapidly. Once the SafetyNetHoldTime interval has passed, messages start being deleted from the database, which creates white space. From then on, new data is written into the white space, and the size no longer increases at the same rate. But if the issue was not caused by a recent increase in mail traffic or by a configuration change, the mail.que will eventually return to its earlier size. If you still cannot afford a large mail.que, you might want to reduce some configuration parameters from their current values. For example, you might want to reduce the Safety Net hold time to 1 day instead of the default 2 days, using the following command:
Set-TransportConfig -SafetyNetHoldTime 1.00:00:00
Or maybe reduce the Shadow message discard interval using the following command:
Set-TransportConfig -ShadowMessageAutoDiscardInterval 1.00:00:00
Additional Information regarding Safety Net maximum supported size
In Microsoft Exchange Server 2019 and 2016, the maximum supported database size for the transport Safety Net JET database is 2 TB. When a hub-and-spoke topology is used, the transport Safety Net JET database can grow beyond 2 TB on transport servers that are part of the hub site. Disable Safety Net on these transport servers, following our guidelines, to stay within the supported limit of 2 TB.
Monitor the performance of transport database
It’s also good practice to monitor the transport database performance counters to understand whether there are any underlying issues, or whether a sudden spike in mail flow traffic is increasing the size of your transport database. The following are some of the performance counters, with their expected values:
| Counter | Check | Expected |
|---|---|---|
| MSExchange Database(Information Store)\Database Page Fault Stalls/sec | Avg | <10 |
| | Max | <100 |
| MSExchange Database ==> Instances(edgetransport/Transport Mail Database)\Log Generation Checkpoint Depth | Max | <=1000 |
| MSExchange Database ==> Instances(edgetransport/Transport Mail Database)\Version buckets allocated | Max | <=200 |
| LogicalDisk\Average Disk sec/Read | Avg | <20ms |
| | Max | <=50ms |
| LogicalDisk\Average Disk sec/Write | Avg | <20ms |
| | Max | <=50ms |
| \MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | Avg | <=3000 |
| | Max | <=5000 |
| \MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | Max | <=250 |
| \MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | Max | <=250 |
| \MSExchangeTransport Queues(_total)\Submission Queue Length | Max | <=100 |
| \MSExchangeTransport Queues(_total)\Active Non-Smtp Delivery Queue Length | Max | <=250 |
| \MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | Max | <=100 |
| \MSExchangeTransport Queues(_total)\Retry Non-Smtp Delivery Queue Length | Max | <=100 |
| \MSExchangeTransport Queues(_total)\Retry Remote Delivery Queue Length | Max | <=100 |
| \MSExchangeTransport Queues(_total)\Unreachable Queue Length | Max | <=100 |
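A few of these counters can be sampled and compared against their maximum thresholds with `Get-Counter`. The following is a sketch, not a monitoring solution: the function names `Test-CounterMax` and `Show-QueueCounterStatus` are hypothetical, the sampling window (12 samples, 5 seconds apart) is an arbitrary choice, and only four rows of the table are included.

```powershell
# Maximum thresholds copied from four rows of the table above.
$thresholds = [ordered]@{
    '\MSExchangeTransport Queues(_total)\Submission Queue Length'              = 100
    '\MSExchangeTransport Queues(_total)\Unreachable Queue Length'             = 100
    '\MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length'  = 250
    '\MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length' = 250
}

# Hypothetical helper: compare a set of observed samples against a maximum threshold.
function Test-CounterMax {
    param([double[]]$Samples, [double]$MaxAllowed)
    $max = ($Samples | Measure-Object -Maximum).Maximum
    [pscustomobject]@{ Max = $max; WithinLimit = ($max -le $MaxAllowed) }
}

# Hypothetical wrapper: sample each counter on the local server and report the result.
function Show-QueueCounterStatus {
    foreach ($path in $thresholds.Keys) {
        # 12 samples, 5 seconds apart (about one minute per counter).
        $samples = (Get-Counter -Counter $path -SampleInterval 5 -MaxSamples 12).CounterSamples.CookedValue
        $result  = Test-CounterMax -Samples $samples -MaxAllowed $thresholds[$path]
        "{0}: max={1}, within limit={2}" -f $path, $result.Max, $result.WithinLimit
    }
}
```

Calling `Show-QueueCounterStatus` on a transport server prints one line per counter. Keeping the comparison in a separate helper makes it easy to reuse with samples collected by other means, such as a saved performance log.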
Hope you find this useful!
Abdelrahman Ahmed and Arindam Thokder