Exchange Team Blog

Transport database: understand, size and troubleshoot

The_Exchange_Team
Jan 16, 2020

Introduction

Over time, we have seen multiple support tickets where the transport database “mail.que” file grows unexpectedly, which in turn causes various mail flow issues. In this article, we provide details about the transport database mail.que and help you avoid issues caused by a large database.

Understanding backpressure

Before jumping into the issue and possible solutions, we need to understand how the Exchange Transport service monitors and reacts to system resources like memory and disk space. This is known as “back pressure”. Back pressure is a system resource monitoring feature of the Microsoft Exchange Transport service that exists on Hub Transport servers (Exchange 2010), Mailbox servers and Edge Transport servers. It detects when vital system resources (hard drive space, memory etc.) are overused, and takes action to prevent the server from becoming completely overwhelmed and unavailable. You can read more about back pressure in this article. The net result is that if disk space utilization on the drive that holds the mail queue database or its log files exceeds a certain threshold, the Transport service will delay or even stop accepting new messages.
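To see which resources back pressure is currently tracking on a server, you can query the ResourceThrottling diagnostics of the EdgeTransport process. The sketch below is illustrative only: Get-PressuredResource is a hypothetical helper name (not an Exchange cmdlet), and it assumes the diagnostics XML exposes ResourceMeter elements with Resource, CurrentResourceUse and Pressure attributes.

```powershell
# Sketch only: Get-PressuredResource is a hypothetical helper, not an Exchange cmdlet.
function Get-PressuredResource {
    param(
        # XML returned by Get-ExchangeDiagnosticInfo for the ResourceThrottling component
        [Parameter(Mandatory)][xml]$Xml
    )
    # Keep only the resources whose current use is above the "Low" level.
    $Xml.Diagnostics.Components.ResourceThrottling.ResourceTracker.ResourceMeter |
        Where-Object { $_.CurrentResourceUse -ne 'Low' } |
        Select-Object Resource, CurrentResourceUse, Pressure
}

# On an Exchange server, in the Exchange Management Shell, usage might look like:
# [xml]$bp = Get-ExchangeDiagnosticInfo -Server $env:COMPUTERNAME -Process EdgeTransport -Component ResourceThrottling
# Get-PressuredResource -Xml $bp
```

An empty result would suggest that no monitored resource is under pressure at the moment.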

What’s stored inside the Mail.que database?

Transport storage capacity is driven by two needs: temporary queuing of inbound and outbound messages (including shadow queuing) and Safety Net (earlier known as transport dumpster). You can think of the transport storage capacity requirement as the sum of message content on disk in a worst-case scenario. Broadly, your mail.que consists of the following elements:

  • The current day’s message traffic, along with messages which exist on disk longer than normal expiration settings (like poison queue, Journal messages etc.)
  • Shadow redundancy queuing
  • Submission queue and messages waiting for delivery
  • Messages persisted in Safety Net in case they are required for re-delivery

Also note that, like with other Exchange databases, when email is deleted from Transport Database (mail.que), the database size does not shrink; instead, during online maintenance activity, ‘white space’ is created inside the database for future use.

Why is my transport database growing so large?

The size of the transport database depends on its usage. The database size is directly proportional to the average size of messages, as well as the number of messages per day. If an email with a large attachment is sent to all company employees, the transport database will grow more than usual on that day. The following are some of the causes of transport database growth:

  • Sudden increase of internal/external email traffic
  • Recent configuration change in SafetyNet hold time (increased from default value)
  • Setting Message Expiration timeout to more than the default number of days
  • Shadow Message Auto Discard Interval was configured with a high value

So, it’s good practice to compare the following values in your environment to their defaults:

  • Get-TransportConfig | FL SafetyNetHoldTime (default is 2 days)
  • Get-TransportConfig | FL ShadowMessageAutoDiscardInterval (default is 2 days)
  • Get-TransportService | FL MessageExpirationTimeout (default is 2 days)
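The comparison against the defaults listed above can be sketched as a small helper. Compare-TransportDefault is a hypothetical function name; the 2-day defaults are the ones stated in this article, and the sketch assumes the values convert cleanly to a time span string such as 2.00:00:00.

```powershell
# Sketch only: Compare-TransportDefault is a hypothetical helper, not an Exchange cmdlet.
function Compare-TransportDefault {
    param(
        # Setting name -> current value (TimeSpan or a string such as '2.00:00:00')
        [Parameter(Mandatory)][hashtable]$Current
    )
    # Defaults as stated in this article (2 days each).
    $defaults = @{
        SafetyNetHoldTime                = [timespan]'2.00:00:00'
        ShadowMessageAutoDiscardInterval = [timespan]'2.00:00:00'
        MessageExpirationTimeout         = [timespan]'2.00:00:00'
    }
    foreach ($name in $defaults.Keys) {
        if (-not $Current.ContainsKey($name)) { continue }
        $value = [timespan]::Parse([string]$Current[$name])
        if ($value -ne $defaults[$name]) {
            [pscustomobject]@{ Setting = $name; Current = $value; Default = $defaults[$name] }
        }
    }
}

# In the Exchange Management Shell, usage might look like:
# $cfg = Get-TransportConfig
# $svc = Get-TransportService -Identity $env:COMPUTERNAME
# Compare-TransportDefault -Current @{
#     SafetyNetHoldTime                = $cfg.SafetyNetHoldTime
#     ShadowMessageAutoDiscardInterval = $cfg.ShadowMessageAutoDiscardInterval
#     MessageExpirationTimeout         = $svc.MessageExpirationTimeout
# }
```

Only settings that differ from the default are returned, so an empty result means all three values are at their defaults.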

To find the number of messages sent and received per day, along with the average message size, you can use Log Parser with the message tracking logs. There are also many PowerShell scripts that you can use and adapt to your environment for this purpose. One example is the GetMessageStatstoFile PowerShell script, which is available in the TechNet Gallery.
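As a rough alternative to Log Parser, the message tracking log can also be summarized directly in PowerShell. This is a minimal sketch, not the GetMessageStatstoFile script: Measure-MessageTraffic is a hypothetical helper, and counting RECEIVE events is only one possible approximation of daily traffic.

```powershell
# Sketch only: Measure-MessageTraffic is a hypothetical helper, not an Exchange cmdlet.
function Measure-MessageTraffic {
    param(
        # Message tracking entries; each must expose a TotalBytes property
        [Parameter(Mandatory)][object[]]$Entries,
        # Number of days the entries cover
        [int]$Days = 1
    )
    $stats = $Entries | Measure-Object -Property TotalBytes -Sum -Average
    [pscustomobject]@{
        MessagesPerDay = [math]::Round($stats.Count / $Days)
        AverageSizeKB  = [math]::Round($stats.Average / 1KB, 1)
        TotalGB        = [math]::Round($stats.Sum / 1GB, 2)
    }
}

# In the Exchange Management Shell, usage might look like (last 7 days, one server):
# $entries = Get-MessageTrackingLog -Server $env:COMPUTERNAME -EventId RECEIVE `
#     -Start (Get-Date).AddDays(-7) -End (Get-Date) -ResultSize Unlimited
# Measure-MessageTraffic -Entries $entries -Days 7
```

The resulting average size and daily count can then be used as inputs when sizing the transport database.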

How to calculate the required space for your transport database?

To determine the maximum size of the transport database, we can look at the entire messaging system as a unit (all of the servers) and then come up with a per-server value.

 

  • Overall Daily Message Traffic = number of users x message profile (use this article to calculate the message profile for your organization).
  • Overall Transport DB Size = average message size x overall daily message traffic x (1 + (percentage of messages queued x maximum queue days) + Safety Net hold days) x 2 copies for high availability

Let’s use the 100,000 user sizing example and a 75KB average message size. Using the simple method, the size of the transport database is:

 

Overall Transport DB Size = 75KB x (100,000 users x 200 messages/day) x (1 + (50% x 2 maximum queue days) + 2 Safety Net hold days) x 2 copies = 11,444GB

 

In our example scenario, we have 8 DAGs, each containing 16 nodes, and we are designing to handle double node failures in each DAG. This means that in a worst-case failure event we would have 112 servers online, with 2 failed servers in each DAG. We can use this value to determine a per-server transport database size:

Per-Server Transport DB Size = 11,444GB / 112 servers ≈ 102GB
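The arithmetic of this sizing example can be reproduced with a few lines of PowerShell. This is a sketch using the article's example figures; the variable names are illustrative, and dividing the overall size evenly across the 112 surviving servers is the assumption used for the per-server value.

```powershell
# Inputs from the sizing example in this article.
$averageMessageKB   = 75
$users              = 100000
$messagesPerUserDay = 200
$queuedFraction     = 0.5      # 50% of messages queued
$maxQueueDays       = 2
$safetyNetHoldDays  = 2
$copies             = 2        # 2 copies for high availability
$serversOnline      = 8 * (16 - 2)   # 8 DAGs x 16 nodes, 2 failed per DAG = 112

$dailyMessages = $users * $messagesPerUserDay
$overallKB = $averageMessageKB * $dailyMessages *
    (1 + ($queuedFraction * $maxQueueDays) + $safetyNetHoldDays) * $copies

# 1MB is 1,048,576, so dividing KB by 1MB converts to GB.
$overallGB   = $overallKB / 1MB
$perServerGB = $overallGB / $serversOnline

'Overall: {0:N0} GB, per server: {1:N0} GB' -f $overallGB, $perServerGB
```

With these inputs the overall size works out to roughly 11,444 GB and the per-server size to roughly 102 GB; substitute your own message profile and topology to size your environment.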

 

As mentioned earlier, when email is deleted from the transport database mail.que, the database does not shrink; instead, we create white space in the database for future use. So, if your Safety Net hold time is set to 2 days, you will find that the database initially grows rapidly. After the 2-day hold time has passed, we start deleting messages from the database, which creates white space. From that point onwards, new data is written into the white space, and the database size no longer increases at the same rate. The best way to check the white space of a transport database is the following performance counter:

\MSExchangeTransport Database\Transport Queue Database Internal Free Space (MB).
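To put the counter value into context, it can be compared with the size of the mail.que file on disk. The following is a minimal sketch: Get-QueueWhiteSpace is a hypothetical helper, the queue path is a placeholder, and the counter path is taken as written above (on some servers an instance name may need to be added to it).

```powershell
# Sketch only: Get-QueueWhiteSpace is a hypothetical helper, not an Exchange cmdlet.
function Get-QueueWhiteSpace {
    param(
        [Parameter(Mandatory)][double]$DatabaseSizeMB,   # size of mail.que on disk
        [Parameter(Mandatory)][double]$FreeSpaceMB       # value of the free space counter
    )
    [pscustomobject]@{
        DatabaseSizeMB    = $DatabaseSizeMB
        FreeSpaceMB       = $FreeSpaceMB
        InUseMB           = $DatabaseSizeMB - $FreeSpaceMB
        WhiteSpacePercent = [math]::Round(100 * $FreeSpaceMB / $DatabaseSizeMB, 1)
    }
}

# On an Exchange server, usage might look like (the path below is a placeholder):
# $counter = '\MSExchangeTransport Database\Transport Queue Database Internal Free Space (MB)'
# $free    = (Get-Counter -Counter $counter).CounterSamples[0].CookedValue
# $sizeMB  = (Get-Item 'D:\Queue\mail.que').Length / 1MB
# Get-QueueWhiteSpace -DatabaseSizeMB $sizeMB -FreeSpaceMB $free
```

A high white space percentage indicates that the file can absorb new messages without growing further.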

How can I recover from issues related to large transport database?

Unless you are facing mail flow slowness or stoppage, a large mail.que of hundreds of GBs is not actually an issue; it is completely expected given high mail flow traffic in your organization. But if you are facing a back pressure issue because of a large mail.que file, the first thing you can do to resolve it quickly is move the transport database, the transport database log files, or both to a different drive that has more free space, following this article.

If there is no other drive available to move the transport database to, you can recreate the database so that the immediate problem is resolved. To ensure there is no data loss, we must make sure there is no email waiting in the queue to be sent out. The steps are as follows:

    • Failover Exchange databases to a different server to avoid downtime for submission from local databases
    • Pause the Microsoft Exchange Transport Service
    • Run the Get-Queue command on the Exchange server and make sure that all messages have been processed and no email is stuck in the queue.
    • Make sure no other servers have shadow messages waiting for this server (otherwise they would get delivered again). You can run the following command to see all the shadow queues for a particular server:

 

Get-TransportServer | Get-Queue | Where-Object {$_.DeliveryType -eq "ShadowRedundancy" -and $_.NextHopDomain -eq "Servername"}

 

    • Stop the Microsoft Exchange Transport and Microsoft Exchange Frontend Transport services
    • Navigate to queue folder path and rename it to something like QueueOld
    • Start the Microsoft Exchange Transport and Microsoft Exchange Frontend Transport services
    • A new queue folder will be created. Make sure that Application event ID 7005 has appeared: "A new database file mail.que has been created."
    • If the transport service starts successfully, you can move the QueueOld to a different location for backup and delete it later.
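The local part of the steps above can be sketched in PowerShell as follows. This is an illustration under stated assumptions rather than a supported script: both function names are hypothetical, the queue folder path must be supplied by you, and the sketch covers neither the database failover nor the shadow queue check on other servers. Running it with -WhatIf previews the service restart and rename.

```powershell
# Hypothetical helpers for illustration; neither function ships with Exchange.
function Get-NonEmptyQueue {
    param([object[]]$Queues = @())
    # Any queue that still holds messages blocks the recreate procedure.
    $Queues | Where-Object { $_.MessageCount -gt 0 }
}

function Reset-QueueDatabase {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)][string]$QueuePath,   # folder that holds mail.que (you must supply it)
        [string]$Server = $env:COMPUTERNAME
    )
    # Assumes databases were already failed over and the Transport service was paused
    # (for example with Suspend-Service MSExchangeTransport) so the queues could drain.
    $pending = @(Get-NonEmptyQueue -Queues (Get-Queue -Server $Server))
    if ($pending.Count -gt 0) {
        throw "Queues still hold messages: $($pending.Identity -join ', ')"
    }
    if ($PSCmdlet.ShouldProcess($QueuePath, 'Rename queue folder so a new mail.que is created')) {
        $services = 'MSExchangeTransport', 'MSExchangeFrontEndTransport'
        Stop-Service -Name $services
        Rename-Item -Path $QueuePath -NewName 'QueueOld'
        $restart = Get-Date
        Start-Service -Name $services
        # Event 7005 in the Application log confirms that a new mail.que was created.
        Get-WinEvent -FilterHashtable @{ LogName = 'Application'; Id = 7005; StartTime = $restart } -ErrorAction SilentlyContinue
    }
}

# Example (placeholder path): Reset-QueueDatabase -QueuePath 'D:\Queue' -WhatIf
```

The function refuses to continue while any local queue still reports messages, which mirrors the data-loss precaution described in the steps.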

As mentioned earlier, the new mail.que will initially grow rapidly. Once the SafetyNetHoldTime period has passed, we start deleting messages from the database, which creates white space. From then on, new data is written into the white space, and the size no longer increases at the same rate. But if the issue was not caused by a recent increase in mail traffic or a configuration change, mail.que will eventually reach the size it had earlier. If you still cannot afford a large mail.que, you might want to reduce some configuration values from their current settings. For example, you can reduce the Safety Net hold time to 1 day instead of the default 2 days using the following command:

 

Set-TransportConfig -SafetyNetHoldTime 1.00:00:00

 

Or maybe reduce the Shadow message discard interval using the following command:

 

Set-TransportConfig -ShadowMessageAutoDiscardInterval 1.00:00:00 

 

Additional Information regarding Safety Net maximum supported size

In Microsoft Exchange Server 2019 and 2016, the maximum supported database size for the transport Safety Net JET database is 2 TB. When a Hub-and-spoke topology is used, the transport Safety Net JET database can grow beyond 2 TB on Transport Servers which are part of the Hub site. Disable SafetyNet on these transport servers following our guidelines to stay within the supported limit of 2 TB.

Monitor the performance of transport database

It’s also good practice to monitor the transport database performance counters to understand whether there are any underlying issues, or whether a sudden spike in mail flow traffic is increasing the size of your transport database. The following are some of the performance counters with their expected values:

COUNTER | Check | Expected
MSExchange Database(Information Store)\Database Page Fault Stalls/sec | Avg | <10
MSExchange Database(Information Store)\Database Page Fault Stalls/sec | Max | <100
MSExchange Database ==> Instances(edgetransport/Transport Mail Database)\Log Generation Checkpoint Depth | Max | <=1000
MSExchange Database ==> Instances(edgetransport/Transport Mail Database)\Version buckets allocated | Max | <=200
LogicalDisk\Avg. Disk sec/Read | Avg | <20ms
LogicalDisk\Avg. Disk sec/Read | Max | <=50ms
LogicalDisk\Avg. Disk sec/Write | Avg | <20ms
LogicalDisk\Avg. Disk sec/Write | Max | <=50ms
\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | Avg | <=3000
\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues) | Max | <=5000
\MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length | Max | <=250
\MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length | Max | <=250
\MSExchangeTransport Queues(_total)\Submission Queue Length | Max | <=100
\MSExchangeTransport Queues(_total)\Active Non-Smtp Delivery Queue Length | Max | <=250
\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length | Max | <=100
\MSExchangeTransport Queues(_total)\Retry Non-Smtp Delivery Queue Length | Max | <=100
\MSExchangeTransport Queues(_total)\Retry Remote Delivery Queue Length | Max | <=100
\MSExchangeTransport Queues(_total)\Unreachable Queue Length | Max | <=100
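Checking observed values against these limits can be automated. The sketch below covers only the queue length counters and their Max thresholds from this article; Find-CounterBreach is a hypothetical helper, and how you collect the observed maxima (for example with Get-Counter over a sampling window) is left as an assumption.

```powershell
# Max thresholds for the queue counters, as listed in this article.
$maxThresholds = @{
    '\MSExchangeTransport Queues(_total)\Aggregate Delivery Queue Length (All Queues)' = 5000
    '\MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length'          = 250
    '\MSExchangeTransport Queues(_total)\Active Mailbox Delivery Queue Length'         = 250
    '\MSExchangeTransport Queues(_total)\Submission Queue Length'                      = 100
    '\MSExchangeTransport Queues(_total)\Active Non-Smtp Delivery Queue Length'        = 250
    '\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length'          = 100
    '\MSExchangeTransport Queues(_total)\Retry Non-Smtp Delivery Queue Length'         = 100
    '\MSExchangeTransport Queues(_total)\Retry Remote Delivery Queue Length'           = 100
    '\MSExchangeTransport Queues(_total)\Unreachable Queue Length'                     = 100
}

# Sketch only: Find-CounterBreach is a hypothetical helper, not an Exchange cmdlet.
function Find-CounterBreach {
    param(
        [Parameter(Mandatory)][hashtable]$Observed,     # counter path -> observed maximum
        [Parameter(Mandatory)][hashtable]$Thresholds    # counter path -> allowed maximum
    )
    foreach ($path in $Thresholds.Keys) {
        if ($Observed.ContainsKey($path) -and $Observed[$path] -gt $Thresholds[$path]) {
            [pscustomobject]@{
                Counter    = $path
                Observed   = $Observed[$path]
                MaxAllowed = $Thresholds[$path]
            }
        }
    }
}

# Example with made-up observations:
# Find-CounterBreach -Observed @{
#     '\MSExchangeTransport Queues(_total)\Submission Queue Length' = 150
# } -Thresholds $maxThresholds
```

Note that Get-Counter returns sample paths prefixed with the computer name, so they would need to be normalized before being used as keys in the observed table.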

Hope you find this useful!

Abdelrahman Ahmed and Arindam Thokder

Published Jan 16, 2020
Version 1.0
  • whatisit
    Copper Contributor

    edit: oops i asked a question that was answered in the article

  • ChristopheCLDC
    Copper Contributor

    Hi Niakris, I think I now see what you mean, thanks to the code and its output. If you correlate both, you will see that JetSessions matches QueueLength[SubmissionQueue].

    Side note: "Jet" is likely a holdover from JET, the old database technology used for years by many services in the Microsoft world.

     

  • NiakrisAnn
    Copper Contributor

    Hi!

    How about "Jet Sessions" backpressure parameter?

    I had some issues with it but haven't found any information on the Internet

  • ChristopheCLDC
    Copper Contributor

    Hi NiakrisAnn, I'm not sure what you mean.
    If the above guidance does not help to address the transport issue, it is always best practice not to forget the basics: (undersized) hardware or the security software ....

  • Niakris
    Copper Contributor

    hi ChristopheCLDC !

    When I run 

    [xml]$bp=Get-ExchangeDiagnosticInfo -Process EdgeTransport -Component ResourceThrottling; $bp.Diagnostics.Components.ResourceThrottling.ResourceTracker.ResourceMeter |select resource,currentresourceuse,pressure

    I get:

    Resource                                       CurrentResourceUse Pressure
    --------                                       ------------------ --------
    PrivateBytes                                   Low                5       
    SystemMemory                                   Low                36      
    UsedVersionBuckets[G:\Queue\mail.que]          Low                1       
    JetSessions[G:\Queue\mail.que]                 Low                13      
    CheckpointDepth[G:\Queue\mail.que]             Low                0       
    DatabaseUsedSpace[G:\Queue]                    Low                5       
    UsedDiskSpace[G:\Queue]                        Low                5       
    UsedDiskSpace[E:\Exchange\TransportRoles\data] Low                51      

    So I am curious: what does the "JetSessions" resource mean?

    I haven't found any reference to it in the Microsoft article about back pressure (

    In this article: https://learn.microsoft.com/en-us/exchange/mail-flow/back-pressure?view=exchserver-2019