Exchange 2007 SP1 has a default transport database cache size of 128MB. This size doesn't allow as much dynamic growth as may be necessary on Hub servers with higher than normal message rates, or when an unexpected load from increased message size comes into a server. To better allow for cache growth, the new guidance is to increase the DatabaseMaxCacheSize value from 128MB to 512MB on Hub servers with 4 GB or more memory installed.
In Exchange 2007, the transport service utilizes the Extensible Storage Engine (ESE) for mail transport functionality. This provides several benefits over previous versions that used the NTFS file system:
Enhanced performance, because transactions are written first to log files and memory, and then to the database file
Increased transactional integrity of data stored in the queue
All mail queues are held in a single location, the transport mail queue database. In Exchange 2003, mail could be in two locations during processing: the file folder structure and the local information store.
An operation occurs against the database (e.g. a new message is received), and the page that requires updating is read from the file and placed into the ESE cache (if it is not already in memory), while the log buffer is notified and records the operation in memory.
The changes are recorded by the database engine but are not immediately written to disk; instead, these changes are held in the ESE cache and are known as "dirty" pages because they have not been committed to the database. The version store (also referred to as version buckets) is used to keep track of these changes, thus ensuring isolation and consistency are maintained.
As the database pages are changed, the log buffer is notified to commit the change, and the transaction is recorded in a transaction log file (which may or may not require a log roll and a start of a new log generation).
Eventually the dirty database pages are flushed to the database file.
The checkpoint is advanced.
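The steps above can be sketched as a toy write-ahead logging model. This is only an illustration of the sequence described here, not ESE itself: the `ToyEngine` class, its field names, and the page identifiers are all hypothetical, and real ESE internals (log buffers, log generations, version store) are far more involved.

```python
class ToyEngine:
    """Toy model of the update/log/flush/checkpoint sequence (not ESE)."""

    def __init__(self):
        self.db_file = {}    # page id -> value persisted in the database file
        self.cache = {}      # page id -> value held in memory
        self.dirty = set()   # cached pages not yet flushed to db_file
        self.log = []        # (sequence, page id, value) transaction records
        self.checkpoint = 0  # log records up to here are already in db_file

    def update(self, page_id, value):
        # Read the page into the cache if it is not already in memory.
        if page_id not in self.cache:
            self.cache[page_id] = self.db_file.get(page_id)
        # Change the page in memory only; it is now a "dirty" page.
        self.cache[page_id] = value
        self.dirty.add(page_id)
        # Record the change in the transaction log.
        self.log.append((len(self.log) + 1, page_id, value))

    def flush(self):
        # Write the dirty pages to the database file.
        for page_id in sorted(self.dirty):
            self.db_file[page_id] = self.cache[page_id]
        self.dirty.clear()
        # Advance the checkpoint past everything now in the database file.
        self.checkpoint = len(self.log)


engine = ToyEngine()
engine.update("page-7", "message A queued")
engine.update("page-9", "message B queued")
pending = (sorted(engine.dirty), engine.checkpoint)  # state before the flush
engine.flush()
print(pending, engine.checkpoint, engine.db_file)
```

Before the flush, both pages are dirty and the checkpoint has not moved; after the flush, the database file holds both pages and the checkpoint covers both log records.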
Unlike the Mailbox Server role, the transport service does not dynamically grow the ESE cache. Instead, the ESE cache is capped at 128MB in Exchange 2007 SP1 and earlier. This value is known as DatabaseMaxCacheSize and is specified in the EdgeTransport.exe.config file.
The Exchange 2007 resource monitor keeps track of how many used version buckets are currently in memory. When the number of used version buckets exceeds the thresholds specified in the EdgeTransport.exe.config file of each Hub server, the resource monitor invokes backpressure and logs Event ID 15004 to indicate that the server is experiencing resource pressure. A resource pressure event is a staged process: a Hub server first stops accepting new inbound SMTP messages, and then, when the next threshold is reached, it stops accepting new Mailbox server connections, in an attempt to move uncommitted transactions out of memory and into the queue database file. This behavior has a number of causes, including large messages or an increased message load. The version buckets thresholds were raised from their RTM defaults in Exchange 2007 Service Pack 1. The new values are 120 for the medium threshold and 200 for the high threshold.
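The staged behavior can be illustrated with a small sketch that classifies a used version bucket count against the SP1 thresholds quoted above. The function names and state labels are hypothetical, and the mapping of "medium" to rejecting inbound SMTP and "high" to also rejecting Mailbox server connections is an assumption drawn from the staged description; it is not the resource monitor's actual code.

```python
MEDIUM_THRESHOLD = 120  # SP1 default medium threshold quoted in the text
HIGH_THRESHOLD = 200    # SP1 default high threshold quoted in the text


def pressure_state(used_version_buckets,
                   medium=MEDIUM_THRESHOLD, high=HIGH_THRESHOLD):
    """Classify a version bucket count as normal, medium, or high pressure."""
    if used_version_buckets > high:
        return "high"
    if used_version_buckets > medium:
        return "medium"
    return "normal"


def rejected_sources(state):
    """Assumed mapping of pressure state to the connections that are refused."""
    if state == "high":
        return ["inbound SMTP", "Mailbox server connections"]
    if state == "medium":
        return ["inbound SMTP"]
    return []


for count in (80, 150, 240):
    state = pressure_state(count)
    print(count, state, rejected_sources(state))
```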
Note: the version buckets threshold value should not be set higher in an attempt to resolve resource pressure, as this will likely have a negative impact on the server availability.
To increase the performance with version buckets, and to better allow for cache growth, the new guidance is to increase the DatabaseMaxCacheSize value from 128MB to 512MB on Hub servers with 4 GB or more memory installed.
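As a rough sketch of the change, the snippet below updates a DatabaseMaxCacheSize entry in an in-memory sample configuration. It assumes the setting is stored as an `<add key="..." value="..." />` entry under `appSettings` with the value expressed in bytes; the sample XML and the `set_app_setting` helper are illustrative only, so verify the layout against your own EdgeTransport.exe.config and back up the file before editing it.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the relevant part of the config file (illustrative sample).
SAMPLE_CONFIG = """<configuration>
  <appSettings>
    <add key="DatabaseMaxCacheSize" value="134217728" />
  </appSettings>
</configuration>"""


def set_app_setting(config_xml, key, value):
    """Return config_xml with the appSettings entry `key` set to `value`."""
    root = ET.fromstring(config_xml)
    settings = root.find("appSettings")
    if settings is None:
        raise ValueError("no appSettings section found")
    for entry in settings.findall("add"):
        if entry.get("key") == key:
            entry.set("value", str(value))
            break
    else:
        # The key was not present, so add a new entry for it.
        ET.SubElement(settings, "add", {"key": key, "value": str(value)})
    return ET.tostring(root, encoding="unicode")


NEW_CACHE_BYTES = 512 * 1024 * 1024  # 512MB expressed in bytes
updated = set_app_setting(SAMPLE_CONFIG, "DatabaseMaxCacheSize", NEW_CACHE_BYTES)
print(updated)
```

The sketch works on a string rather than the live file so that the edit can be reviewed before anything on the server is touched.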
Increasing the cache size allows ESE to dynamically manage the cache size based on memory pressure. With the default limit of 128MB, ESE could not effectively grow the cache, which in turn increased disk activity and delayed the flushing of data to the database. As a result, the version buckets could not be flushed fast enough, resulting in back pressure notifications. When the increase in version buckets causes back pressure events, mail can queue up on the Hub server and also on remote SMTP servers attempting to deliver inbound mail to the Hub server. In some environments this would cause a cascading effect, in which more than just the single Hub server had to recover from the denied connections. Increasing the database cache size should help eliminate version buckets backpressure events.
However, this change will not eliminate version buckets backpressure events where slow-performing disks are the root cause of the backpressure event. To ensure that your disk storage subsystem is properly designed, please review the Transport Server Storage Design article.
There is also an added potential performance gain when the transport dumpster size is less than database max cache size, resulting in fewer database read IOPS. IOPS is a key metric for storage sizing guidelines, and is calculated as the amount of database input/output (I/O) per second (IOPS) consumed by each mail item. In one of our tests with 12 storage groups there was a 57 percent decrease in I/Os per message. This substantial performance gain was seen in a lab using the default MaxDumpsterSizePerStorageGroup of 18MB, resulting in a max dumpster size of 216MB which is greater than DatabaseMaxCacheSize default of 128MB, but less than the recommended DatabaseMaxCacheSize of 512MB. More information on the test hardware and configuration:
Exchange 2007 Service Pack 1
Windows 2003 Service Pack 2
8 processor cores with 16GB of memory
Forefront Anti-Virus with 5 engines, max certainty
21 messages per second, of approximately 46 KB average message size
Logs and Database were on separate, dedicated drives
The following results were seen in that test.
[Table: Hub Transport server database I/O (~21 msg/sec). Columns: SP1 Default Cache, SP1 Modified Cache. Rows: Total IOPS per message; Log write I/Os per message (sequential); Database write I/Os per message (random); Database read I/Os per message (random).]
In the internal Microsoft IT (MSIT) environment, where there is an Active Directory site with ~800 storage groups with a transport dumpster size of 15MB per storage group, we only observed a 15% decrease in I/Os per message after the increase to the database cache size. This decrease in IOPS per message was less than that observed in the lab due to the total size of the transport dumpster. In this case, the increased cache size resulted in increased availability of the transport service because back pressure due to version buckets was eliminated.
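The difference between the lab and MSIT results follows from simple arithmetic on the total transport dumpster size. The sketch below reproduces it using the figures quoted in the text; the helper names are hypothetical, and the MSIT storage group count is treated as exactly 800 although the text gives it only as approximate.

```python
def max_dumpster_mb(storage_groups, per_group_mb):
    """Total transport dumpster size in MB across all storage groups."""
    return storage_groups * per_group_mb


def fits_in_cache(dumpster_mb, cache_mb):
    """True when the whole dumpster is smaller than the database cache."""
    return dumpster_mb < cache_mb


lab_mb = max_dumpster_mb(12, 18)    # lab: 12 storage groups at 18MB each
msit_mb = max_dumpster_mb(800, 15)  # MSIT: about 800 storage groups at 15MB

for name, size_mb in (("lab", lab_mb), ("MSIT", msit_mb)):
    for cache_mb in (128, 512):
        print(name, size_mb, cache_mb, fits_in_cache(size_mb, cache_mb))
```

The lab dumpster (216MB) fits inside a 512MB cache but not a 128MB one, while the MSIT dumpster (about 12,000MB) exceeds either cache size, which is consistent with the smaller I/O reduction observed there.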
These results may lead an administrator to wonder whether going beyond the new 512MB value provides even more benefit. In testing on MSIT Hub servers with 8GB of total memory, a cache size of 2GB did not show a significant benefit over 512MB. If you are deploying servers with 12GB or more of memory, you should test in your own environment to assess the benefit of going beyond 512MB. However, going beyond 512MB is not recommended and has not been tested in a production environment.