Exchange 2010 SP2 RU1 contained a fix for an issue regarding database space reclamation. There seems to be some confusion around this issue, so I’d like to explain what the problem was, what the fix is, and what to expect when third-party products are used to stub items within the database (stubbing is the process whereby a message is replaced with a pointer object and the original version is stored in an external repository).
The bug and the fix
In Exchange 2010, we found that database space was not being reclaimed in several scenarios where items are hard deleted (when the item has been purged from the store). This includes archival scenarios where items are bulk deleted based on date, stubbing scenarios where the attachments are deleted, and other scenarios such as journal mailboxes and public folder replication. This issue affected a wide range of customers, and the problem was with the way item deletion works.
Exchange 2010 will usually try to clean up the row for a deleted item within a couple of minutes of the item being hard deleted. However, cleanup will fail if anything still has open references to the item, for example when content indexing is indexing the message or a separate client is reading it. Prior to the fix, if cleanup failed at this point, that space was effectively leaked and could not be recovered without moving the mailbox.
It turned out that outstanding references on deleted messages are particularly common on journaling mailboxes, or when third-party archival software is being used, so the vast majority of customers who experienced this bug associated it with one of those two scenarios. But technically, any client could hold an open reference and prevent an item from being cleaned up.
The fix included in SP2 RU1 corrects the problem by adding a cleanup retry mechanism. If cleanup fails because something still has an open reference, the store will periodically retry until it succeeds.
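The retry behavior described above can be pictured with a small Python sketch. This is only an illustration of the idea, not how the store is implemented: the function `run_cleanup_pass` and its callbacks `has_open_references` and `reclaim` are hypothetical names invented for this example.

```python
from collections import deque


def run_cleanup_pass(pending, has_open_references, reclaim):
    """Run one cleanup pass over hard-deleted items.

    Items that are still referenced are kept for a later pass
    instead of being dropped (which would leak their space).
    All names here are hypothetical, for illustration only.
    """
    still_pending = deque()
    while pending:
        item = pending.popleft()
        if has_open_references(item):
            # Something (an indexer, another client) still holds the item:
            # keep it queued so a later pass can retry.
            still_pending.append(item)
        else:
            reclaim(item)
    return still_pending


# Simulated usage: "msg-2" is held open during the first pass only.
open_refs = {"msg-2"}
reclaimed = []
pending = deque(["msg-1", "msg-2"])

pending = run_cleanup_pass(pending, lambda i: i in open_refs, reclaimed.append)
open_refs.clear()  # the other client releases its reference
pending = run_cleanup_pass(pending, lambda i: i in open_refs, reclaimed.append)
```

In this model, the pre-fix behavior corresponds to running only the first pass and discarding whatever was left in the queue.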
Did this ‘fix’ stubbing?
Stubbing refers to a process where a third-party archival product takes a large message and turns it into a smaller item, or stub. It typically does this by deleting the attachments and modifying the message body to be smaller.
To the extent that stubbing deletes attachments, it could be affected by this bug, because those deleted attachments might fail cleanup due to open references. But other than that, this bug was never really about stubbing.
What if stubbing isn’t giving me as much space as I expect?
The Exchange Information Store has changed a lot over the years, but even back on Exchange 2003 and 2007, stubbing would result in databases that were significantly larger than you might expect when adding up mailbox sizes. This was mostly due to two types of overhead.
The first type of overhead is properties that simply don't count toward the size of the message. When you look at the size of a message in Outlook, the size you see is not the total size of all the data that Exchange stores for that message. We store certain properties which aren’t counted against the user and aren’t reflected in the message size. These may be publicly documented properties such as PR_URL_NAME or other internal properties which clients can’t access.
The second type of overhead is page fragmentation. The database page size has changed over the years, from 4k in Exchange 2003, to 8k in Exchange 2007, to 32k in Exchange 2010. Each record in the database has to be arranged onto these pages, and depending on how efficiently we are able to do so, there will be some space left on the page. This is page fragmentation, which results in empty space that cannot be reclaimed by database maintenance. The larger page sizes of more recent versions of Exchange make it easier to arrange several small items onto a single page, resulting in less fragmentation, but some amount of fragmentation will always be there.
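To make the effect of page size concrete, here is a toy Python model. It is a deliberate simplification and not a description of the actual database engine: it assumes every record must fit entirely on one page, ignores page headers, and packs records in arrival order. The function name `wasted_fraction` and the 2,500-byte record size are made up for the example.

```python
def wasted_fraction(record_sizes, page_size):
    """Pack records onto pages in order and return the unused share
    of the allocated space (toy model: records never span pages)."""
    pages = 0
    free = 0  # free bytes remaining on the current page
    for size in record_sizes:
        if size > page_size:
            raise ValueError("toy model: a record must fit on one page")
        if size > free:
            # Start a new page; whatever was left on the old page is slack.
            pages += 1
            free = page_size
        free -= size
    if pages == 0:
        return 0.0
    return 1 - sum(record_sizes) / (pages * page_size)


records = [2500] * 100  # hypothetical 2,500-byte records
for page_size in (4096, 8192, 32768):  # 4k, 8k, 32k pages
    print(page_size, f"{wasted_fraction(records, page_size):.1%}")
```

With these made-up numbers the unused share shrinks as the page grows, but it never reaches zero, which matches the point that some fragmentation is always present.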
The amount of overhead generally doesn’t change much between messages. A small message will have just as much overhead as a big one. For this reason, when you fill a database with small items, the overhead becomes much more visible.
For example, consider a scenario where you’ve filled a database with items that are all 1 MB in size, and these items all have 1k of overhead. That means your database has roughly 0.1% overhead. Now let’s say you stub all the items in the database, reducing them to 4k in size. Suddenly that 1k of overhead is much more noticeable: it equals 25% of the message data, or about 20% of the space those items occupy in the database. I once personally produced an Exchange 2003 database that was 70% overhead by filling it with tiny items and stubbing the entire thing.
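The arithmetic in this example is easy to reproduce. The short Python sketch below uses the same illustrative figures (1k of fixed overhead per item) and reports the overhead two ways: relative to the visible message size, and as a share of everything stored for the item. The helper name `overhead_share` is invented for this example.

```python
def overhead_share(item_kb, overhead_kb):
    """Return (overhead relative to the item's visible size,
    overhead as a share of item plus overhead)."""
    relative = overhead_kb / item_kb
    of_total = overhead_kb / (item_kb + overhead_kb)
    return relative, of_total


for label, size_kb in (("1 MB item", 1024), ("4k stub", 4)):
    rel, total = overhead_share(size_kb, 1)
    print(f"{label}: {rel:.2%} of item size, {total:.2%} of stored space")
```

For the 1 MB item both figures round to about 0.10%. For the 4k stub they come out to 25% and 20% respectively, so the same fixed overhead goes from negligible to a substantial part of the database.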
What about Exchange 2010?
In Exchange 2010, there’s an additional factor that may affect the space reclamation from stubbing, and that’s the change in the design of the database.
Exchange 2010 made huge strides in reducing database IOPS. A big part of this was getting rid of the old online defrag process that aggressively churned through the database every night and moved things around to optimize for space. In Exchange 2010 we’re much more focused on storing mailbox content on contiguous pages to reduce IO, even if it means we might have some unused space here and there. In other words, Exchange 2010 simply was not intended to aggressively reclaim space when a message suddenly shrinks. It’s a scenario we minimized in the architecture to improve our IO. For more information on the maintenance changes, see Database Maintenance in Exchange 2010.
Does that mean you won’t save space by stubbing? Not necessarily. Stubbing products typically delete attachments, so we would expect that space to be freed. And despite not being designed for it, we’ve seen Exchange 2010 free up some space even when items without attachments get stubbed.
However, because this is not part of the design of Exchange 2010, we can’t tell you what to expect as far as how much space should be freed by using stub archival. Any expectations around this should come from your own testing or from data provided by your archival software vendor.
In other words, we cannot make any claims around reclamation of space when an item is modified to make its properties smaller than they were before. If modifying an item to make it smaller frees no space at all, that can be considered to be by design in Exchange 2010. As always, when a support investigation of a database space issue involves third-party products and finds that Exchange is functioning by design, we might refer you to the specific third party for support.
Exchange 2010 SP2 RU1 included a fix that will help customers doing any type of archival, stubbing or otherwise, because it makes the cleanup of hard deleted items work reliably. However, any expected disk space savings from stubbing itself need to be validated by you and your archival software vendor. When provisioning your storage, we suggest that you follow our published guidance (which includes leveraging large mailboxes so that you do not need to rely on third-party archival solutions) and tools, to make sure that you have adequate space for your Exchange data.