Exchange, Stubbing, and Database Space Reclamation
Published Aug 27 2012 03:41 PM

Exchange 2010 SP2 RU1 contained a fix for an issue regarding database space reclamation. There seems to be some confusion around this issue, so I’d like to explain what the problem was, what the fix is, and what to expect when third-party products are used to stub items within the database (stubbing is the process whereby a message is replaced with a pointer object and the original version is stored in an external repository).

The bug and the fix

In Exchange 2010, we found that database space was not being reclaimed in several scenarios where items are hard deleted (when the item has been purged from the store). This includes archival scenarios where items are bulk deleted based on date, stubbing scenarios where the attachments are deleted, and other scenarios such as journal mailboxes and public folder replication. This issue affected a wide range of customers, and the problem was with the way item deletion works.

Exchange 2010 will usually try to clean up the row for a deleted item within a couple of minutes of the item being hard deleted. However, cleanup fails if anything still holds an open reference to the item; for example, content indexing may still be indexing the message, or a separate client may be reading it. Prior to the fix, if cleanup failed at this point, that space was effectively leaked and could not be recovered without moving the mailbox.

It turned out that outstanding references on deleted messages are particularly common on journaling mailboxes, or when third-party archival software is being used, so the vast majority of customers who experienced this bug associated it with one of those two scenarios. But technically, any client could hold an open reference and prevent an item from being cleaned up.

The fix included in SP2 RU1 corrects the problem by adding a cleanup retry mechanism. If cleanup fails because something still has an open reference, the store will periodically retry until it succeeds.
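To make the failure mode concrete, here is a toy Python model of the behavior described above. It is a sketch under stated assumptions, not Exchange code: `Item`, `Store`, `hard_delete`, `retry_pass`, and `demo` are hypothetical names, and a single `open_refs` counter stands in for anything (content indexing, another client) that keeps a reference open. Without a retry queue, an item whose first cleanup attempt fails keeps its space forever; with one, a later pass reclaims it once the reference is released.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    item_id: int
    size: int
    open_refs: int = 0  # e.g. content indexing or a client still has the item open


class Store:
    """Hypothetical store model; not a description of Information Store internals."""

    def __init__(self, retry_enabled: bool) -> None:
        self.retry_enabled = retry_enabled
        self.unreclaimed = 0            # bytes still held by hard-deleted items
        self.pending: list[Item] = []   # items whose cleanup failed, awaiting retry

    def hard_delete(self, item: Item) -> None:
        self.unreclaimed += item.size
        if not self._try_cleanup(item) and self.retry_enabled:
            self.pending.append(item)
        # With retry disabled (the pre-fix behavior), a failed cleanup is never retried.

    def _try_cleanup(self, item: Item) -> bool:
        if item.open_refs > 0:
            return False                # something still references the item
        self.unreclaimed -= item.size
        return True

    def retry_pass(self) -> None:
        """Periodic retry: clean up what we can, keep the rest for the next pass."""
        self.pending = [i for i in self.pending if not self._try_cleanup(i)]


def demo(retry_enabled: bool) -> int:
    store = Store(retry_enabled)
    busy = Item(1, 1000, open_refs=1)   # still being indexed when it is deleted
    idle = Item(2, 500)
    store.hard_delete(busy)
    store.hard_delete(idle)
    busy.open_refs = 0                  # indexing finishes and releases the item
    store.retry_pass()
    return store.unreclaimed


if __name__ == "__main__":
    print(demo(retry_enabled=False))    # 1000 (leaked)
    print(demo(retry_enabled=True))     # 0
```

In `demo`, the 500-byte item is cleaned up immediately either way; only the 1000-byte item that was busy at delete time depends on the retry.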

Did this ‘fix’ stubbing?

Stubbing refers to a process where a third-party archival product takes a large message and turns it into a smaller item, or stub. It typically does this by deleting the attachments and modifying the message body to be smaller.
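As an illustration only, here is a minimal sketch of what a stubbing pass does to a single message, assuming a simple in-memory model. The `Message` fields, the `stub` and `approx_size` functions, the dictionary standing in for the external repository, the key scheme, and the 200-character preview are all hypothetical choices, not any vendor's actual behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    subject: str
    body: str
    attachments: list[bytes] = field(default_factory=list)
    archive_ref: Optional[str] = None  # pointer to the archived original, if stubbed


def stub(msg: Message, repository: dict[str, Message], preview_chars: int = 200) -> Message:
    """Store the full message externally and return a smaller stub that points to it."""
    key = f"archive-{len(repository)}"   # hypothetical key scheme
    repository[key] = msg                # original kept in the external repository
    return Message(
        subject=msg.subject,
        body=msg.body[:preview_chars],   # body trimmed to a short preview
        attachments=[],                  # attachments removed from the mailbox copy
        archive_ref=key,
    )


def approx_size(msg: Message) -> int:
    """Rough content size; ignores the per-item overhead that Exchange also stores."""
    return len(msg.subject) + len(msg.body) + sum(len(a) for a in msg.attachments)


if __name__ == "__main__":
    original = Message("Q3 report", "x" * 5000, [b"\0" * 100_000])
    archive: dict[str, Message] = {}
    small = stub(original, archive)
    print(approx_size(original), approx_size(small))  # 105009 209
```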

To the extent that stubbing deletes attachments, it could be affected by this bug, because those deleted attachments might fail cleanup due to open references. But other than that, this bug was never really about stubbing.

What if stubbing isn’t giving me as much space as I expect?

The Exchange Information Store has changed a lot over the years, but even back on Exchange 2003 and 2007, stubbing would result in databases that were significantly larger than you might expect when adding up mailbox sizes. This was mostly due to two types of overhead.

The first type of overhead is properties that simply don't count toward the size of the message. When you look at the size of a message in Outlook, the size you see is not the total size of all the data that Exchange stores for that message. We store certain properties which aren’t counted against the user and aren’t reflected in the message size. These may be publicly documented properties such as PR_URL_NAME or other internal properties which clients can’t access.

The second type of overhead is page fragmentation. The database page size has changed over the years, from 4k in Exchange 2003, to 8k in Exchange 2007, to 32k in Exchange 2010. Each record in the database has to be arranged onto these pages, and depending on how efficiently we are able to do so, there will be some space left on the page. This is page fragmentation, which results in empty space that cannot be reclaimed by database maintenance. The larger page sizes of more recent versions of Exchange make it easier to arrange several small items onto a single page, resulting in less fragmentation, but some amount of fragmentation will always be there.
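The effect of page size on fragmentation can be illustrated with a toy first-fit packer. This is an assumption-laden sketch, not how ESE actually lays out records: it handles only records that fit on a single page, and it places each one on the first page with enough room. Packing 39 records of 2,500 bytes each shows how larger pages leave proportionally less unused space.

```python
from __future__ import annotations


def pack_first_fit(record_sizes: list[int], page_size: int) -> list[int]:
    """Place each record on the first page with room; return the free bytes left on each page."""
    free: list[int] = []
    for size in record_sizes:
        if size > page_size:
            raise ValueError("toy model: every record must fit on a single page")
        for i, room in enumerate(free):
            if size <= room:
                free[i] -= size
                break
        else:  # no existing page had room, so start a new one
            free.append(page_size - size)
    return free


def unused_fraction(free: list[int], page_size: int) -> float:
    """Share of allocated page space that is left empty (fragmentation)."""
    return sum(free) / (len(free) * page_size)


if __name__ == "__main__":
    records = [2500] * 39
    for page_size in (4 * 1024, 8 * 1024, 32 * 1024):
        free = pack_first_fit(records, page_size)
        print(f"{page_size // 1024}k pages: {len(free)} pages, "
              f"{unused_fraction(free, page_size):.1%} unused")
    # 4k pages: 39 pages, 39.0% unused
    # 8k pages: 13 pages, 8.4% unused
    # 32k pages: 3 pages, 0.8% unused
```

Real records vary in size, so the numbers will differ, but the trend matches the point above: a larger page gives the packer more room to fit several small records before the leftover space becomes too small to use.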

The amount of overhead generally doesn’t change much between messages. A small message will have just as much overhead as a big one. For this reason, when you fill a database with small items, the overhead becomes much more visible.

For example, consider a scenario where you’ve filled a database with items that are all 1 MB in size, and these items all have 1k of overhead. That means overhead accounts for only about 0.1% of your database. Now let’s say you stub all the items in the database, reducing them to 4k in size. Suddenly that 1k of overhead is much more noticeable: it equals 25% of the stubbed data, or about 20% of the database. I once personally produced an Exchange 2003 database that was 70% overhead by filling it with tiny items and stubbing the entire thing.
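The arithmetic in this example is easy to check. One assumption matters: whether the 1k of overhead is measured against the item alone or against the item plus its overhead. The sketch below uses the latter (overhead as a share of the total space consumed), which gives about 0.1% for the 1 MB items and 20% for the 4k stubs; measured against the 4k stub alone, the same 1k is 25%. `overhead_share` is a hypothetical helper name.

```python
KB = 1024
MB = 1024 * KB


def overhead_share(item_bytes: int, overhead_bytes: int) -> float:
    """Overhead as a fraction of the total space the item consumes (item + overhead)."""
    return overhead_bytes / (item_bytes + overhead_bytes)


if __name__ == "__main__":
    print(f"1 MB item: {overhead_share(1 * MB, 1 * KB):.3%}")  # 0.098%
    print(f"4k stub:   {overhead_share(4 * KB, 1 * KB):.0%}")  # 20%
    print(f"1k / 4k:   {1 * KB / (4 * KB):.0%}")               # 25%
```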

What about Exchange 2010?

In Exchange 2010, there’s an additional factor that may affect the space reclamation from stubbing, and that’s the change in the design of the database.

Exchange 2010 made huge strides in reducing database IOPS. A big part of this was getting rid of the old online defrag process that aggressively churned through the database every night and moved things around to optimize for space. In Exchange 2010 we’re much more focused on storing mailbox content on contiguous pages to reduce IO, even if it means we might have some unused space here and there. In other words, Exchange 2010 simply was not intended to aggressively reclaim space when a message suddenly shrinks; we deliberately deprioritized that scenario in the architecture in order to reduce IO. For more information on the maintenance changes, see Database Maintenance in Exchange 2010.

Does that mean you won’t save space by stubbing? Not necessarily. Stubbing products typically delete attachments, so we would expect that space to be freed. And despite not being designed for it, we’ve seen Exchange 2010 free up some space even when items without attachments get stubbed.
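One way to picture why deleted attachments tend to free space while shrunken message bodies may not is a toy page map. The sketch below rests on two loud assumptions that are not a description of ESE internals: each attachment occupies pages of its own, and a stubbed body is shrunk where it already sits, with no compaction. Under those assumptions, deleting the attachments empties whole pages, while shrinking the bodies only leaves slack inside a page that is still in use. In neither case does the page count (and so the file) get smaller. `Page`, `free_pages`, `slack_bytes`, `stub_in_place`, and `example` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PAGE_SIZE = 32 * 1024  # Exchange 2010 database page size


@dataclass
class Page:
    # record id -> bytes the record occupies on this page
    records: dict[str, int] = field(default_factory=dict)

    def used(self) -> int:
        return sum(self.records.values())


def free_pages(pages: list[Page]) -> int:
    """Pages holding no records: the only space this toy model treats as reusable."""
    return sum(1 for p in pages if not p.records)


def slack_bytes(pages: list[Page]) -> int:
    """Unused bytes inside pages that still hold at least one record."""
    return sum(PAGE_SIZE - p.used() for p in pages if p.records)


def stub_in_place(pages: list[Page], body_ids: list[str],
                  attachment_ids: list[str], stub_bytes: int) -> None:
    """Shrink bodies where they sit and delete attachment records; no compaction."""
    for page in pages:
        for rid in body_ids:
            if rid in page.records:
                page.records[rid] = stub_bytes
        for rid in attachment_ids:
            page.records.pop(rid, None)


def example() -> list[Page]:
    # Three 10k bodies share one page; one message's attachment fills two more pages.
    pages = [
        Page({"a.body": 10 * 1024, "b.body": 10 * 1024, "c.body": 10 * 1024}),
        Page({"a.att.0": PAGE_SIZE}),
        Page({"a.att.1": PAGE_SIZE}),
    ]
    stub_in_place(pages, ["a.body", "b.body", "c.body"], ["a.att.0", "a.att.1"], 1024)
    return pages
```

In `example()`, the attachment pages come back as two free pages, while the roughly 27k freed from the three bodies stays as slack on a page that is still in use.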

However, because this is not part of the design of Exchange 2010, we can’t tell you what to expect as far as how much space should be freed by using stub archival. Any expectations around this should come from your own testing or from data provided by your archival software vendor.
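If you run that testing, one simple way to summarize the result is to compare how much reported item size stubbing removed with how much space inside the database actually became free. This is a hypothetical helper, not a Microsoft tool: `Snapshot` and `reclaim_ratio` are illustrative names, and it assumes you collect the three figures yourself before and after a stubbing run (for example, `Get-MailboxDatabase -Status` reports `DatabaseSize` and `AvailableNewMailboxSpace` in Exchange 2010 SP1 and later, and `Get-MailboxStatistics` reports `TotalItemSize` per mailbox). Because the database file does not shrink, freed space shows up as growth in whitespace rather than as a smaller file.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    db_file_bytes: int     # size of the database file on disk
    whitespace_bytes: int  # free space inside the database file
    item_bytes: int        # reported item sizes summed across all mailboxes


def reclaim_ratio(before: Snapshot, after: Snapshot) -> float:
    """Fraction of the reported size reduction that became free space in the database.

    A value near 1.0 means nearly every byte removed from the items became reusable;
    0.0 means none of it did.
    """
    logical_reduction = before.item_bytes - after.item_bytes
    if logical_reduction <= 0:
        raise ValueError("reported item sizes did not shrink between the snapshots")
    in_use_before = before.db_file_bytes - before.whitespace_bytes
    in_use_after = after.db_file_bytes - after.whitespace_bytes
    return (in_use_before - in_use_after) / logical_reduction
```

For instance, with made-up figures where reported item size drops by 40 units but whitespace grows by only 30, the ratio is 0.75.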

In other words, we cannot make any claims about reclaiming space when an item is modified so that its properties are smaller than they were before. If modifying an item to make it smaller frees no space at all, that can be considered by design in Exchange 2010. As always, if a support investigation of database space issues involving third-party products finds that Exchange is functioning by design, we may refer you to the third-party vendor for support.

Conclusion

Exchange 2010 SP2 RU1 included a fix that will help customers doing any type of archival, stubbing or otherwise, because it makes the cleanup of hard-deleted items work reliably. However, any expected disk space savings from stubbing itself need to be validated by you and your archival software vendor. When provisioning your storage, we suggest that you follow our published guidance and tools to make sure that you have adequate space for your Exchange data; that guidance includes leveraging large mailboxes so that you do not need to rely on third-party archival solutions.

Bill Long

8 Comments
Not applicable

Great article, thank you for the information.  

In recent years I have had the pleasure of migrating a messaging system from Exchange 2003 to Exchange 2010. We employed a 3rd party stubbing and archiving solution in Exchange 2003, and have since brought online the same 3rd party stubbing and archiving solution for our Exchange 2010 environment.

With Exchange 2003 we experienced large space savings with stubbing. As mentioned above, the 4k pages in Exchange 2003 and the "old online defrag" process must have contributed to these savings.

In Exchange 2010, the space savings from stubbing erode badly over time. In our environment thousands of emails are stubbed daily. We must stub items because we are highly consolidated and our system carries a high load.

Databases grow steadily over time, and so does space that we are calling "Unaccountable".

One example of an Exchange 2010 database that is 8 months old:

Database-A:

File Size: 170GB

Sum of Mailboxes: 88.6GB

Whitespace: 1.1GB

Total Version/Dumpster store: 2.6GB

Items stubbed during 8 months: Hundreds of thousands

Unaccountable Space: 77GB

If a new database is created and the mailbox data moved to the new database, the size of new database will be close to 90-95GB. This is probably due to the mailbox moves placing data in a contiguous manner.

Our Exchange database environment totals 4.2TB in files; however, our mailbox data totals only 2.1TB. We often recreate databases manually to keep our storage under a certain total size. Constant database recreation (although easier in Exch2010) is not one of our desired weekly routines.

Although Exchange 2010 has implemented a native archiving solution, there are many customers that have legacy 3rd party repositories that would be hard to break from.  The size of our native Exchange Archive mailboxes would require us to keep tens of terabytes of databases online to serve what we are required to offer our users in real-time. The 3rd party solution we use allows us to provide even lower cost storage than an Exchange database requires for this old data, and gives us back single item recovery which was lost in Exchange 2010 (for understandable design reasons).

If it is the redesign of the nightly Online Maintenance routine that took away this intense form of mailbox content processing, there may be some enterprise customers still requiring this routine to function as before or at least close.  

If it were possible to set a level of intrusiveness for the Online Maintenance routine, a Microsoft Exchange 2010 customer could set the balance between disk I/O and database space reclamation that their system could handle.

Due to the high cost of requiring double the amount of space in our enterprise environment, we are working our way through our second Microsoft Professional Support case on this topic.  I thank you for the above information. Hopefully our case will come to an enterprise level solution.

Not applicable

@Bill,

great article indeed.

I'm just a bit confused when you say that Exchange can arrange several items onto a single page. I knew that an item (mail, etc.) can spread over multiple pages, but I always thought that multiple items could not be contained in a single page, so that a single 4 KB mail would take 32 KB of database space in E2010.

@morser,

you're not alone. I face exactly the same behaviour with exactly the same figures. I'm using an external third party archiving system and the size of my databases on the disks is about twice the sum of the mailbox sizes.

This behaviour should be taken into account by the next releases of the Exchange Storage calculator!

Not applicable

I think it should be possible to tell what sort of data (or other causes, like fragmentation) is taking up space in a database.

What about an extension of the space related output of the Get-MailboxDatabase command?

Could you extend this command with some output like "fragmentedspace" and "SumOfMessageSpaceNotCountedToMessagespace"?

Not applicable

@Morser, you piqued my curiosity. What storage type is more affordable than the SATA or SAS solutions a properly designed Exchange 2010 can use today?

Not applicable

morser,

I'd have to "see the math" on the space cost savings you're claiming for archiving. Every time I've gone through the exercise with my clients, the reality ends up being that they are using unrealistically costly SAN storage. 1TB drives are cheap when you just buy a couple of SAS/SATA disks (a couple hundred dollars?), and thanks to the mentioned IO improvements they can handle the IOPS just fine. Resilience can be maintained with your system's local RAID or a DAG depending on your size; both scenarios are usually much cheaper than anything you can put on a SAN (I've yet to see a decent SAN come in under 5 figures).

Honestly you could very likely put everything on local disks for under $1000 per server .... I'm not even sure you can buy an archive solution for that kind of money.

Not applicable

Great explanation, Bill.  Thanks for sharing this.  Does this mean that Microsoft does not support stubbing?

Not applicable

This is a tad annoying. It's nice to have an explanation of it all but I had thought that the update would be more aggressive as morser points out.

Like him for my environment there wouldn't be a problem with a 'traditional' Online Maintenance run behind the scenes.

We also stub everything after 30 days, not so much for disk space as for backup since we are rather constrained in that area. As a finance company we have to journal everything anyway, goes to an EMC Centera. Therefore for us it doesn't really matter where the majority of the message lives, if it is in the Centera all we need is the stub pointer to be left in Exchange, hopefully not taking up too much space.

We use a combination of SAN and local storage between physical and virtual servers (one physical server has local storage as a security measure - we once had an EMC engineer switch off our SAN by mistake), remember that a lot of people use Exchange purely as a VM nowadays and are thus forced to use SAN storage, or perhaps high capacity iSCSI.

For reference an HP midline 1TB SATA drive for a physical server will come out at about $270 with a 1 year warranty. So definitely lots cheaper than SAN but not usable in all circumstances, especially if you want to use 1U servers and have large DB's.

Up to now I've been doing the old disk expansion, new database creation, mailbox move dance and was hoping that when I upgraded to SP2 I'd be able to do one last mailbox move for everyone and then be in a more or less steady state.

I usually recover at least 50% of the storage used by the old database.

Sounds like I'm still going to be doing the dance, just less often.

Which isn't helpful since as I said I'm backup constrained and have to back up the old database, the new database AND all the log files while I'm doing the moves.

If you search in the Exchange forums you'll probably see my previous complaints on this issue. :)

Neill

Not applicable

One additional thing on that whitespace topic... I have a test environment consisting of about 20k users, I think.

All mailboxes were completely empty; not a single e-mail had been stored... yet the database ran out of space after 70 GB. :) Not what I expected.

Last update: Jul 01 2019 04:08 PM