
Exchange Team Blog

Dude, Where's My Single Instance?

The_Exchange_Team
Feb 22, 2010

In Exchange Server 2010, there is no more single instance storage (SIS). To help understand why SIS is gone, let's review a brief history of Exchange.

During the development of Exchange 4.0, we had two primary goals in mind, and SIS was born of these goals:

  1. Ensure that messages were delivered as quickly and as efficiently as possible.
  2. Reduce the amount of disk space required to store messages, as disk capacity was at a premium.

Exchange 4.0 (and, to a certain extent, Exchange 5.0 and Exchange 5.5) was really designed as a departmental solution. Back then, users were typically placed on an Exchange server based on their organization structure (often, the entire company was on the same server).  Since there was only one mailbox database, we maximized our use of SIS for both message delivery (only store the body and attachments once) and space efficiency. The only time we created another copy within the store was when the user modified their individual instance.

For almost 19 years, the internal Exchange database table structure has remained relatively the same:

Then came Exchange 2000.  In Exchange 2000, we evolved considerably - we moved to SMTP for server-to-server connectivity, we added storage groups, and we increased the maximum number of databases per server.  The result was a shift away from a departmental usage of Exchange to enterprise usage of Exchange.  Moreover, the move to 20 databases reduced SIS effects on space efficiency, as the likelihood that multiple recipients were on the same database decreased.  Similarly, message delivery was improved by our optimizations in transport, so transport no longer benefited as much from SIS either.

With Exchange 2003, consolidation of servers took off in earnest due to features like Cached Exchange Mode.  Again, the move away from departmental usage continued.  Many customers moved away from distributing mailboxes based on their organization structure toward randomizing the user population across all databases in the organization.  Once again, the space efficiency effects of SIS were further reduced.

In Exchange 2007, we increased the number of databases you could deploy, which again reduced the space efficiency of SIS. We further optimized transport delivery and completely removed the need for SIS from a transport perspective.  Finally, we made changes to the information store that removed the ability to single instance message bodies (but allowed single instancing of attachments). The result was that SIS no longer provided any real space savings - typically only about 0-20%.

One of our main goals for Exchange 2010 was to provide very large mailboxes at a low cost. Disk capacity is no longer at a premium; disk space is very inexpensive and IT shops can take advantage of larger, cheaper disks to reduce their overall cost. In order to leverage those larger capacity disks, you also need to increase mailbox sizes (and remove PSTs and leverage the personal archive and records management capabilities) so that you can ensure that you are designing your storage to be both IO efficient and capacity efficient.

During the development of Exchange 2010, we realized that having a table structure optimized for SIS was holding us back from making the storage innovations that were necessary to achieve our goals. In order to improve the store and ESE, to change our IO profile (from many small, random IOs to fewer, larger, more sequential IOs), and to resolve our inefficiencies around item count, we had to change the store schema. Specifically, we moved away from a per-database table structure to a per-mailbox table structure:
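
To make this shift more concrete, here is a deliberately simplified sketch (Python, purely illustrative; it is not the actual store or ESE schema) of why a single shared message table per database lends itself to single instancing, while per-mailbox tables keep a copy per recipient:

    # Hypothetical illustration only -- not the real Exchange/ESE schema.
    # It models why a shared per-database message table enables single
    # instancing, while per-mailbox tables keep one copy per recipient.
    from collections import defaultdict

    class PerDatabaseStore:
        """Old model: one shared message table per database (enables SIS)."""
        def __init__(self):
            self.messages = {}                          # message_id -> body, stored once
            self.mailboxes = defaultdict(list)          # mailbox -> [message_id references]

        def deliver(self, message_id, body, recipients):
            self.messages.setdefault(message_id, body)  # single instance of the content
            for mailbox in recipients:
                self.mailboxes[mailbox].append(message_id)

    class PerMailboxStore:
        """New model: each mailbox owns its own rows (no cross-mailbox sharing)."""
        def __init__(self):
            self.mailbox_tables = defaultdict(dict)     # mailbox -> {message_id: body}

        def deliver(self, message_id, body, recipients):
            for mailbox in recipients:
                self.mailbox_tables[mailbox][message_id] = body  # one copy per mailbox

    if __name__ == "__main__":
        body = "x" * 1024
        old, new = PerDatabaseStore(), PerMailboxStore()
        old.deliver("msg-1", body, ["alice", "bob", "carol"])
        new.deliver("msg-1", body, ["alice", "bob", "carol"])
        print("shared-table copies:", len(old.messages))                                 # 1
        print("per-mailbox copies: ", sum(len(t) for t in new.mailbox_tables.values()))  # 3

Running it shows one stored copy of the message in the shared-table model versus three in the per-mailbox model for the same delivery.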

This architecture, along with other changes to the ESE and store engines (lazy view updates, space hints, page size increase, b+ tree defrag, etc.), netted us not only a 70% reduction in IO over Exchange 2007, but also a substantial increase in our ability to store more items in critical path folders.

As a result of the new architecture and the other changes to the store and ESE, we had to deal with an unintended side effect.  While these changes greatly improved our IO efficiency, they made our space efficiency worse.  In fact, on average they increased the size of the Exchange database by about 20% over Exchange 2007. To overcome this bloating effect, we implemented a targeted compression mechanism (using either 7-bit or XPRESS, which is the Microsoft implementation of the LZ77 algorithm) that specifically compresses message headers and bodies that are either text or HTML-based (attachments are not compressed as typically they exist in their most compressed state already).  The result of this work is that we see database sizes on par with Exchange 2007.
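
As a rough illustration of the targeted-compression idea (not the store's actual implementation), the sketch below uses Python's zlib as a stand-in for the 7-bit/XPRESS codecs and only compresses text- or HTML-based content, leaving attachments untouched; the content-type labels are assumptions made for the example:

    # Illustrative sketch only; zlib (DEFLATE) stands in for 7-bit/XPRESS,
    # and the content-type names are assumptions, not the store's real ones.
    import zlib

    COMPRESSIBLE_TYPES = {"text/plain", "text/html"}    # headers and bodies worth compressing

    def store_blob(content_type: str, data: bytes) -> tuple[bytes, bool]:
        """Return (blob_to_store, was_compressed)."""
        if content_type in COMPRESSIBLE_TYPES:
            return zlib.compress(data), True
        # RTF bodies and attachments typically already sit in a compressed
        # state, so storing them as-is avoids CPU cost for little gain.
        return data, False

    if __name__ == "__main__":
        html_body = b"<html><body>" + b"Quarterly report text. " * 200 + b"</body></html>"
        attachment = bytes(range(256)) * 100             # stands in for already-compressed data

        blob, packed = store_blob("text/html", html_body)
        print(f"html body:  {len(html_body)} -> {len(blob)} bytes (compressed={packed})")

        blob, packed = store_blob("image/jpeg", attachment)
        print(f"attachment: {len(attachment)} -> {len(blob)} bytes (compressed={packed})")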

The below graph shows a comparison of database sizes for Exchange 2007 and Exchange 2010 with different types of message data:

As you can see, Exchange 2007 databases that contained 100% Rich Text Format (RTF) content were our baseline goal when implementing database compression in Exchange 2010. What we found is that with a mix of messaging data (77% HTML, 15% RTF, 8% Text, with an average message size of 50 KB), our compression algorithms produce database sizes on par with Exchange 2007. In other words, we mitigated most of the bloat caused by the lack of SIS.

Is compression the answer to replacing single instancing altogether? The answer is that it really depends. There are certain scenarios where SIS would still have provided value:

  • Environments that only send Rich Text Format (RTF) messages. The compression algorithms in Exchange 2010 do not compress RTF message blobs because they already exist in their most compressed form.
  • Sending large attachments to many users. For example, sending a large (30 MB+) attachment to 20 users.  Even if only 5 of the 20 recipients were on the same database, in Exchange 2003 that meant the 30 MB attachment was stored once instead of 5 times on that database. In Exchange 2010, that attachment is stored 5 times (150 MB for that database) and isn't compressed (the sketch after this list walks through the math). But depending on your storage architecture, the capacity to handle this should be there. Also, your email retention requirements will help here, by forcing the removal of the data after a certain period of time.
  • Business or organizational archives that are used to maintain immutable copies of messaging data benefit from single instancing because the system only has to keep one copy of the data, which is useful when you need to maintain that data indefinitely for compliance purposes.
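
For the large-attachment bullet above, the arithmetic is simple enough to spell out; this back-of-the-envelope sketch just reuses the illustrative numbers from the list (a 30 MB attachment with 5 recipients on one database), and is not a sizing tool:

    # Back-of-the-envelope math for the large-attachment example above;
    # the numbers are the illustrative ones from the bullet list.
    def attachment_storage_mb(attachment_mb: float, recipients_on_db: int, sis: bool) -> float:
        """Space one attachment consumes on a single database."""
        return attachment_mb if sis else attachment_mb * recipients_on_db

    if __name__ == "__main__":
        print("Exchange 2003 (SIS):   ", attachment_storage_mb(30, 5, sis=True), "MB")    # 30 MB
        print("Exchange 2010 (no SIS):", attachment_storage_mb(30, 5, sis=False), "MB")   # 150 MB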

If you go back through our guidance over the past 10 years, you will never find a single reference to using SIS for capacity planning.  We might mention that it has an impact on database size, but that's it.  All of our guidance has always dictated designing the storage without SIS in mind.  And for those thinking about thin provisioning, SIS isn't a reason to do thin provisioning, nor is it a means to calculate your space requirements.  Thin provisioning requires an operational maturity that can react quickly to changes in the messaging environment, as well as a deep understanding of how the user population behaves and grows over time, in order to allocate the right amount of storage up front.

In summary, Exchange 2010 changes the messaging landscape.  The architectural changes we have implemented enable the commoditization of email - providing very large mailboxes at a low cost.  Disk capacity is no longer at a premium.  Disk space is cheap, and IT shops can take advantage of larger, cheaper disks to reduce their overall cost.  With Exchange 2010 you can deploy a highly available system that remains storage efficient without SIS, at a fraction of the cost required with previous versions of Exchange.

So, there you have it. SIS is gone.

- Ross Smith IV 


36 Comments

  • One thing I noticed is you mentioned you can leverage cheap disk for your Exchange Mailbox servers; however, from my understanding, using local or cheap disk typically involves an HA scenario where you have at least three mailbox servers in a DAG.  This is not the case for many clients.

    I also want to point out that archiving in Exchange 2010 currently goes to the same database where the user's mailbox exists.  This is difficult for clients to understand, since their goal is to place the production database on prime disk and move archived email to inexpensive disk.  I have had a number of clients question this logic.

    My question is: will there ever be a way to have the archive mailbox on a separate database, which could reside on cheap disk, while the production databases reside on a SAN, etc.?

    Otherwise, thanks for the great explanation.  Many clients love DAGs, though moving them to three servers in a DAG always seems to be a bit more difficult.
  • Nice article and I agree with Ross. The discussion with people "missing" SIS functionality tends to be fed by gut feeling. Nobody can size this properly with so many variables in a living system with a dynamic population. The argument about recovery may be valid, but do not forget that the best practice regarding database sizes versus recovery times isn't related to the type or number of attachments. IMHO, you *may* have benefited from SIS in the past, but you *will* benefit from the storage changes in 2010.

    I've jotted down my thoughts on SIS here (shameless self-plug)
    http://eightwone.wordpress.com/2009/11/25/exchange-2010-database-compression/
  • Is there an easy way to determine what the size of the database will be after a transition to 2010 from 200x?

    For planning / design it would be nice to know that.

  • This explanation will help our customers understand the reason the Exchange team removed SIS.  This should've been conveyed long before now.  While this is not typically a show stopper (archiving is a different story), it helps us partners convey a single answer to mutual customers.
  • The second diagram doesn't appear to reflect the changes in 2010.  In fact, it looks like the same diagram as the first, only with a cleaner look.
  • I guess if it had been possible to keep at least attachments as a single copy per database, it would have saved a lot of unnecessary space consumption.

    Disk resources may be cheaper now, but that doesn't mean we should not take care of them.

    Again, database sizes would increase, which would make recovery scenarios more difficult and time consuming. (I am aware of the DAG feature, which makes recovery easier.)

    Online maintenance would also take more time because of the increased database size.