Top 10 Exchange Storage Myths
Published Mar 29 2010 09:00 AM

Many of you have heard about the changes we've made in Exchange 2010 that combine to give you the ability to use less expensive storage and deploy large mailboxes. There has been a lot of discussion about the storage choices available for Exchange, and along the way we've heard some interesting questions. To add some clarity, we've put together some resources to bust the most common myths and misperceptions we have heard.

To help you better understand our thinking around large mailboxes, we've published the Large Mailbox Vision Whitepaper.

AND here are the top 10 myths about Exchange storage that we've heard. Busted!

  • Myth #1: Exchange requires expensive, high-performing storage. I can't afford large mailboxes!
    Reality: Exchange 2010 enables you to implement large, low-cost mailboxes. It performs well on less expensive disks and supports a range of storage options. See the Large Mailbox Vision Whitepaper.
  • Myth #2: Exchange 2010 doesn't support storage area networks (SANs).
    Reality: Exchange 2010 doesn't support network-attached storage (NAS) (maybe the similar spelling confuses people?), but it does support a wide range of storage options, including SAN and DAS. Depending on your high availability model, storage can be configured using RAID or RAID-less (JBOD) storage. Different customers will require different solutions based on their requirements, but everyone has the ability to deploy large mailboxes at low cost.
  • Myth #3: I already have a SAN (or I just bought one), so it doesn't make sense to implement DAS. By the way, my SAN can use those less expensive SATA disks too.
    Reality: This one is not really a myth, but it is often misunderstood. A SAN deployment may make sense for you, as long as it still lets you deploy large mailboxes at low cost. Remember that Exchange supports a range of storage options, including SAN and DAS. If you are looking to take advantage of multiple independent copies of databases, consider the full cost of your storage solution.
  • Myth #4: JBOD configurations are not practical because the re-seed process after a disk failure takes too long, and this generates too much operational overhead.
    Reality: Microsoft IT uses a JBOD configuration very successfully, and it can be a very low-cost solution. However, managing the environment appropriately requires a certain level of operational maturity. Many factors affect seeding throughput; internally, in our JBOD architecture, we see between 35 and 70 GB/hour.
  • Myth #5: Large mailboxes perform badly with Outlook.
    Reality: Exchange 2010 supports up to 100,000 items per folder, up from 20,000 in Exchange 2007. In addition to this, Outlook 2007 SP1 Feb09 update, Outlook 2007 SP2 & Outlook 2010 provide good performance for Cached Exchange Mode for mailboxes up to 10 GB in size, and even larger (25GB) using faster disks like 7.2K drives or SSD. Larger Mailboxes? The Exchange 2010 store was improved to support very large mailboxes (100 GB+) in online mode and with OWA. You can also use the Exchange 2010 Personal Archive, an email archiving feature, to reduce mailbox size for Cached Exchange Mode clients.
  • Myth #6: When I migrate to Exchange 2010 my database size will explode because Exchange 2010 doesn't have single instance storage (SIS).
    Reality: Exchange storage planning guidance has always dictated designing storage without SIS in mind. SIS reduces Exchange Server's ability to do sequential data access, and the store changes that removed it help to provide the 70% IO reduction. Exchange 2010 does provide 20% database compression for HTML and plain-text messages. For more details about Exchange 2010 and SIS, see our previous post, "Dude, Where's My Single Instance?"
  • Myth #7: My Exchange guy knows nothing about storage - it needs to be managed by the storage experts. Less expensive storage is too hard/time-consuming/expensive to manage.
    Reality: We know from the many organizations we have talked to that use DAS (including Microsoft's own deployment) that they have not needed any additional people to manage less expensive Exchange storage, nor have their operational costs increased. When storage is expensive, you can spend a lot of time and resources optimizing for your storage investment. Less expensive storage lets you take a conservative approach and over-provision. The storage is then never touched except for firmware/driver updates or disk failures. Your server management staff can manage the storage, since the tasks (driver and firmware updates) are very similar.
  • Myth #8: I can't back up large Exchange databases.
    Reality: With the ability to have multiple copies of each database, along with features such as single item recovery and lagged copy support, you might not need traditional backups at all. You can also reduce backups to weekly or bi-monthly full backups, back up from passive database copies, and use DPM "express" backups to save space.
  • Myth #9: We need a 3rd party archiving solution because Exchange data needs expensive storage and we need to put archived data on less expensive storage.
    Reality: You can put all Exchange data on less expensive storage, not just the archive data. Co-locate hot and cold data to efficiently utilize large low cost disks and simplify management by using a single storage type.
  • Myth #10: All Exchange storage designs must follow the Exchange Mailbox Role Requirements Calculator verbatim, otherwise they will not be supported.
    Reality: The Exchange Mailbox Role Requirements Calculator (Exchange 2010 / Exchange 2007) provides design guidelines but does not have anything to do with supportability. The Exchange Solution Reviewed Program (ESRP) - Storage also has information from our storage partners.

Hope this clears up some of those myths!

-- Astrid McClean

28 Comments
Not applicable
While I agree that most of the myths have been busted, I would like the Exchange team to elaborate on Myth #4.
How does MSIT address reseed performance issues when using JBOD?

Cheers
Not applicable
CY -

In terms of MSIT's architecture (a JBOD configuration that uses 7.2K 1TB nearline SAS disks), we typically reseed from a passive copy; since the passive copy is doing very little work compared with the active copy, we can achieve much higher throughput (around 70GB/hour). So for a 500GB database file, at 35GB/hour you are looking at about 14 hours, or about 7 hours if you are hitting around 70GB/hour.

Since the architecture has a total of 4 copies, losing a single copy and waiting out the reseed (7-14 hours) is not a concern, as we have 3 other copies to rely on if another failure occurs. In fact, even a 24-hour reseed would not be a concern given the design.
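The reseed arithmetic above is simply database size divided by seeding throughput. A minimal Python sketch, assuming a constant throughput for the whole copy (real seeding rates vary with load, network, and which copy you seed from), reproduces the two figures:

```python
def reseed_hours(database_size_gb: float, throughput_gb_per_hour: float) -> float:
    """Estimate hours to reseed one database copy at a constant throughput."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return database_size_gb / throughput_gb_per_hour


if __name__ == "__main__":
    # 500 GB database, MSIT's observed range of 35-70 GB/hour (from the comment).
    for rate in (35, 70):
        print(f"{rate} GB/hour: {reseed_hours(500, rate):.1f} hours")
```

The design check is then comparing that result with how long you are willing to run with one fewer copy (up to 24 hours in MSIT's four-copy design).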

Ross
Not applicable
Hi Astrid,

Much-needed blog post and whitepaper. I really appreciate them, and they will be a good place to point our customers to. Let me get back to you with customer reactions [well, do not expect miracles though ;-)]

Cheers
Nitin
Not applicable
Another useful post, keep up the good work!

I was wondering though, are there any IT Showcase/How Microsoft Does IT pieces in the pipeline relating to Exchange 2010?
Not applicable
Hi Ant,

Yes, there are IT Showcase papers in the works. Expect to hear more about them in upcoming months.

Ross
Not applicable
I want to retweet this article... ARGH!
Not applicable
Thank you, Ross, for the clarification. Sorry if this is obvious, but is there a configuration that will limit the reseed operation to a passive copy (assuming I have 3 copies of the DB), or is this a manual operation after I have replaced the failed hard disk?

Also, I noticed that MSIT is using SAS disks. We have been hearing from Microsoft that SATA disks are good enough.

Cheers
Not applicable
@cy: Automatic seeding only occurs when you create a new database. In failure situations, you have to seed manually using the Update-DatabaseCopy cmdlet (with the -SourceServer parameter) or the Update Database Copy wizard in the EMC.
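Combined with Ross's note that MSIT reseeds from a passive copy, this suggests a simple rule: seed from a healthy passive copy if one exists, otherwise from the active copy. The Python sketch below is a hypothetical helper, not an Exchange tool; the `DatabaseCopy` record, the server names (MBX1-MBX3), and the database name (DB01) are assumptions, and the status values would in practice come from something like Get-MailboxDatabaseCopyStatus. It only builds the cmdlet line to run in the Exchange Management Shell.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatabaseCopy:
    # Hypothetical status record for one copy of a database.
    server: str
    is_active: bool
    healthy: bool


def choose_seed_source(copies: list[DatabaseCopy], target_server: str) -> Optional[str]:
    """Prefer a healthy passive copy; fall back to the active copy if none exists."""
    candidates = [c for c in copies if c.server != target_server and c.healthy]
    passive = [c for c in candidates if not c.is_active]
    if passive:
        return passive[0].server
    active = [c for c in candidates if c.is_active]
    return active[0].server if active else None


def reseed_command(database: str, target_server: str, source_server: str) -> str:
    # The copy being reseeded is identified as Database\Server.
    return f"Update-DatabaseCopy -Identity {database}\\{target_server} -SourceServer {source_server}"


if __name__ == "__main__":
    copies = [
        DatabaseCopy("MBX1", is_active=True, healthy=True),
        DatabaseCopy("MBX2", is_active=False, healthy=True),
        DatabaseCopy("MBX3", is_active=False, healthy=False),  # copy on the replaced disk
    ]
    source = choose_seed_source(copies, "MBX3")
    print(reseed_command("DB01", "MBX3", source))
```

Preferring the passive copy keeps the seeding read load off the server that is actively serving users.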

Not applicable
Hi,

'Microsoft IT uses a JBOD configuration very successfully and it can be a very low cost solution.'

Does MSIT use Exchange Native Data Protection or 'traditional' backup and why?

Kind regards, Martijn
Not applicable
Hi Martijn,

If you go back and look at why you need backups - you need them to protect against software and hardware failures, and possibly datacenter failures -> covered by mailbox resiliency.  You also need backups to recover accidentally or maliciously deleted email -> covered by single item recovery.  You may also need backups to protect against administrative error (e.g., mailbox deletions) or logical corruption -> in these cases you need a point-in-time backup which could be a lagged database copy.

MSIT utilizes the native data protection features of Exchange, namely, mailbox resiliency (4 HA copies) and single item recovery (30 days) for its backup and recovery infrastructure.

The reasoning is simply how we do business and the need to reduce infrastructure costs. Back in Exchange 2003 we leveraged a disk-to-disk-to-tape backup architecture and retained the tape backups for 14 days. In Exchange 2007 we moved away from tapes, as we found them a) costly and b) unreliable; instead we leveraged a disk-to-disk backup architecture and retained those backups for 14 days. The main reason for maintaining backups was single item recovery. With Exchange 2010, single item recovery is built in and we have multiple copies.

Hope this helps,

Ross
Not applicable
CY- Yes, MSIT utilizes 7.2K nearline SAS.  There are a few reasons for this.

1.  MSIT designed and implemented its storage infrastructure for the Exchange 2010 rollout 1.5 years ago, before the product development cycle had completed (i.e., before all of the IO reduction work was completed and checked into the product). Therefore, they chose a disk that could produce more IOPS than the 7.2K SATA equivalent. It should be noted, however, that 7.2K SAS shares its low-cost mechanics with its 1TB 7.2K SATA counterpart, so there isn't a great cost differential here.
2.  SAS also provides a more robust transport, which allows individual disks to be reset rather than resetting the whole bus.

Our Outlook Live service (22 million mailboxes) runs on 7.2K SATA.

So in summary, both 7.2K SAS and 7.2K SATA are viable choices. You need to consider the mechanics, the storage chassis, the capacity, IO performance, and the annualized failure rates when deciding which disk to use in your infrastructure.

Ross
Not applicable
Great article. I don't know of another time when IT shops have reported that they won't need a larger budget to upgrade or implement a project. The ability to reduce storage costs is great.
Not applicable
Curious to see how a JBOD Exchange 2010 setup would test out with the JetStress Beta tool...
Not applicable
Hi Ross,

Thanks! So I assume that MSIT is using one lagged HA copy?

Martijn
Not applicable
No, Martijn, MSIT is not leveraging lagged database copies. Logical corruption is not a concern (we don't overwrite users' data, and single item recovery does provide version control), and MSIT has a well-defined process with trained administrators, so administrative error isn't a concern either.



Ross

Not applicable
Also, I want to point one thing out: a lagged copy is not an HA copy; it is a DR copy (a point-in-time backup). Yes, it can be activated, but you do not want to activate your lagged copy, because if you do, you lose your point-in-time capabilities.

Ross
Not applicable
So how do you use your lagged copy?
Not applicable
If you need to recover data from a lagged copy, follow the steps below, which, if done correctly, ensure you don't erase your point-in-time copy.

1.  Suspend replication to lagged copy.
2.  Take VSS snapshot (vssadmin) of lagged copy.
3.  Select logs for replay.
4.  Replay logs against lagged copy using ESEUTIL.
5.  Copy database and use for recovery (recovery database).
6.  Revert VSS snapshot.
7.  Resume replication.
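Step 3 is where the point in time is chosen: only the logs up to the moment you want to recover to get replayed. The Python sketch below is a hypothetical illustration of that selection rule, not a Microsoft tool. It assumes you have already mapped each log generation number to the time that log was closed, and that `first_needed` (a name introduced here) is the first generation the lagged copy has not yet replayed. It refuses to skip over a missing generation, since replay needs a contiguous log stream.

```python
from datetime import datetime


def select_logs_for_replay(log_close_times: dict[int, datetime],
                           first_needed: int,
                           recovery_point: datetime) -> list[int]:
    """Return the contiguous run of log generations to replay, starting at
    first_needed and stopping at the last log closed at or before recovery_point."""
    selected = []
    generation = first_needed
    while generation in log_close_times and log_close_times[generation] <= recovery_point:
        selected.append(generation)
        generation += 1
    # If a generation is missing but later logs still fall before the recovery
    # point, the log stream has a gap and replay cannot reach the target time.
    if generation not in log_close_times and any(
        g > generation and t <= recovery_point for g, t in log_close_times.items()
    ):
        raise ValueError(f"log generation {generation} is missing")
    return selected


if __name__ == "__main__":
    logs = {100: datetime(2010, 3, 29, 9), 101: datetime(2010, 3, 29, 10),
            102: datetime(2010, 3, 29, 11), 103: datetime(2010, 3, 29, 12)}
    print(select_logs_for_replay(logs, 100, datetime(2010, 3, 29, 11)))
```

In this sketch, the returned generations are the ones you would leave in place for the ESEUTIL replay in step 4.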

Ross
Not applicable
Using SATA JBOD disks blows my mind to the point where I doubt I could sleep at night. Do you have a Visio diagram or similar showing how you would use this scenario for a mid-sized company of, say, 500 mailboxes? I looked at the Excel download, but it isn't helping me visualize the concept.
Not applicable
Hey Seth,

In order to deploy your mailbox databases on JBOD, you need at least 3 database copies, which means at least 3 mailbox servers. Assuming you are deploying 3 copies, it really comes down to ensuring you exceed neither the IO characteristics nor the capacity characteristics of the drive. As an example, if your user profile is 150 messages per day, you can expect your IO profile to be around 0.15 IOPS per mailbox. With a 7.2K SATA disk you can expect to achieve 55 IOPS, so you are looking at a maximum of about 300 mailboxes per disk (with some fluff factors added). Next, look at capacity and dial the number of mailboxes down if needed; maybe for 2GB mailboxes you can still handle 300 mailboxes per disk. So you need 2 databases to handle your 500 mailboxes, which means a total of 2 disks per server.
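The sizing walk-through above takes the smaller of an IO limit and a capacity limit per disk, then divides the mailbox count by it. A minimal Python sketch follows; the 55 IOPS and 0.15 IOPS per mailbox come from the example, while the 1000 GB disk size, the 0.8 IO headroom, and the 1.5x capacity overhead are assumed "fluff factors", so the result (293) lands near, not exactly on, the rounded 300.

```python
import math


def mailboxes_per_disk(disk_iops: float, iops_per_mailbox: float,
                       disk_capacity_gb: float, mailbox_quota_gb: float,
                       io_headroom: float = 0.8, capacity_overhead: float = 1.5) -> int:
    """Mailboxes one JBOD disk can hold, limited by whichever of IO or capacity runs out first."""
    # io_headroom and capacity_overhead are assumed fluff factors; tune them for your design.
    io_limit = math.floor(disk_iops * io_headroom / iops_per_mailbox)
    capacity_limit = math.floor(disk_capacity_gb / (mailbox_quota_gb * capacity_overhead))
    return min(io_limit, capacity_limit)


def disks_per_server(total_mailboxes: int, per_disk: int) -> int:
    """One database per disk, and every server holds a copy of every database."""
    return math.ceil(total_mailboxes / per_disk)


if __name__ == "__main__":
    per_disk = mailboxes_per_disk(55, 0.15, 1000, 2)
    print(per_disk, disks_per_server(500, per_disk))
```

Here IO is the binding limit (293 versus 333 by capacity), which is why larger quotas on big 7.2K disks often cost little extra in spindles.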

As for sleep loss, I think the key thing is that you are building enough redundancy into the system that you don't have to worry. With 3 or more copies, if you lose a single copy, you still have two other copies to fall back on: the active copy, and a passive copy that also serves as the seeding source for the third copy once you replace the disk.

Whether you want to deploy this architecture in your organization will really depend on a multitude of factors, one of which will be cost.

Hope this helps,

Ross
Not applicable
I think your site is very interesting.
Not applicable
Russian version of this post / Russian translation of the article: http://www.maximumexchange.ru/2010/04/01/top-10-exchange-storage-myths/
Not applicable
And Microsoft licenses for the additional Exchange servers are free - ha. This is like saying... buy 4 cheaper cars so if the first one breaks down, you have 3 more to drive.
Not applicable
Great article. However, I downloaded the Large Mailbox whitepaper, and not even two pages into it I found so many grammatical and punctuation errors that it became difficult to consume the actual content. I realize this may sound nitpicky, but when I read something poorly edited, it makes me question the accuracy of its content.
Not applicable
How does Microsoft mitigate the increased number of disks needed for a DAS solution?

The (potential) problem is highlighted here:
http://markarnold.blogspot.com/2009/08/exchange-2007-and-das-truth.html

and here:
http://markarnold.blogspot.com/2009/09/how-many-disks.html

While a SAN is not necessary, by the time TCO is factored in it seems like the two solutions, at worst, break even.
Not applicable
Andrew,

Those blog articles fail to mention a few key points (not to mention that the blog's byline is "rarely factual", which I personally found amusing):

1.  Single server vs. multiple servers: multiple servers with multiple isolated copies remove single points of failure.
2.  Storage fails and the number one cause?  It's not the technology, it's people.  You have to buy at least two storage devices to mitigate that (and you need a solution to replicate data).
3.  No cost points were provided, so the claims are just opinion.

So here's an example that I recently saw from one of my customers for their E2010 design:

100,000 mailboxes, all with a 100-message profile and 500MB quotas (though if they go with 1TB spindles, they could obviously increase the mailbox sizes without significantly changing the storage numbers). The design was for 3 HA copies.

The projected SAN storage cost (including SAN licensing and storage) with 1TB 7.2K SATA disks (RAID-10) was $10 million ($16 million with RAID-5, as it required more disks). That was for 2700 disks, and it was the discounted price, I should add.

HP sells a 70-disk 5U storage chassis, the MDS-600. Fully populated with 1TB SATA disks, that enclosure goes for $50,500 MSRP. If you keep the RAID architecture, it's still the same number of disks, so you need 39 chassis. That's about $2 million for the storage.

If you change the architecture, drop RAID for those 3 copies, and deploy on JBOD, you cut your storage requirements nearly in half, which cuts your cost in half. But heck, let's say that to mitigate the move to JBOD you increase your copy count to 4; that's still fewer disks than the RAID/3-copy solution (around 2200 disks) and only around $1.6 million.
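The enclosure math in the two paragraphs above is a ceiling division by the 70 slots per chassis, multiplied by the $50,500 MSRP. A minimal Python sketch, assuming a partially filled last enclosure is still priced as a full one (a conservative simplification) and counting enclosure cost only:

```python
import math

CHASSIS_SLOTS = 70          # disks per MDS-600 enclosure, per the comment
CHASSIS_PRICE_USD = 50_500  # MSRP for a fully populated enclosure, per the comment


def das_storage_cost(disks: int) -> tuple[int, int]:
    """Return (enclosures needed, total enclosure cost in USD) for a disk count."""
    chassis = math.ceil(disks / CHASSIS_SLOTS)
    # Every enclosure, including a partly filled last one, is priced as full.
    return chassis, chassis * CHASSIS_PRICE_USD


if __name__ == "__main__":
    san_cost = 10_000_000  # discounted SAN quote (RAID-10, 3 copies) from the comment
    for label, disks in (("RAID-10, 3 copies", 2700), ("JBOD, 4 copies", 2200)):
        chassis, cost = das_storage_cost(disks)
        print(f"{label}: {chassis} enclosures, ${cost:,} (vs. ${san_cost:,} SAN)")
```

This yields 39 enclosures ($1,969,500) and 32 enclosures ($1,616,000), matching the rounded $2 million and $1.6 million figures.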

And that's only the capital costs.  Since both solutions are leveraging SATA, the power costs are roughly the same (the chassis/storage controllers add some additional cost there).  Floor costs may be different (depending on rack/space charges) and that obviously has to be factored into the equation (but does that really cost anywhere close to $8 million?).

Ultimately, you need to evaluate all the options. Does the model change with more copies and JBOD? Yes. It does mean failovers when disks fail, which may or may not be cool for you or your customers, but that should at least be weighed against the storage options and their associated costs.

For some of my customers they are considering different storage paradigms and more copies to achieve their business goals while reducing costs.  Others will leverage existing investments in storage they have.  But we should at least help them have an open mind so that they can see where there are potential cost savings.

John
Not applicable
I think John is spot on here with some great examples.  

AND, when we link to blogs, we need to keep in mind where the blogger works.  Mark Arnold does work for a major storage vendor, and of course storage vendors are going to tell you that JBOD is not a cheaper solution.

Also keep in mind that Microsoft doesn't make any more or less money based on your storage solution. Almost all of their money comes from those expensive CALs. So they are really trying to find the lowest-cost solution for you, so you will like Exchange more and be more inclined to stay with it rather than switch to a competing email solution. If Exchange is better, faster and CHEAPER, then MSFT FTW.
Not applicable
Full disclosure: I work for a vendor providing storage solutions for Exchange.

This is a great article and interesting dialog. Our customer discussions have led us to conclude that customers are concerned that deploying Exchange 2010 at scale requires a tremendous increase in storage capacity, primarily due to the number of isolated copies behind each server (driven by DAG) and the need to increase mailbox quotas (to satisfy users' hunger for email capacity). The over-the-top model with DAG is great from an HA perspective but does create a multiplicative effect on the capacity required. We feel that this creates a unique opportunity for customers that want to keep Exchange 2010 on-premises but want a more compelling way to address the storage capacity required to support it; techniques such as hybrid storage solutions, primary storage deduplication, and so on may provide an attractive option without compromising performance or data protection posture.

If anyone is interested, I encourage you to check out what we've built for Exchange 2010 at: http://www.storsimple.com/microsoft-exchange/

Thanks,
Joel @StorSimpleInc  
Last update: Jul 01 2019 03:50 PM