Announcing the Jetstress 2013 Field Guide
Due to the success of the Jetstress 2010 field guide, we have decided to continue the tradition by releasing an updated version of the guide for Jetstress 2013. As with the previous version, the aim of the document is as follows:

- Explain how Jetstress works.
- Provide storage validation planning and configuration guidance.
- Provide Jetstress results interpretation guidance.

So, what's changed? Well, the good news is that Jetstress 2013 is very similar to Jetstress 2010. There are some modifications to accommodate the storage changes within Exchange Server 2013; however, the planning, configuration and results interpretation process remains largely the same as it was in Jetstress 2010.

Change overview in Jetstress 2013

- The Event log is captured and logged to the test log. These events show up in the Jetstress UI as the test is progressing.
- Any errors are logged against the volume on which they occurred. The final report shows the error counts per volume in a new sub-section.
- A single IO error anywhere will fail the test.
- In the case of CRC errors (JET -1021), Jetstress will simulate the same behaviour as Exchange "page patching".
- Detects -1018, -1019, -1021, -1022, -1119, hung IO, DbtimeTooNew, and DbtimeTooOld.
- Threads, which generate IO, are now controlled at a global level. Instead of specifying Threads/DB, you now specify a global thread count, which works against all databases.

Updates in the Jetstress 2013 Field Guide

Not content with simply updating Jetstress, we have also added some more information to the field guide:

- Updated internals section to reflect changes made in Jetstress 2013 [4]
- Updated validation process flow charts [5.1]
- Improved failure mode testing section [5.4]
- Updated initialisation time table [5.6.1]
- Updated installation section [6]
- Updated report data section [9]
- Updated thread count section [Appendix A]

The Jetstress Field Guide will be the only documentation released for Jetstress 2013, so if you have any feedback please feel free to share it with us here. You can download the new version of the Jetstress field guide as an attachment of this post.

Thanks,

Neil Johnson
Senior Consultant, MCS UK

How to test the disks on your Exchange server
If there's one thing that's true of all busy Exchange servers, it's that they generate massive amounts of disk I/O. There's a joke around here that Exchange is the world's biggest hard disk diagnostics program. Typically, your disks will be the first component of your Exchange server that starts groaning as you add load. And, frequently, you'll find that if you get your disks out of the redline area of the dial, other performance issues suddenly heal themselves too.

Why is this so? Exchange databases use transactional logging. As new data comes in, the most urgent priority is getting the new stuff secured on disk in a log file. If you are experiencing "log stalls," then everything else that needs to happen with that data must wait. This can lead to a cascade of other bottlenecks. There is a very good KB article on log stalls. The article tells you how to use System Monitor to tell if you have a log stall problem, and how to tune Exchange if necessary:

XADM: Log Stalls/sec Are Regularly Greater than 0 (Zero)
http://support.microsoft.com/?id=188676

But all the fine tuning in the world won't help if you are just plain demanding too much from your disk system. How can you tell what kind of load your disk system can really sustain? For years, the Exchange database test team here at Microsoft has used a homegrown tool called Jetstress to simulate heavy disk I/O loads. It can be downloaded from: http://www.microsoft.com/downloads/details.aspx?FamilyId=94B9810B-670E-433A-B5EF-B47054595E9C&displaylang=en

You may have used LoadSim in the past. JetStress has some similarities, but is not a replacement for LoadSim. JetStress is a more sharply focused tool than LoadSim. It is intended only to simulate Exchange disk I/O activity. LoadSim lets you simulate network and client activity, and thus indirectly works out the disk system. JetStress goes right at the disk system, no indirection about it. You don't even have to have Exchange installed to use JetStress. You simply copy a few files to a server and start pounding it to its limits.

JetStress generates a test database from scratch, of whatever size you want. Typically, to get valid results, you only need to generate a database that is 5% the size of your intended real database. You can then tell JetStress to make the same changes to the database that happen during normal operation. It adds, deletes, replaces and reads records from the database. By using System Monitor, you can see how much real Exchange load your disks can handle. You can change your disk configurations and re-run the same tests to see what kind of difference it makes.

There are two basic kinds of testing you can do with JetStress:

- Performance Testing
- Disk Subsystem Stability Testing

We usually recommend that you let JetStress run at least 2 hours when you're testing to see what kind of sustained throughput your disk system can handle. If you're doing stability testing, the recommendation is 24 hours. Now, what exactly do I mean by stability testing? Exchange can subject your server to very complex random I/O. As you push computer systems closer and closer to their tested limits, and as you run huge amounts of data through the system, you're more likely to encounter glitches and even bugs in the ability of the system to reliably process and preserve data. JetStress will let you load your system up till it's running as fast as it can, and will keep it under stress to see if it remains reliable in both storing and retrieving data.
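As a quick aside on the log stall counter mentioned above: on systems recent enough to have Windows PowerShell, you can sample it from the shell instead of building a System Monitor log. This is only a rough sketch - the exact name of the ESE performance object differs between Exchange versions, so it is discovered with a wildcard here - and sustained non-zero values are what you are looking for.

# Find the ESE "Database ==> Instances" style counter set (the exact name varies by Exchange version).
$set = Get-Counter -ListSet "*Database*Instances*" | Select-Object -First 1

# Sample Log Record Stalls/sec for all instances, every 5 seconds for about a minute.
$paths = $set.PathsWithInstances | Where-Object { $_ -like "*Log Record Stalls/sec*" }
$samples = Get-Counter -Counter $paths -SampleInterval 5 -MaxSamples 12

# Regularly non-zero averages suggest the log volumes cannot keep up with the write load.
$samples.CounterSamples | Measure-Object -Property CookedValue -Average -Maximum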
The way you can tell if the system is performing reliably is to look for error -1018 from your Exchange database. This error occurs whenever a page is read from the database and the checksum on the page is wrong. Every page in an Exchange database is checksummed as it is written, and the checksum is verified every time the page is looked at again. If even a single bit is wrong on the page, Exchange declares the page bad and reports a -1018 error. You can learn more about Exchange page checksums and how we detect corruption in the database in this KB article:

XADM: Understanding and Analyzing -1018, -1019 and -1022 Exchange Database Errors
http://support.microsoft.com/?id=314917

If your database has -1018 pages after the stress test, then the disk system cannot be considered reliable at the load level under which it was tested.

When you download the JetStress utility, you get excellent documentation along with it. The documentation will walk you through every phase, from setting up tests and monitoring their progress, to interpreting their results. It even tells you which System Monitor (Performance Monitor) counters to look at, and what values are OK. It also tells you how to validate the integrity of the database after a stability test.

Mike Lee

Putting the Restrictions on Restrictions
One problem we encounter almost on a weekly basis is general performance problems. Most performance problems we have seen lately have been related to either inadequate disk configuration or MAPI restrictions. Mike Lee has already explained the bit about disk issues in his entry, so I am going to tackle the other issue of MAPI restrictions.

Let's start with the basics. MAPI typically represents data in the form of a table. Everything is a table; there is a table that is presented to the client when it requests a list of providers, a table for folders, folder contents, attachments, etc. Each table is composed of columns. Each column is a different MAPI property representing things like the sender, subject, and delivery time. Each row represents an individual item, so for a folder contents table, each row would represent a message.

Now the client can do some interesting things with the tables, like resort them. The client can seek in the table to a specific row that matches criteria. This operation is referred to as a FindRow(). It can also request that only items fitting certain criteria be included in the table. An example would be to only include items created on a specific day. This is what is known as a restriction. The resulting folder contents table would only have items in the table that meet the given criteria. Restrictions are used when it is expected the client will be requesting the same representation of data on a frequent basis.

So now you may be asking, "Why is that so bad?" Well, to understand it we need to take a look at how the Store actually stores the data a MAPI client requests and how it interprets requests such as FindRow and Restrict. Inside the storage schema of the store we have various tables that collectively represent things such as mailboxes, folders, folder contents, and messages. This is what allows the store to do things like Single Instance Storage. When a client requests a list of the contents of a folder, that request is mapped to a special table referred to as a MessageFolder table (MsgFolder for short). Each folder created in the system has a separate message folder table. The purpose of the MsgFolder table is to map a folder to its contents. Keep in mind that for Single Instance Storage we have to allow multiple folders to keep a reference to a single message, and multiple messages may appear in any given folder. In database terminology this is known as a many-to-many relationship, which requires an intermediate mapping table, the MsgFolder table (one for each folder).

So to accommodate the client request to only be presented with messages from a specific date range, we could get a reference to the folder's MsgFolder table, then selectively remove unwanted rows by traversing the table in memory, but that would be a very expensive operation. To handle the expectations of a Restrict call (frequent re-requests for the same data) we create a new special folder (and corresponding MsgFolder table) that is referred to as a Restricted Search Folder. This folder is linked back to the original folder and a logical relationship exists between these two folders. We place a condition on the search folder such that it should only include items that meet the criteria specified by the restriction. In this search folder, a backlink to the original row in the MsgFolder table exists for each message in the MsgFolder table that meets the criteria of the restriction.
The first performance issue encountered is the time it takes to manage the updates to each of the search folders. When a change takes place on the original folder, the change is compared to each of the restricted search folders associated with the folder in question to determine if they need to be updated as well. This has a bigger impact when a lot of search folders exist for one or many folders.

The second issue encountered is the creation of the restricted search folders. The creation of a restricted search folder requires a full pass of the original folder to extract the individual items that need to be linked into the restricted search folder. If this process occurs as a direct result of a client action, the client may experience a hang or receive the Outlook popup box indicating a request is taking a long time to process. The time it takes to create the restricted search folder is proportional to the number of items in the regular folder.

In Exchange 5.5 we did not put a limit on the number of restricted search folders that we would allow, but the default timeout was 8 days. If the restricted search folder was not used in 8 days, it would be removed as part of background store maintenance. This led to performance problems, as lots of restricted search folders could be created, all of which would need to be updated every time the original folder had any adds/deletes/changes/moves.

In Exchange 2000/2003 we put a cap on the maximum number (11) of restricted search folders we would allow on a per-folder basis; however, the default lifetime was increased from 8 to 40 days. This has proven to be adequate for most, but from time to time we see the opposite problem occur. If a folder already has 11 restricted search folders associated with it and a new restrict request is made, the list of search folders is FIFO based using the last time the restriction was actually used. So this means the stalest restricted search folder is removed to make room for the new request. As mentioned above, this requires a full pass of the regular MsgFolder table, and if done on the client's time, the client may perceive a performance issue while the table is being built. So it is possible that on a daily or weekly basis more than 12 restricted search folders are used/created while the limit is 11. Eventually the client hits a restriction request that doesn't currently have a matching search folder, which results in the deletion and creation of a new restricted search folder on the client's time.

So how do you identify this situation, and what can be done about it?

Increase the Diagnostic Level for MSExchangeIS Private Views or MSExchangeIS Public Views to minimum and look for events similar to the following to determine the frequency of restrictions being created for particular folders or users.

Event ID: 1167
Source: MSExchangeIS Private / Public
Type: Information
Category: Views
Description: created a new restricted view on folder .

Observe the perfmon counter MSExchangeIS Private (or Public) \ Categorization Count. This is an overall count of restricted search folders plus regular search folders in the system. Watch for sharp increases, especially after implementing any 3rd party application that takes advantage of MAPI interfaces.

In Exchange 5.5, dump the Private or Public information store using ISINTEG. This isn't as productive in Exchange 2000/2003 since we limited the maximum number of restricted search folders.
However, you could use the information to correlate the event IDs above to the folders possibly being affected.

isinteg -pri|pub -dump -l logfilename

Examine the log file and look for any folders with large numbers of entries under the following fields:

Search FIDs=
Recursive FIDs=
Search Backlinks=
Categ FIDs=

For example:

Search FIDs=0001-000000000418,0001-00000000041B,0001-000000000421,0001-000000000423,0001-000000000424,0001-000000000428,0001-00000000042D

Also take a look at the following KB articles:

http://support.microsoft.com/default.aspx?scid=kb;en-us;216076
http://support.microsoft.com/default.aspx?scid=kb;en-us;328355

Hope this was helpful!

Jeremy Kelly

How the M: Drive came about
In Exchange 2000, we introduced a new feature called IFS. IFS stands for "Installable File System". This uses a little known and even less used feature of NT that allows the OS's file system (like NTFS or FAT) to be replaced. The initial reason for doing that was as an optimization: it would allow protocols, such as NNTP and SMTP, to transfer MIME messages directly as files. In Exchange 5.5, MIME messages are broken down into MAPI properties and stored in database tables. When they need to be accessed as MIME, they are put back together. In E2K, MIME messages are stored as MIME files in IFS and only converted into MAPI if a MAPI client (such as Outlook) accesses them.

The other perceived benefit of IFS was that the Exchange storage objects could then be made visible through the file system. So you could go to a drive letter (M: was chosen for two reasons: first, "M" for "Mail", and second, because it was in the middle of the alphabet and least likely to collide either with actual storage drives, which start at A and move up, or mapped network drives, which start at Z and move down), get a list of mailboxes, navigate to mail folders via cmd or Windows Explorer, and look at actual messages. This was considered pretty neat at the time, and since it didn't seem to be much more work to allow that access, it was thrown in (there may have been other, better reasons but I'm not aware of them). This ended up causing some challenges down the line related to the intricacies of how email objects need to be handled and mapping the file access behavior to them.

One of the biggest problems encountered was around security descriptors. This is difficult to explain without a detailed understanding of NT security descriptors, so I will simplify the explanation for the purpose of this discussion. The main part of an NTSD is called a DACL (discretionary access control list). It contains a list of users and groups and what they can do to that object. There are two main types of entries: allows, which say what an entity can do; and denies, which say what they can't. The order of this list is very important. A standard sequence of entry types is called "canonical". NT canonical form calls for a particular sequence. Because of legacy issues, MAPI canonical form requires a different sequence of entry types. Applications that modify security expect a particular sequence and will behave erratically if the sequence is wrong. Creating or modifying objects through the M: drive changes the canonical format of the DACLs and results in unexpected security behavior. This is bad.

A related issue here has to do with item level security. E2K also introduced this feature, which is that items in a folder can be secured independently of each other and the folder. While this has some great uses, for many email systems this level of security is not needed. When a message has the folder default security, it simply references that property in the folder. When a message has its own security, there is an additional property that needs to be stored (this also has an effect on how folder aggregate properties, such as unread count, are computed). Having lots of individual security descriptors can result in both increased storage size and poor performance. When a message is created or modified through the M: drive, it will always get an individual security descriptor stamped on it, even if it is exactly the same as the folder default. This can also lead to unexpected behavior.
For instance, if you change the default security on the folder, it will not change the security on any messages in it that have their own security descriptors. They have to be re-secured individually.

Another challenge is in relation to virus scanners. Virus scanners typically look for valid storage drives and spin through all the files on those drives and check them against virus signatures. The M: drive appears as a normal drive, so virus scanners were picking this up and processing it. This can have very detrimental effects on system performance and may also result in message corruption in some cases.

Finally, IFS runs in kernel mode. This is a privileged execution path, and it means that problems in this area can have much more severe effects (and be harder to track down) than in other areas of Exchange, which all run in user mode. Blue screens are one possibility if something goes wrong.

IFS has given Exchange 2000 and Exchange 2003 a lot of advantages: we maintain content parity for MIME and make MIME message handling faster and more efficient, as well as increasing the performance of such messages retrieved via internet protocols. But as I described above, there can be problems if IFS is misused via the M: drive. In Exchange 2003 we have disabled the M: drive by default to hopefully help reduce the likelihood that customers will encounter any of the issues described above. I encourage every system administrator to keep this disabled on E2K3 and disable it on all E2K servers as well.

Jon Avner

The history of content conversion in Exchange
The Exchange server and the client (which evolved into Outlook) were originally (circa 1992) based on the MAPI standard (which stands for Messaging Application Programming Interface). Broadly this can be divided into the MAPI data model and the MAPI object schema. Very simplistically, the MAPI data model can be summed up as: message stores (think mailboxes) that contain folders that contain items that contain attachments. And all of the above entities are simply a collection of scalar properties (name, value pairs). The original MAPI schema laid out the common list of attributes applicable to email messages, e.g. Subject, Received Time, etc. This has since been extensively extended by Outlook and Exchange for various advanced functionality. The Exchange server store was implemented to store the above data model in a very efficient manner, using a database technology known internally as Jet Blue. This data model, although simplistic in this day and age, has proved extremely flexible and surprisingly powerful to model many things that the original inventors could never have even imagined. And the Exchange server, due to its excellent implementation of this data model, has reaped great rewards.

The primary mail transport protocol used for business e-mail back then was X.400. Around 1995 or so, the internet wave came along. The Web, the Browser, and HTML changed the world. Along with them, the popularity of other standards-based protocols for email increased, such as SMTP for mail exchange and MIME as the serialization format for email messages. Both of these rocked the ship of Exchange back then. The Exchange product had already been in development since '92 and more delays to complete it and ship it would have pretty much meant disbanding it altogether. The pressure from the executive level to ship it was intense. But folks in Exchange back then were wise enough to see that this internet wave was too big to just be a passing fancy and that these new standards must be embraced. As a compromise, support for SMTP as a mail exchange protocol (instead of just X.400) and MIME as a data interchange format (instead of just MAPI) were added to the product, but on the periphery so that the core product wouldn't be affected much.

Fortunately, the existing transport architecture had supported the notion of Gateways to connect to the disparate other mail systems that existed then (again circa '92) which were foolish enough to not support X.400 natively! For example, to connect with ccMail, GroupWise and the likes. We decided to lump this new kid on the block (SMTP) in the same category and modelled it simply as a gateway add-on to connect to this thing called the "internet"! This was called the Internet Mail Connector, or simply the IMC. This IMC also then got the fun task of back and forth conversion between the data format stored natively in the Exchange store (MAPI) and the data format that was the new internet standard (MIME). A conversion library was created for this purpose and was named, somewhat oddly in retrospect, IMAIL (Internet Mail). This was how we shipped Exchange 4.0 (March 13, 1996).

The next release, 5.0, was a very focused one in which we made very minor changes, although we did also add POP support. Both the IMC and IMAIL stayed pretty much the same as they were in 4.0. In the next release, 5.5, we decided to add support for IMAP4 as well. Now, POP and IMAP were similar to SMTP in insisting on MIME as the data interchange format.
So it was decided to move this IMAIL conversion library closer to the actual data store (which was still a MAPI store, of course) to improve performance. In addition, we invested significantly in this library to do a top notch job of translating to and from these 2 formats, and this has been quite a success overall. Exchange 5.5 was a very successful release thanks to the right balance of functionality and simplicity.

The next release was Exchange 2000. By this time the domination of the internet protocols was so complete that we decided to swap the roles of SMTP and X.400. That is, to make SMTP our primary mail exchange protocol even between Exchange servers, and treat X.400 as a one-off Gateway to reach pockets of the world that were still using it (mostly Europe). Also, we decided to go even further along the road of embracing MIME as the data format. We decided to actually enhance the core Exchange store to make it natively store and "understand" MIME. In this release we also invested in a new file system technology that would let us store these large "streams" of MIME content more efficiently. See Jon Avner's blog on the M: drive for more details on this one. In order to satisfy our richest client (Outlook), which continued to demand a MAPI data model, we had to engineer the Exchange store to do on-demand or deferred conversions back and forth between the 2 formats. As one can imagine, this was a very challenging piece of engineering, fraught with subtle traps.

In the Exchange 2003 release we did not make any significant changes in this area, and the status quo from the 2000 release remains.

Naresh Sundararajan

Released: Exchange Server Role Requirements Calculator 8.3
Today, we released an updated version of the Exchange Server Role Requirements Calculator. This release focuses on two specific enhancements:

- Exchange 2016 designs now take into account the CU3 improvement that reduces the bandwidth required between active and passive HA copies, as the local search instance can read data from its local database copy.
- The calculator now supports the ability to automatically calculate the number of DAGs and the corresponding number of Mailbox servers that should be deployed to support the defined requirements. This process takes into account memory, CPU cores, and disk configuration when determining the optimal configuration, ensuring that recommended thresholds are not exceeded.

As a result of this change, you will find that the Input tab has been rearranged. Specifically, the DAG variables have been moved to the end of the worksheet to ensure that you have completely entered all information before attempting an automatic calculation. As with everything else in the calculator, you can turn the automatic calculation off and manually select the number of Mailbox servers and DAGs you would like to deploy.

For all the other improvements and bug fixes, please review the readme or download the update.

As always we welcome feedback and please report any issues you may encounter while using the calculator by emailing strgcalc AT microsoft DOT com.

Ross Smith IV
Principal Program Manager
Office 365 Customer Experience

Troubleshooting Rapid Growth in Databases and Transaction Log Files in Exchange Server 2007 and 2010
A few years back, a very detailed blog post was released on Troubleshooting Exchange 2007 Store Log/Database growth issues. We wanted to revisit this topic with Exchange 2010 in mind. While the troubleshooting steps needed are virtually the same, we thought it would be useful to condense the steps a bit, make a few updates and provide links to a few newer KB articles.

The below list of steps is a walkthrough of an approach that would likely be used when calling Microsoft Support for assistance with this issue. It also provides some insight as to what we are looking for and why. It is not a complete list of every possible troubleshooting step, as some causes are simply not seen quite as much as others.

Another thing to note is that the steps are commonly used when we are seeing "rapid" growth, or unexpected growth, in the database file on disk or the amount of transaction logs getting generated. An example of this is when an Administrator notes a transaction log file drive is close to running out of space, but had several GB free the day before. When looking through historical records kept, the Administrator notes that approx. 2 to 3 GB of logs have been backed up daily for several months, but we are currently generating 2 to 3 GB of logs per hour. This is obviously a red flag for the log creation rate. The same principle applies with the database in scenarios where the rapid log growth is associated with new content creation.

In other cases, the database size or transaction log file quantity may increase, but signal other indicators of things going on with the server. For example, if backups have been failing for a few days and the log files are not getting purged, the log file disk will start to fill up and appear to have more logs than usual. In this example, the cause wouldn't necessarily be rapid log growth, but an indicator that the backups which are responsible for purging the logs are failing and must be resolved. Another example is with the database, where retention settings have been modified or online maintenance has not been completing; therefore, the database will begin to grow on disk and eat up free space. These scenarios and a few others are also discussed in the "Proactive monitoring and mitigation efforts" section of the previously published blog.

It should be noted that in some cases, you may run into a scenario where the database size is expanding rapidly, but you do not experience log growth at a rapid rate. (As with new content creation in rapid log growth, we would expect the database to grow at a rapid rate with the transaction logs.) This is often referred to as database "bloat" or database "space leak". The steps to troubleshoot this specific issue can be a little more invasive, as you can see in some analysis steps listed here (taking databases offline, various kinds of dumps, etc.), and it may be better to utilize support for assistance if a reason for the growth cannot be found.

Once you have established that the rate of growth for the database and transaction log files is abnormal, we would begin troubleshooting the issue by doing the following steps. Note that in some cases the steps can be done out of order, but the below provides general suggested guidance based on our experiences in support.

Step 1

Use Exchange User Monitor (Exmon) server side to determine if a specific user is causing the log growth problems. Sort on CPU (%) and look at the top 5 users that are consuming the most amount of CPU inside the Store process.
Check the Log Bytes column to verify the log growth for a potential user. If that does not show a possible user, sort on the Log Bytes column to look for any possible users that could be contributing to the log growth.

If it appears that the user in Exmon is a ?, then this is representative of a HUB/Transport related problem generating the logs. Query the message tracking logs using the Message Tracking Log tool in the Exchange Management Console's Toolbox to check for any large messages that might be running through the system. See #15 for a PowerShell script to accomplish the same task.

Step 2

With Exchange 2007 Service Pack 2 Rollup Update 2 and higher, you can use KB972705 to troubleshoot abnormal database or log growth by adding the described registry values. The registry values will monitor RPC activity and log an event if the thresholds are exceeded, with details about the event and the user that caused it. (These registry values are not currently available in Exchange Server 2010.)

Check for any excessive ExCDO warning events related to appointments in the application log on the server (examples are 8230 or 8264 events). If recurring meeting events are found, then try to regenerate the calendar data server side via a process called POOF. See http://blogs.msdn.com/stephen_griffin/archive/2007/02/21/poof-your-calender-really.aspx for more information on what this is.

Event Type: Warning
Event Source: EXCDO
Event Category: General
Event ID: 8230
Description: An inconsistency was detected in username@domain.com: /Calendar/<calendar item>.EML. The calendar is being repaired. If other errors occur with this calendar, please view the calendar using Microsoft Outlook Web Access. If a problem persists, please recreate the calendar or the containing mailbox.

Event Type: Warning
Event Source: EXCDO
Event Category: General
Event ID: 8264
Message: The recurring appointment expansion in mailbox <someone's address> has taken too long. The free/busy information for this calendar may be inaccurate. This may be the result of many very old recurring appointments. To correct this, please remove them or change their start date to a more recent date.

Important: If 8230 events are consistently seen on an Exchange server, have the user delete/recreate that appointment to remove any corruption.

Step 3

Collect and parse the IIS log files from the CAS servers used by the affected Mailbox Server. You can use Log Parser Studio to easily parse IIS log files. In here, you can look for repeated user account sync attempts and suspicious activity. For example, a user with an abnormally high number of sync attempts and errors would be a red flag. If a user is found and suspected to be a cause for the growth, you can follow the suggestions given in steps 5 and 6. Once Log Parser Studio is launched, you will see convenient tabs to search per protocol. Some example queries for this issue are shown in the screenshots in the original post.

Step 4

If a suspected user is found via Exmon, the event logs, KB972705, or parsing the IIS log files, then do one of the following:

Disable MAPI access to the user's mailbox using the following steps (recommended):

Run Set-CASMailbox -Identity <Username> -MapiEnabled $False

Move the mailbox to another Mailbox Store. Note: This is necessary to disconnect the user from the store due to the Store Mailbox and DSAccess caches. Otherwise you could potentially be waiting for over 2 hours and 15 minutes for this setting to take effect.
Moving the mailbox effectively kills the user's MAPI session to the server, and after the move, the user's access to the store via a MAPI-enabled client will be disabled.

Disable the user's AD account temporarily.

Kill their TCP connection with TCPView.

Call the client to have them close Outlook or turn off their mobile device while in the problem state, for immediate relief.

Step 5

If closing the client/devices or killing their sessions seems to stop the log growth issue, then we need to do the following to see if this is OST or Outlook profile related:

Have the user launch Outlook while holding down the Control key, which will prompt whether you would like to run Outlook in safe mode. If launching Outlook in safe mode resolves the log growth issue, then concentrate on what add-ins could be contributing to this problem.

For a mobile device, consider a full resync or a new sync profile. Also check for any messages in the Drafts folder or Outbox on the device. A corrupted meeting or calendar entry is commonly found to be causing the issue with the device as well.

If you can gain access to the user's machine, then do one of the following:

1. Launch Outlook to confirm the log file growth issue on the server.
2. If log growth is confirmed, do one of the following: Check the user's Outbox for any messages. If the user is running in Cached mode, set the Outlook client to Work Offline. Doing this will help stop the message being sent in the Outbox and sometimes causes the message to NDR. If the user is running in Online mode, then try moving the message to another folder to prevent Outlook or the HUB server from processing the message. After each one of the steps above, check the Exchange server to see if log growth has ceased. Call Microsoft Product Support to enable debug logging of the Outlook client to determine possible root cause.
3. Follow the Running Process Explorer instructions in the below article to dump out the dlls that are running within the Outlook process. Name the file username.txt. This helps check for any 3rd party Outlook add-ins that may be causing the excessive log growth.
970920 Using Process Explorer to List dlls Running Under the Outlook.exe Process
http://support.microsoft.com/kb/970920
4. Check the Sync Issues folder for any errors that might be occurring.

Let's attempt to narrow this down further to see if the problem is truly in the OST or something possibly Outlook profile related:

Run ScanPST against the user's OST file to check for possible corruption.

With the Outlook client shut down, rename the user's OST file to something else and then launch Outlook to recreate a new OST file. If the problem does not occur, we know the problem is within the OST itself.

If renaming the OST causes the problem to recur again, then recreate the user's profile to see if this might be profile related.

Step 6

Ask questions: Is the user using any type of device besides a mobile device? Question the end user if at all possible to understand what they might have been doing at the time the problem started occurring. It's possible that a user imported a lot of data from a PST file, which could cause log growth server side, or there was some other erratic behavior that they were seeing based on a user action.
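If you cannot get to the user's machine to check the Outbox directly (step 5 above), a similar check can be run server side from the Exchange Management Shell. This is only a minimal sketch: "jsmith" is a hypothetical mailbox identity, and items sitting in a cached-mode client's local Outbox will not show up here.

# Check a single suspected mailbox's Outbox and Drafts from the shell.
Get-MailboxFolderStatistics -Identity jsmith -FolderScope Outbox | Select-Object Identity,ItemsInFolder,FolderSize
Get-MailboxFolderStatistics -Identity jsmith -FolderScope Drafts | Select-Object Identity,ItemsInFolder,FolderSize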
Step 7

Check to ensure File Level Antivirus exclusions are set correctly for both files and processes per http://technet.microsoft.com/en-us/library/bb332342(v=exchg.141).aspx

Step 8

If Exmon and the above methods do not provide the data that is necessary to get root cause, then collect a portion of the Store transaction log files (100 would be a good start) during the problem period and parse them following the directions in http://blogs.msdn.com/scottos/archive/2007/11/07/remix-using-powershell-to-parse-ese-transaction-logs.aspx to look for possible patterns such as high pattern counts for IPM.Appointment. This will give you a high level overview of whether something is looping or a high rate of messages is being sent. Note: This tool may or may not provide any benefit depending on the data that is stored in the log files, but sometimes it will show data that is MIME encoded that will help with your investigation.

Step 9

If nothing is found by parsing the transaction log files, we can check for a rogue, corrupted, or large message in transit:

1. Check current queues against all HUB Transport Servers for stuck or queued messages:

get-exchangeserver | where {$_.IsHubTransportServer -eq "true"} | Get-Queue | where {$_.Deliverytype -eq "MapiDelivery"} | Select-Object Identity, NextHopDomain, Status, MessageCount | export-csv HubQueues.csv

Review queues for any that are in retry or have a lot of messages queued.

Export out message sizes in MB in all Hub Transport queues to see if any large messages are being sent through the queues:

get-exchangeserver | where {$_.ishubtransportserver -eq "true"} | get-message -resultsize unlimited | Select-Object Identity,Subject,status,LastError,RetryCount,queue,@{Name="Message Size MB";expression={$_.size.toMB()}} | sort-object -property size -descending | export-csv HubMessages.csv

Export out message sizes in Bytes in all Hub Transport queues:

get-exchangeserver | where {$_.ishubtransportserver -eq "true"} | get-message -resultsize unlimited | Select-Object Identity,Subject,status,LastError,RetryCount,queue,size | sort-object -property size -descending | export-csv HubMessages.csv

2. Check users' Outboxes for any large, looping, or stranded messages that might be affecting overall log growth:

get-mailbox -ResultSize Unlimited | Get-MailboxFolderStatistics -folderscope Outbox | Sort-Object Foldersize -Descending | select-object identity,name,foldertype,itemsinfolder,@{Name="FolderSize MB";expression={$_.folderSize.toMB()}} | export-csv OutboxItems.csv

Note: This does not get information for users that are running in cached mode.

Step 10

Utilize the MSExchangeIS Client\Jet Log Record Bytes/sec and MSExchangeIS Client\RPC Operations/sec Perfmon counters to see if there is a particular client protocol that may be generating excessive logs. If a particular protocol mechanism is found to be higher than other protocols for a sustained period of time, then possibly shut down the service hosting the protocol. For example, if Exchange Outlook Web Access is the protocol generating potential log growth, then stop the World Wide Web Service (W3SVC) to confirm that log growth stops. If log growth stops, then collecting IIS logs from the CAS/MBX Exchange servers involved will help provide insight into what action the user was performing that was causing this to occur.
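Those two counters can also be sampled quickly from PowerShell rather than building a full System Monitor log. This is only a sketch: the counter set and instance names can vary slightly between Exchange versions, so treat the paths below as assumptions to verify on your server.

# Sample the per-client-type Store counters from step 10 for about a minute.
$paths = "\MSExchangeIS Client(*)\Jet Log Record Bytes/sec", "\MSExchangeIS Client(*)\RPC Operations/sec"
$samples = Get-Counter -Counter $paths -SampleInterval 5 -MaxSamples 12

# Average the log byte rate per instance (client protocol) to see which one is doing the writing.
$samples.CounterSamples | Where-Object { $_.Path -like "*jet log record bytes/sec" } |
    Group-Object -Property InstanceName |
    ForEach-Object { "{0}: {1:N0} bytes/sec average" -f $_.Name, ($_.Group | Measure-Object CookedValue -Average).Average }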
Step 11

Run the following command from the Management Shell to export out current user operation rates.

To export to a CSV file:

get-logonstatistics | select-object username,Windows2000account,identity,messagingoperationcount,otheroperationcount,progressoperationcount,streamoperationcount,tableoperationcount,totaloperationcount | where {$_.totaloperationcount -gt 1000} | sort-object totaloperationcount -descending | export-csv LogonStats.csv

To view realtime data:

get-logonstatistics | select-object username,Windows2000account,identity,messagingoperationcount,otheroperationcount,progressoperationcount,streamoperationcount,tableoperationcount,totaloperationcount | where {$_.totaloperationcount -gt 1000} | sort-object totaloperationcount -descending | ft

Key things to look for: In the below example, the Administrator account was storming the testuser account with email. You will notice that there are 2 users that are active here: one is the Administrator submitting all of the messages, and then you will notice that the Windows2000Account references a HUB server with an Identity of testuser. The HUB server also has *no* UserName either, so that is a giveaway right there. This can give you a better understanding of what parties are involved in these high rates of operations.

UserName : Administrator
Windows2000Account : DOMAIN\Administrator
Identity : /o=First Organization/ou=First Administrative Group/cn=Recipients/cn=Administrator
MessagingOperationCount : 1724
OtherOperationCount : 384
ProgressOperationCount : 0
StreamOperationCount : 0
TableOperationCount : 576
TotalOperationCount : 2684

UserName :
Windows2000Account : DOMAIN\E12-HUB$
Identity : /o=First Organization/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=testuser
MessagingOperationCount : 630
OtherOperationCount : 361
ProgressOperationCount : 0
StreamOperationCount : 0
TableOperationCount : 0
TotalOperationCount : 1091

Step 12

Enable Perfmon/Perfwiz logging on the server. Collect data through the problem times and then review it for any irregular activities. You can reference Perfwiz for Exchange 2007/2010 data collection here: http://blogs.technet.com/b/mikelag/archive/2010/07/09/exchange-2007-2010-performance-data-collection-script.aspx

Step 13

Run ExTRA (Exchange Troubleshooting Assistant) via the Toolbox in the Exchange Management Console to look for any possible functions (via FCL logging) that may be consuming excessive time within the store process. This needs to be launched during the problem period. http://blogs.technet.com/mikelag/archive/2008/08/21/using-extra-to-find-long-running-transactions-inside-store.aspx shows how to use FCL logging only, but it would be best to include Perfmon, Exmon, and FCL logging via this tool to capture the most amount of data. The steps shown are valid for Exchange 2007 and Exchange 2010.

Step 14

Export out message tracking log data from the affected MBX server.

Method 1

Download the ExLogGrowthCollector script (attached to this blog post) and place it on the MBX server that experienced the issue. Run ExLogGrowthCollector.ps1 from the Exchange Management Shell. Enter the MBX server name that you would like to trace and the start and end times, and click on the Collect Logs button. Note: What this script does is export out all mail traffic to/from the specified mailbox server across all HUB servers between the times specified. This helps provide insight into any large or looping messages that might have been sent that could have caused the log growth issue.
Method 2

Copy/paste the following data into Notepad, save it as msgtrackexport.ps1 and then run it on the affected Mailbox Server. Open the output in Excel for review. This is similar to the GUI version, but requires manual editing to get it to work.

#Export Tracking Log data from affected server specifying Start/End Times
Write-Host "Script to export out Mailbox Tracking Log Information"
Write-Host "#####################################################"
Write-Host
$server = Read-Host "Enter Mailbox server Name"
$start = Read-Host "Enter start date and time in the format of MM/DD/YYYY hh:mmAM"
$end = Read-Host "Enter end date and time in the format of MM/DD/YYYY hh:mmPM"
$fqdn = $(Get-ExchangeServer $server).fqdn
Write-Host "Writing data out to csv file..... "
Get-ExchangeServer | where {$_.IsHubTransportServer -eq "True" -or $_.name -eq "$server"} | Get-MessageTrackingLog -ResultSize Unlimited -Start $start -End $end | where {$_.ServerHostname -eq $server -or $_.clienthostname -eq $server -or $_.clienthostname -eq $fqdn} | sort-object totalbytes -Descending | export-csv MsgTrack.csv -NoType
Write-Host "Completed!! You can now open the MsgTrack.csv file in Excel for review"

Method 3

You can also use the Process Tracking Log Tool at http://blogs.technet.com/b/exchange/archive/2011/10/21/updated-process-tracking-log-ptl-tool-for-use-with-exchange-2007-and-exchange-2010.aspx to provide some very useful reports.

Step 15

Save off a copy of the application/system logs from the affected server and review them for any events that could relate to this problem.

Step 16

Enable IIS extended logging for CAS and MBX server roles to add the sc-bytes and cs-bytes fields to track large messages being sent via IIS protocols and to also track usage patterns (Additional Details).

Step 17

Get a process dump of the store process during the time of the log growth. (Use this as a last measure once all prior activities have been exhausted and prior to calling Microsoft for assistance. These issues are sometimes intermittent, and the quicker you can obtain any data from the server, the better, as this will help provide Microsoft with information on what the underlying cause might be.)

Download the latest version of Procdump from http://technet.microsoft.com/en-us/sysinternals/dd996900.aspx and extract it to a directory on the Exchange server.

Open the command prompt, change into the directory to which Procdump was extracted in the previous step, and type:

procdump -mp -s 120 -n 2 store.exe d:\DebugData

This will dump the data to D:\DebugData. Change this to whatever directory has enough space to dump the entire store.exe process twice. Check Task Manager for the store.exe process and how much memory it is currently consuming for a rough estimate of the amount of space that is needed.

Important: If Procdump is being run against a store that is on a clustered server, then you need to make sure that you set the Exchange Information Store resource to not affect the group. If the entire store dump cannot be written out in 300 seconds, the cluster service will kill the store service, ruining any chances of collecting the appropriate data on the server.

Open a case with Microsoft Product Support Services to get this data looked at.
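Before kicking off Procdump, a quick sanity check from the shell can approximate how much space two full store.exe dumps will need. This is just a sketch; the drive letter is an example only.

# Rough space estimate before running procdump: check store.exe's memory and free disk space.
$store = Get-Process -Name store
"store.exe working set: {0:N1} GB" -f ($store.WorkingSet64 / 1GB)
$drive = Get-PSDrive -Name D
"Free space on D: {0:N1} GB (you need roughly twice the store.exe size for two dumps)" -f ($drive.Free / 1GB)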
Most current related KB articles

2814847 - Rapid growth in transaction logs, CPU use, and memory consumption in Exchange Server 2010 when a user syncs a mailbox by using an iOS 6.1 or 6.1.1-based device
2621266 - An Exchange Server 2010 database store grows unexpectedly large
996191 - Troubleshooting Fast Growing Transaction Logs on Microsoft Exchange 2000 Server and Exchange Server 2003

Kevin Carker (based on a blog post written by Mike Lagase)

Exchange 2013 Calculator Updates
Today, we released an updated version of the Exchange 2013 Server Role Requirements Calculator. In addition to numerous bug fixes, this version includes new functionality: a CPU utilization table, ReplayLagManager support, MaximumPreferredActiveDatabases support, Restore-DatabaseAvailabilityGroup scenario support, and guidance on sizing recommendations. You can view what changes have been made or download the update directly. For details on the new features, read on.

CPU Utilization Table

The Role Requirements tab includes a table that outlines the expected theoretical CPU utilization for various modes:

- Normal Run Time (where the active copies are distributed according to ActivationPreference=1)
- Single Server Failure (redistribution of active copies based on a single server failure event)
- Double Server Failure (redistribution of active copies based on a double server failure event)
- Site Failure (datacenter activation)
- Worst Failure Mode (in some cases, this value will equal one of the previous scenarios, but it could also be a scenario like Site Failure + 1 server failure; the worst failure mode is what is used to calculate memory and CPU requirements)

Here's an example (see the CPU utilization table screenshot in the original post): in that scenario, the worst failure mode is a site failure + 1 additional server failure (since this is a 4 database copy architecture).

ReplayLagManager Support

ReplayLagManager is a new feature in Exchange Server 2013 that automatically plays down the lagged database copy when availability is compromised. While it is disabled by default, we recommend it be enabled as part of the Preferred Architecture.

Prior to version 7.5, the calculator only supported ReplayLagManager in the scripts created via the Distribution tab (the Role Requirements and Activation Scenarios tabs did not support it). As a result, the calculator did not factor in the lagged database copy as a viable activation target for the worst failure mode. Naturally, this is an issue because sizing is based on the number of active copies, and the more copies activated on a server, the greater the impact to CPU and memory requirements. In a 4-copy 2+2 site resilient design, with the fourth copy being lagged, what this meant in terms of failure modes is that the calculator sized the environment based on what it considered the worst case failure mode - Site Failure (2 HA copies lost, only a single HA copy remaining). Using the CPU table above as an example, calculator versions prior to 7.5 would base the design requirements on 18 active database copies (site failure) instead of 22 active database copies (3 copies lost, lagged copy played down and being utilized as the remaining active).

ReplayLagManager is only supported (from the calculator perspective) when the design leverages:

- Multiple Databases / Volume
- 3+ HA copies

MaximumPreferredActiveDatabases Support

Exchange 2010 introduced the MaximumActiveDatabases parameter, which defines the maximum number of databases that are allowed to be activated on a server by BCS. It is this value that is used in sizing a Mailbox server (and is defined by the worst failure mode in the calculator). Exchange 2013 introduced an additional parameter, MaximumPreferredActiveDatabases. This parameter specifies a preferred maximum number of databases that the Mailbox server should have. The value of MaximumPreferredActiveDatabases is only honored during best copy and server selection (phases 1 through 4), database and server switchovers, and when rebalancing the DAG.
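If you want to see what these two settings look like when applied to a Mailbox server, here is a hedged example; the server name and the values are purely illustrative, and in practice the calculator's deployment scripts (createdag.ps1) set values appropriate to your design.

# Illustrative values only - use the numbers your design calls for.
Set-MailboxServer -Identity "EX-MBX01" -MaximumActiveDatabases 20 -MaximumPreferredActiveDatabases 12

# Verify the configured values.
Get-MailboxServer -Identity "EX-MBX01" | Format-List Name,MaximumActiveDatabases,MaximumPreferredActiveDatabases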
With version 7.5 or later, the calculator recommends setting MaximumPreferredActiveDatabases when there are four or more total database copies. Also, the Export DAG List form exposes the MaximumPreferredActiveDatabases setting, and createdag.ps1 sets the value for the parameter.

Restore-DatabaseAvailabilityGroup Scenario Support

In prior releases, the Distribution tab only supported the concept of Fail WAN, which allowed you to simulate the effects of a WAN failure and model the surviving datacenter's reaction depending on the location of the Witness server. However, Fail WAN did not attempt to shrink the quorum, so if you attempted to fail an additional server you would end up in the blocked condition shown in the original post's screenshot.

With version 7.5 and later, the calculator adds a new mode: Fail Site. When Fail Site is used, the datacenter switchover steps are performed (and thus the quorum is shrunk, the alternate witness is utilized if required, etc.), thereby allowing you to fail additional servers. This allows you to simulate the worst failure mode that is identified in the Role Requirements and Activation Scenarios tabs.

Note: In order to recover from the Fail Site mode, you must click the Refresh Database Layout button.

Sizing Guidance Recommendations

As Jeff recently discussed in Ask The Perf Guy: How Big Is Too Big?, we are now providing explicit recommendations on the maximum number of processor cores and memory that should be deployed in each Exchange 2013 server. The calculator will now warn you if you attempt a design that exceeds these recommendations.

As always, we welcome your feedback.

Ross Smith IV
Principal Program Manager
Office 365 Customer Experience

Released: Exchange 2013 Server Role Requirements Calculator
To download the calculator, please see the attachment on this post (also note that the calculator can be used for E2016 deployments as per this).

It's been a long road, but the initial release of the Exchange Server Role Requirements Calculator is here. No, that isn't a mistake: the calculator has been rebranded. Yes, this is no longer a Mailbox server role calculator; this calculator includes recommendations on sizing Client Access servers too! Originally, marketing wanted to brand it as the Microsoft Exchange Server 2013 Client Access and Mailbox Server Roles Theoretical Capacity Planning Calculator, On-Premises Edition. Wow, that's a mouthful and reminds me of this branding parody. Thankfully, I vetoed that name (you're welcome!).

The calculator supports the architectural changes made possible with Exchange 2013 and later:

Client Access Servers

Like with Exchange 2010, the recommendation in Exchange 2013 is to deploy multi-role servers. There are very few reasons you would need to deploy dedicated Client Access servers (CAS); CPU constraints, use of Windows Network Load Balancing in small deployments (even with our architectural changes in client connectivity, we still do not recommend Windows NLB for any large deployments), and certificate management are a few examples that may justify dedicated CAS.

When deploying multi-role servers, the calculator will take into account the impact that the CAS role has and make recommendations for sizing the entire server's memory and CPU. So when you see the CPU utilization value, this will include the impact both roles have! When deploying dedicated server roles, the calculator will recommend the minimum number of Client Access processor cores and memory per server, as well as the minimum number of CAS you should deploy in each datacenter.

Transport

Now that the Mailbox server role includes additional components like transport, it only makes sense to include transport sizing in the calculator. This release does just that and will factor in message queue expiration and Safety Net hold time when calculating the database size. The calculator even makes a recommendation on where to deploy the mail.que database: either on the system disk or on a dedicated disk!

Multiple Databases / JBOD Volume Support

Exchange 2010 introduced the concept of 1 database per JBOD volume when deploying multiple database copies. However, this architecture did not ensure that the drive was utilized effectively across all three dimensions - throughput, IO, and capacity. Typically, the system was balanced from an IO and capacity perspective, but throughput was where we saw an imbalance, because during reseeds only a portion of the target disk's total capable throughput was utilized. In addition, capacity on the 7.2K disks continues to increase, with 4TB disks now available, thus impacting our ability to remain balanced along that dimension. Also, Exchange 2013 includes a 33% reduction in IO when compared to Exchange 2010. Naturally, the concept of 1 database / JBOD volume needed to evolve.

As a result, Exchange 2013 made several architectural changes in the store process, ESE, and HA architecture to support multiple databases per JBOD volume. If you would like more information, please see Scott's excellent TechEd session in a few weeks on Exchange 2013 High Availability and Site Resilience or the High Availability and Site Resilience topic on TechNet. By default, the calculator will recommend multiple databases per JBOD volume.
This architecture is supported for single datacenter deployments and multi-datacenter deployments when there is copy and/or server symmetry. The calculator supports highly available database copies and lagged database copies with this volume architecture type. The distribution algorithm will lay out the copies appropriately, as well as generate the deployment scripts correctly to support AutoReseed.

High Availability Architecture Improvements

The calculator has been improved in several ways for high availability architectures:

- You can now specify the Witness Server location: the primary, secondary, or tertiary datacenter.
- The calculator allows you to simulate WAN failures, so that you can see how the databases are distributed during the worst failure mode.
- The calculator allows you to name servers and define a database prefix, which are then used in the deployment scripts.
- The distribution algorithm supports single datacenter HA deployments, Active/Passive deployments, and Active/Active deployments.
- The calculator includes a PowerShell script to automate DAG creation.
- In the event you are deploying your high availability architecture with direct attached storage, you can now specify the maximum number of database volumes each server will support. For example, if you are deploying a server architecture that can support 24 disks, you can specify a maximum of 20 database volumes (leaving 2 disks for the system, 1 disk for the Restore Volume, and 1 disk as a spare for AutoReseed).

Additional Mailbox Tiers (sort of!)

Over the years, a few, but vocal, members of the community have requested that I add more mailbox tiers to the calculator. As many of you know, I rarely recommend sizing multiple mailbox tiers, as that simply adds operational complexity, and I am all about removing complexity in your messaging environments. While I haven't specifically added additional mailbox tiers, I have added the ability for you to define a percentage of the mailbox tier population that should have the IO and Megacycle Multiplication Factors applied. In a way, this allows you to define up to eight different mailbox tiers.

Processors

I've received a number of questions regarding processor sizing in the calculator. People are comparing the Exchange 2010 Mailbox Server Role Requirements Calculator output with the Exchange 2013 Server Role Requirements Calculator. As mentioned in our Exchange 2013 Performance Sizing article, the megacycle guidance in Exchange 2013 leverages a new server baseline; therefore, you cannot directly compare the output from the Exchange 2010 calculator with the Exchange 2013 calculator.

Conclusion

There are many other minor improvements sprinkled throughout the calculator. We hope you enjoy this initial release. All of this work wouldn't have occurred without the efforts of Jeff Mealiffe (for without our sizing guidance there would be no calculator!), David Mosier (VBA scripting guru and the master of crafting the distribution worksheet), and Jon Gollogy (deployment scripting master).

As always we welcome feedback and please report any issues you may encounter while using the calculator by emailing strgcalc AT microsoft DOT com.

Ross Smith IV
Principal Program Manager
Exchange Customer Experience

Dude, Where's My Single Instance?
In Exchange Server 2010, there is no more single instance storage (SIS). To help understand why SIS is gone, let's review a brief history of Exchange.

During the development of Exchange 4.0, we had two primary goals in mind, and SIS was borne out of these goals:

- Ensure that messages were delivered as quickly and as efficiently as possible.
- Reduce the amount of disk space required to store messages, as disk capacity was at a premium.

Exchange 4.0 (and, to a certain extent, Exchange 5.0 and Exchange 5.5) was really designed as a departmental solution. Back then, users were typically placed on an Exchange server based on their organization structure (often, the entire company was on the same server). Since there was only one mailbox database, we maximized our use of SIS for both message delivery (only store the body and attachments once) and space efficiency. The only time we created another copy within the store was when the user modified their individual instance. For almost 19 years, the internal Exchange database table structure has remained relatively the same:

Then came Exchange 2000. In Exchange 2000, we evolved considerably: we moved to SMTP for server-to-server connectivity, we added storage groups, and we increased the maximum number of databases per server. The result was a shift away from a departmental usage of Exchange to enterprise usage of Exchange. Moreover, the move to 20 databases reduced SIS effects on space efficiency, as the likelihood that multiple recipients were on the same database decreased. Similarly, message delivery was improved by our optimizations in transport, so transport no longer benefited as much from SIS either.

With Exchange 2003, consolidation of servers took off in earnest due to features like Cached Exchange Mode. Again, the move away from departmental usage continued. Many customers moved away from distributing mailboxes based on their organization structure to randomizing the user population across all databases in the organization. Once again, the space efficiency effects of SIS were further reduced.

In Exchange 2007, we increased the number of databases you could deploy, which again reduced the space efficiency of SIS. We further optimized transport delivery and completely removed the need for SIS from a transport perspective. Finally, we made changes to the information store that removed the ability to single instance message bodies (but allowed single instancing of attachments). The result was that SIS no longer provided any real space savings – typically only about 0-20%.

One of our main goals for Exchange 2010 was to provide very large mailboxes at a low cost. Disk capacity is no longer at a premium; disk space is very inexpensive, and IT shops can take advantage of larger, cheaper disks to reduce their overall cost. In order to leverage those larger capacity disks, you also need to increase mailbox sizes (and remove PSTs and leverage the personal archive and records management capabilities) so that you can ensure that you are designing your storage to be both IO efficient and capacity efficient.

During the development of Exchange 2010, we realized that having a table structure optimized for SIS was holding us back from making the storage innovations that were necessary to achieve our goals. In order to improve the store and ESE, to change our IO profile (from many small, random IOs to fewer, larger, more sequential IOs), and to resolve our inefficiencies around item count, we had to change the store schema.
Specifically, we moved away from a per-database table structure to a per-mailbox table structure:

This architecture, along with other changes to the ESE and store engines (lazy view updates, space hints, page size increase, B+ tree defragmentation, etc.), netted us not only a 70% reduction in IO over Exchange 2007, but also substantially increased our ability to store more items in critical path folders.

As a result of the new architecture and the other changes to the store and ESE, we had to deal with an unintended side effect. While these changes greatly improved our IO efficiency, they made our space efficiency worse. In fact, on average they increased the size of the Exchange database by about 20% over Exchange 2007. To overcome this bloating effect, we implemented a targeted compression mechanism (using either 7-bit or XPRESS, the Microsoft implementation of the LZ77 algorithm) that specifically compresses message headers and bodies that are either text or HTML-based (attachments are not compressed, as they typically already exist in their most compressed state). The result of this work is that we see database sizes on par with Exchange 2007.

The graph below compares database sizes for Exchange 2007 and Exchange 2010 with different types of message data:

As you can see, Exchange 2007 databases that contained 100% Rich Text Format (RTF) content were our baseline goal when implementing database compression in Exchange 2010. What we found is that, with a mix of messaging data (77% HTML, 15% RTF, 8% Text, with an average message size of 50KB), our compression algorithms produce database sizes on par with Exchange 2007. In other words, we mitigated most of the bloat caused by the lack of SIS.

Is compression the answer to replacing single instancing altogether? The answer to that question is that it really does depend. There are certain scenarios where SIS may be viable:

- Environments that only send Rich Text Format messages. The compression algorithms in Exchange 2010 do not compress RTF message blobs because they already exist in their most compressed form.
- Sending large attachments to many users. For example, consider sending a large (30 MB+) attachment to 20 users. Even if only 5 of the 20 recipients were on the same database, in Exchange 2003 that meant the 30MB attachment was stored once instead of 5 times on that database. In Exchange 2010, that attachment is stored 5 times (150 MB for that database) and isn't compressed. But depending on your storage architecture, the capacity to handle this should be there. Also, your email retention requirements will help here, by forcing the removal of the data after a certain period of time.
- Business or organizational archives that are used to maintain immutable copies of messaging data benefit from single instancing because the system only has to keep one copy of the data, which is useful when you need to maintain that data indefinitely for compliance purposes.

If you go back through our guidance over the past 10 years, you will never find a single reference to using SIS for capacity planning. We might mention that it has an impact on database size, but that's it. All of our guidance has always dictated designing the storage without SIS in mind. And for those thinking about thin provisioning, SIS isn't a reason to do thin provisioning, nor is SIS a means to calculate your space requirements.
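To illustrate what "designing the storage without SIS in mind" looks like in practice, here is a minimal, hypothetical back-of-the-envelope sketch. The compression saving, overhead factor, and message profile are illustrative assumptions only, not official Exchange sizing guidance.

```python
# Hypothetical capacity sketch: size a database with zero single-instance
# savings. Every recipient's copy is counted in full; only text/HTML bodies
# get an assumed compression saving, attachments are stored at full size.

def database_size_gb(mailboxes, items_per_mailbox, avg_body_kb, avg_attachment_kb,
                     body_compression_ratio=0.8, overhead_factor=1.2):
    """Estimate database size in GB with no single-instance savings at all."""
    per_item_kb = avg_body_kb * body_compression_ratio + avg_attachment_kb
    total_kb = mailboxes * items_per_mailbox * per_item_kb
    # Illustrative allowance for white space, indexes, and per-mailbox tables.
    return total_kb * overhead_factor / (1024 * 1024)

# Example: 500 mailboxes, 20,000 items each, 50 KB bodies, 25 KB of
# attachment data per item on average.
print(round(database_size_gb(500, 20000, 50, 25), 1))  # -> roughly 744 GB
```

The point of the exercise is simply that every copy is counted in full: whatever space SIS might once have saved is treated as zero, which is how the sizing guidance has always asked you to plan.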
Thin provisioning requires an operational maturity that can react quickly to changes in the messaging environment, as well as a deep understanding of how the user population behaves and grows over time, in order to allocate the right amount of storage upfront.

In summary, Exchange 2010 changes the messaging landscape. The architectural changes we have implemented enable the commoditization of email, providing very large mailboxes at a low cost. Disk capacity is no longer at a premium. Disk space is cheap, and IT shops can take advantage of larger, cheaper disks to reduce their overall cost. With Exchange 2010 you can deploy a highly available system with a degree of storage efficiency, without SIS, at a fraction of the cost that was required with previous versions of Exchange.

So, there you have it. SIS is gone.

- Ross Smith IV