In Part 1 of this series, I explained the E2K7_IndexRebuildAnalyzer.ps1 script (Part 3 is also available). In this article, I will discuss the Search Rebuild Framework that Anatoly Girko and I developed. This framework was originally designed to give our internal operations staff a comprehensive set of validation steps and progress indicators to use when rebuilding content indexes.
Stage 1: Pre-Rebuild Statistics Gathering
In our environment, whenever a decision is made to rebuild the Content Index files of an Exchange Mailbox Database, the Operator begins by calculating pre-rebuild statistics for the impacted store. These statistics are always written to CSV for documentation purposes and are eventually inserted into our historical data collection set. However, as discussed, the -CSVFile parameter is optional. When -CSVFile is not passed, the output is written to the shell console window. For best readability, increase both the Screen Buffer Size Width and the Window Size Width of the console so that all of the headers and metrics fit on screen. From there I typically “pipe” the output to Format-Table (ft) with the -AutoSize (-a) parameter.
Examples
Console Example:
.\E2K7_IndexRebuildAnalyzer.ps1 -CMS NA1-ERICNOR-1 | ft -a
Output:
CSV Example:
.\E2K7_IndexRebuildAnalyzer.ps1 -CMS NA1-ERICNOR-1 -CSVFile c:\ericnor\NA1-ERICNOR-1_PreRebuild.csv
Output:
The pre-rebuild metrics are then compared against the historical data collection to derive an estimate of the average time to complete the rebuild. We take into account the regional location of the mailbox database and its end user mailboxes. Based on the geography of the store and the estimated time to complete, we schedule the rebuild work for a date and time when user activity on that store is minimal.
Stage 2: Reset Content Index for Impacted Mailbox Database(s)
The operators in our environment then proceed to reset the Content Index for the Mailbox Database utilizing either of the techniques documented in How To Rebuild the Full-Text Index Catalog.
In the examples that follow I will be resetting the Content Index files for two Mailbox Databases within my environment. These two stores also happen to have the largest Mailbox Counts, EDB File Sizes and collective Database Item Counts on the NA1-ERICNOR-1 Clustered Mailbox Server:
Reset of Content Index Files:
Stage 3: Validating Search Indexer Has Detected Full Crawl Required
After the existing Content Index files have been removed from the file system and the Microsoft Exchange Search Indexer service has been restarted, it is the responsibility of the MonitorAndUpdateMDBList thread to determine the current state of the Content Indexes for all Mailbox Databases on the Mailbox Server that are enabled for Content Indexing. Once the MonitorAndUpdateMDBList thread has determined the Content Index Health State for each Mailbox Database, it places each Mailbox Database’s Health State value into memory. If the value of the Content Index Health State is equal to “New”, the Microsoft Exchange Search Indexer service has determined that a Full Crawl is required to bring the Content Index files into a Healthy state (“Healthy” being the point where the index is being kept up to date via Store Notification). It’s at this time that the Exchange Search Indexer service initiates a Full Crawl operation against the impacted Mailbox Database.
The time that the Full Crawl operation actually begins is denoted in the Application Event log by Event ID 109.
To ensure that all Full Crawl operations have begun on a system, the operator performing the work will validate the presence of Event ID 109 for each Mailbox Database whose Content Index was reset in Stage 2.
Example
As illustrated in the previous example, the Content Index files for NA1-ERICNOR-1-DB1 and NA1-ERICNOR-1-DB18 were reset utilizing the ResetSearchIndex script. To validate that the Microsoft Exchange Search Indexer service has begun Full Crawl operations, a unique Event ID 109 should be present for each of those Mailbox Databases. This appears to be the case:
Event Type: Information
Event Source: MSExchange Search Indexer
Event Category: General
Event ID: 109
Date: 5/10/2012
Time: 2:22:19 PM
Computer: NA1-ERICNOR-1-A
Description: Exchange Search Indexer has created a new search index and will perform a full crawl for the Mailbox Database NA1-ERICNOR-1\NA1-ERICNOR-1-SG1\NA1-ERICNOR-1-DB1 (GUID = 5a1122be-b9bb-4d5b-853a-e689b1ea1129).
Event Type: Information
Event Source: MSExchange Search Indexer
Event Category: General
Event ID: 109
Date: 5/10/2012
Time: 2:22:20 PM
Computer: NA1-ERICNOR-1-A
Description: Exchange Search Indexer has created a new search index and will perform a full crawl for the Mailbox Database NA1-ERICNOR-1\NA1-ERICNOR-1-SG18\NA1-ERICNOR-1-DB18 (GUID = 2faba54d-1699-441e-8ac8-1a136d0b7b16).
Note: Rather than inspecting the Event Log visually, another viable technique is to use the Get-EventLog cmdlet and write the Start events to CSV. Example:
Get-EventLog "Application" | Where-Object {$_.EventID -eq 109} | Select-Object EventID,TimeGenerated,Message | Export-CSV -NoTypeInformation -Path c:\Search_StartEvents.csv
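The Message text of each start event carries the database path and GUID, so the events can be reduced to one row per database before they are written to CSV. The following is a minimal sketch, assuming the message wording matches the two sample events shown above. The function name ConvertFrom-SearchIndexerMessage is hypothetical; it is not part of Exchange or of the E2K7_IndexRebuildAnalyzer script.

```powershell
# Hypothetical helper: extracts the database path and GUID from the
# Message text of an MSExchange Search Indexer event.
# Assumes the wording shown in the sample events in this article.
function ConvertFrom-SearchIndexerMessage {
    param([string]$Message)

    $pattern = 'Mailbox Database (?<Database>.+?) \(GUID = (?<Guid>[0-9a-fA-F-]{36})\)'
    if ($Message -match $pattern) {
        New-Object PSObject -Property @{
            Database = $Matches['Database']
            Guid     = $Matches['Guid']
        }
    }
    # Nothing is returned when the message does not match the pattern.
}

# Sample message copied from the Event ID 109 entry for DB1 above.
$sample = 'Exchange Search Indexer has created a new search index and will perform a full crawl for the Mailbox Database NA1-ERICNOR-1\NA1-ERICNOR-1-SG1\NA1-ERICNOR-1-DB1 (GUID = 5a1122be-b9bb-4d5b-853a-e689b1ea1129).'
$parsed = ConvertFrom-SearchIndexerMessage -Message $sample
$parsed.Database   # NA1-ERICNOR-1\NA1-ERICNOR-1-SG1\NA1-ERICNOR-1-DB1

# Possible use against the live log (Windows PowerShell only):
# Get-EventLog "Application" | Where-Object {$_.EventID -eq 109} |
#     ForEach-Object { ConvertFrom-SearchIndexerMessage -Message $_.Message }
```

Because the pattern keys on the "Mailbox Database ... (GUID = ...)" portion of the text, the same helper should also work on the completion events described in Stage 4, provided their wording matches the samples in this article.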
Stage 4: Validating Search Indexer Progress and Completion
Having validated that Full Crawl operations have begun (via presence of Event ID 109) the Operator monitors the overall rebuild progress via System Monitor. Specifically the following Objects and Counters would be monitored:
- MSExchange Search Indexer\Number of Databases Being Crawled
- MSExchange Search Indexer\Number of Indexed Databases Being Kept Up-to-Date by Notifications
- MSExchange Search Indices\<DB Instance Undergoing Crawl>\Full Crawl Mode Status
- MSExchange Search Indices\<DB Instance Undergoing Crawl>\Document Indexing Rate
- MSExchange Search Indices\<DB Instance Undergoing Crawl>\Number of Mailboxes Left to Crawl
- MSExchange Search Indices\<DB Instance Undergoing Crawl>\Number of Recently Moved Mailboxes Being Crawled
- MSExchange Search Indices\<DB Instance Undergoing Crawl>\Number of Outstanding Batches
- MSExchange Search Indices\<DB Instance Undergoing Crawl>\Throttling Delay Value
By reviewing the MSExchange Search Indexer object, an Operator can easily ascertain how many Content Indexes on the Server are up to date and how many are actively being crawled. When a Mailbox Server is completely up to date, the value for the “Number of Databases Being Crawled” counter will be equal to 0, and the “Number of Indexed Databases Being Kept Up-to-Date by Notifications” counter value will be equal to the number of Mailbox Databases that are enabled for Content Indexing on the server.
In my example, I have eighteen Mailbox Databases total on my mail server and two of those databases are currently undergoing Full Crawl. Thus, the value for “Number of Databases Being Crawled” should be equal to 2, and the value of “Number of Indexed Databases Being Kept Up-to-Date by Notifications” should be equal to 16 which they are:
As individual Mailbox Databases complete Full Crawl you will notice specific changes to the various object counters in System Monitor.
At the MSExchange Search Indexer level:
- The “Number of Databases Being Crawled” counter is decremented by 1
- The “Number of Indexed Databases Being Kept Up-to-Date by Notifications” counter is incremented by 1.
At the Mailbox Database Index/indices level:
- The value for “Full Crawl Mode Status” on the specific database has been decremented to 0 (which reflects the new value for Content Index Health state as determined by the MonitorAndUpdateMDBList monitor thread).
- The “Number of Mailboxes Left to Crawl” should reflect a value of 0.
- Although not specifically related to the Full Crawl rebuild operation per se, the value for the “Number of Recently Moved Mailboxes Being Crawled” counter should also be 0 for each index. As Exchange Mailboxes are successfully moved between Exchange Mailbox Databases (provided the target is enabled for Content Indexing), the Search Notifications on the target database are temporarily suspended. The Exchange Search Indexer service does this so that it can perform 1-off crawls of the recently moved mailboxes to bring the Master Index completely up to date. Once 1-off crawling has completed, Store Notifications are resumed. The point here is that although Full Crawl has technically completed, 1-off crawling might still be occurring if mailbox moves took place at the same time as the Content Index rebuild. Once this counter is equal to 0, all crawling activity on the Mailbox Database is definitively complete, so it should be validated as well.
On an Exchange Server where all Content Indexes are completely up to date you would expect to see the following:
- “MSExchange Search Indexer\Number of Databases Being Crawled” is equal to 0.
- “MSExchange Search Indexer\Number of Indexed Databases Being Kept Up-to-Date by Notifications” is equal to the number of Mailbox Databases enabled for indexing on the Mailbox Server.
- “Full Crawl Mode Status” for each Exchange Mailbox Database instance on the Mailbox Server is equal to 0.
- “Number of Mailboxes Left to Crawl” for each Exchange Mailbox Database instance on the Mailbox Server is equal to 0.
- “Number of Recently Moved Mailboxes Being Crawled” for each Exchange Mailbox Database instance on the Mailbox Server is equal to 0.
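The five checks above can be sketched as a single pass/fail test over counter values that have already been read from System Monitor. This is a minimal sketch, not part of the framework's tooling. The function name Test-ContentIndexUpToDate, its parameter names, and the property names on the per-database objects are all hypothetical, and the sample values are illustrative apart from the 18/16/2 database counts used in this article.

```powershell
# Hypothetical helper: applies the five checks listed above to counter
# values that were already collected (for example from System Monitor).
function Test-ContentIndexUpToDate {
    param(
        [int]$DatabasesBeingCrawled,      # Number of Databases Being Crawled
        [int]$DatabasesKeptUpToDate,      # ...Being Kept Up-to-Date by Notifications
        [int]$IndexingEnabledDatabases,   # databases enabled for Content Indexing
        [object[]]$PerDatabaseCounters    # one object per database instance
    )

    if ($DatabasesBeingCrawled -ne 0) { return $false }
    if ($DatabasesKeptUpToDate -ne $IndexingEnabledDatabases) { return $false }

    foreach ($db in $PerDatabaseCounters) {
        if ($db.FullCrawlModeStatus -ne 0) { return $false }
        if ($db.MailboxesLeftToCrawl -ne 0) { return $false }
        if ($db.RecentlyMovedMailboxesBeingCrawled -ne 0) { return $false }
    }
    return $true
}

# Illustrative per-database values (property names are assumptions).
$crawling = New-Object PSObject -Property @{
    FullCrawlModeStatus = 1; MailboxesLeftToCrawl = 40; RecentlyMovedMailboxesBeingCrawled = 0
}
$finished = New-Object PSObject -Property @{
    FullCrawlModeStatus = 0; MailboxesLeftToCrawl = 0; RecentlyMovedMailboxesBeingCrawled = 0
}

# During the rebuild: 2 of 18 databases crawling, 16 kept up to date.
$during = Test-ContentIndexUpToDate -DatabasesBeingCrawled 2 -DatabasesKeptUpToDate 16 `
    -IndexingEnabledDatabases 18 -PerDatabaseCounters @($crawling, $finished)

# After the rebuild: nothing crawling, all 18 kept up to date.
$after = Test-ContentIndexUpToDate -DatabasesBeingCrawled 0 -DatabasesKeptUpToDate 18 `
    -IndexingEnabledDatabases 18 -PerDatabaseCounters @($finished, $finished)
```

With these inputs, $during evaluates to False and $after to True. The function only interprets values; how they are collected (System Monitor, or a cmdlet such as Get-Counter where available) is left open.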
In my example, all of these conditions are true, so I can reasonably assume that the indexes have been rebuilt successfully and that, from a Content Index perspective, the server is completely Healthy:
Once all Content Indexes appear to be up to date via System Monitor the Operator performing the work would proceed to review the Application Event Log and validate the presence of Event ID 110 which is the Microsoft Exchange Search Indexer completion event for Full Crawl. Similar to Event ID 109 there will be a unique 110 event entry for each Mailbox Database that has completed Full Crawl:
Event Type: Information
Event Source: MSExchange Search Indexer
Event Category: General
Event ID: 110
Date: 5/10/2012
Time: 5:39:47 PM
Computer: NA1-ERICNOR-1-A
Description: Exchange Search Indexer completed a full crawl (indexing) of Mailbox Database NA1-ERICNOR-1\NA1-ERICNOR-1-SG1\NA1-ERICNOR-1-DB1 (GUID = 5a1122be-b9bb-4d5b-853a-e689b1ea1129).
Event Type: Information
Event Source: MSExchange Search Indexer
Event Category: General
Event ID: 110
Date: 5/10/2012
Time: 5:11:47 PM
Computer: NA1-ERICNOR-1-A
Description: Exchange Search Indexer completed a full crawl (indexing) of Mailbox Database NA1-ERICNOR-1\NA1-ERICNOR-1-SG18\NA1-ERICNOR-1-DB18 (GUID = 2faba54d-1699-441e-8ac8-1a136d0b7b16).
Note: Rather than inspecting the Event Log visually, another viable technique is to use the Get-EventLog cmdlet and write the Completion events to CSV. Example:
Get-EventLog "Application" | Where-Object {$_.EventID -eq 110} | Select-Object EventID,TimeGenerated,Message | Export-CSV -NoTypeInformation -Path c:\Search_CompletionEvents.csv
Stage 5: Collect Post-Rebuild Metrics
At Stage 5, it is assumed that the operator has validated the completion of Full Crawl for each Mailbox Database. It’s at this point that the operator collects Post-Rebuild metrics to determine throughput characteristics for each Mailbox Database that underwent crawl.
To collect these metrics the E2K7_IndexRebuildAnalyzer script is utilized leveraging the -PostRebuild switch.
As previously mentioned -PostRebuild parses the Application Event logs for the presence of both Event ID 109 and Event ID 110. In the case of Continuous Cluster Replication the Application Event Log for each node in the CCR Cluster is evaluated. If the script can locate these Event IDs it will calculate the total time (in various time increments) required to complete Full Crawl against each Mailbox Database. It will then return to the operator the throughput characteristics for each unique rebuild operation.
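To make the elapsed-time calculation concrete, the following sketch computes the Full Crawl duration for the two example databases from the Event ID 109 and Event ID 110 timestamps shown earlier in this article. It illustrates the arithmetic only and is not the code used inside E2K7_IndexRebuildAnalyzer; the function name Get-CrawlDuration and the property names are hypothetical.

```powershell
# Hypothetical helper: elapsed time between a start (109) and a
# completion (110) event for the same Mailbox Database.
function Get-CrawlDuration {
    param([datetime]$Started, [datetime]$Completed)
    New-TimeSpan -Start $Started -End $Completed
}

# Timestamps copied from the sample events in this article (5/10/2012).
$crawls = @(
    New-Object PSObject -Property @{
        Database  = 'NA1-ERICNOR-1-DB1'
        Started   = [datetime]'2012-05-10T14:22:19'
        Completed = [datetime]'2012-05-10T17:39:47'
    }
    New-Object PSObject -Property @{
        Database  = 'NA1-ERICNOR-1-DB18'
        Started   = [datetime]'2012-05-10T14:22:20'
        Completed = [datetime]'2012-05-10T17:11:47'
    }
)

foreach ($c in $crawls) {
    $duration = Get-CrawlDuration -Started $c.Started -Completed $c.Completed
    '{0}: {1}' -f $c.Database, $duration
}
# NA1-ERICNOR-1-DB1:  03:17:28
# NA1-ERICNOR-1-DB18: 02:49:27
```

The resulting TimeSpan also exposes TotalMinutes and TotalHours, which is one way to obtain the same duration in several time increments.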
It is once again worth noting that all Mailbox Databases on the Mailbox Server will be evaluated, irrespective of whether all of them had their Search Indexes rebuilt. Additionally, if the script is run without the -CSVFile option, the result set is written to the console window. When calculating Post Rebuild statistics I strongly encourage using -CSVFile, as it makes reporting, sorting, filtering, and pivoting very easy in Excel.
Example
.\E2K7_IndexRebuildAnalyzer.ps1 -CMS NA1-ERICNOR-1 -PostRebuild -CSVFile c:\ericnor\NA1-ERICNOR-1_PostRebuild.csv
After E2K7_IndexRebuildAnalyzer has completed execution the CSV file can then be inspected. Review of the CSV shows that all Exchange Mailbox Databases on the Mailbox Server were processed. The string “NoEventsFound” is being returned for many of the Exchange Mailbox Databases because the Search Indexer Event ID pair combination cannot be located. In the example case, there are 16 Mailbox Databases reporting “NoEventsFound” and two Mailbox Databases that have valid post rebuild metrics:
This result set can be further refined by applying a simple text-based filter within Excel. By applying filtering to any of the post rebuild column headers and unticking the “NoEventsFound” string, only databases with valid post rebuild metrics will be displayed:
The result of applying this filter on the example result set now shows only NA1-ERICNOR-1-DB1 and NA1-ERICNOR-1-DB18 (i.e. the Mailbox Databases that had their Content Indexes reset). It is also clear that post rebuild metrics were successfully ascertained, as there are valid values for the various post rebuild headers:
Post String Filter Application:
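The same filtering can also be done in the shell instead of Excel. The sketch below keeps only the rows in which no column holds the “NoEventsFound” string, so it does not depend on the actual column names in the CSV. The function name Select-RebuiltDatabaseRow is hypothetical, and the sample rows (including the Database and CrawlTime column names) are illustrative stand-ins rather than the script's real output.

```powershell
# Hypothetical helper: drops every row in which any column equals the
# marker string written for databases without a 109/110 event pair.
function Select-RebuiltDatabaseRow {
    param(
        [object[]]$Rows,
        [string]$Marker = 'NoEventsFound'
    )

    foreach ($row in $Rows) {
        $hit = $row.PSObject.Properties | Where-Object { $_.Value -eq $Marker }
        if (-not $hit) { $row }
    }
}

# Illustrative rows only; the real CSV has different and more columns.
$rows = @(
    New-Object PSObject -Property @{ Database = 'NA1-ERICNOR-1-DB1';  CrawlTime = '03:17:28' }
    New-Object PSObject -Property @{ Database = 'NA1-ERICNOR-1-DB2';  CrawlTime = 'NoEventsFound' }
    New-Object PSObject -Property @{ Database = 'NA1-ERICNOR-1-DB18'; CrawlTime = '02:49:27' }
)
$kept = @(Select-RebuiltDatabaseRow -Rows $rows)

# Possible use against the post-rebuild file from the example above:
# Select-RebuiltDatabaseRow -Rows (Import-Csv c:\ericnor\NA1-ERICNOR-1_PostRebuild.csv)
```

With the sample rows, $kept contains the DB1 and DB18 entries only.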
Stage 6: Test-ExchangeSearch Validation
Stage 6 in the Rebuild Framework validates that every mailbox homed on Mailbox Databases that had their Content Indexes reset can now return valid search responses. This goal can be accomplished by leveraging the core functionality contained within the Test-ExchangeSearch cmdlet. Final validation is complete only when all mailboxes return a True response to the cmdlet.
Example:
Get-Mailbox -Database NA1-ERICNOR-1\NA1-ERICNOR-1-SG1\NA1-ERICNOR-1-DB1 | Test-ExchangeSearch | ft -a
Get-Mailbox -Database NA1-ERICNOR-1\NA1-ERICNOR-1-SG18\NA1-ERICNOR-1-DB18 | Test-ExchangeSearch | ft -a
This process can take a significant amount of time to complete, depending on the number of mailboxes contained within the database. It’s also worth noting that when performing bulk operations with Test-ExchangeSearch, the ResultFound and SearchTime properties will only be visible in the console window after all mailboxes have been processed (regardless of True or False results). In some cases it may also make sense to store all results in CSV for documentation purposes.
Any end user mailbox reporting False to the Test-ExchangeSearch test is treated as a 1-off issue and remediated accordingly.
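Isolating the failed mailboxes from a bulk run can be sketched as follows. This is a minimal sketch, assuming the results are the live objects returned by Test-ExchangeSearch with the boolean ResultFound property mentioned above. The function name Select-FailedSearchResult, the mock result objects, and the CSV path in the comments are hypothetical.

```powershell
# Hypothetical helper: returns the results whose ResultFound is not True.
# Intended for live result objects; values re-imported from CSV are
# strings and would need to be compared against 'False' instead.
function Select-FailedSearchResult {
    param([object[]]$Results)
    $Results | Where-Object { -not $_.ResultFound }
}

# Mock results standing in for Test-ExchangeSearch output (illustrative values).
$results = @(
    New-Object PSObject -Property @{ ResultFound = $true;  SearchTime = 4 }
    New-Object PSObject -Property @{ ResultFound = $false; SearchTime = -1 }
    New-Object PSObject -Property @{ ResultFound = $true;  SearchTime = 6 }
)
$failed = @(Select-FailedSearchResult -Results $results)
"{0} of {1} mailboxes need 1-off remediation" -f $failed.Count, $results.Count

# Possible use in the Exchange Management Shell (path is hypothetical):
# $results = Get-Mailbox -Database NA1-ERICNOR-1\NA1-ERICNOR-1-SG1\NA1-ERICNOR-1-DB1 | Test-ExchangeSearch
# $results | Export-Csv -NoTypeInformation -Path c:\ericnor\SearchValidation.csv
# Select-FailedSearchResult -Results $results
```

Capturing the results in a variable first means the same run can be both written to CSV for documentation and filtered for failures, without repeating the test against every mailbox.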
Stage 7: Analysis of Post Rebuild Metrics
For readability purposes, I’m going to display the various metrics in table format rather than natively in Excel columns. As previously noted, two Exchange Mailbox Databases on the NA1-ERICNOR-1 Clustered Mailbox Server had their Content Indexes rebuilt: NA1-ERICNOR-1-DB1 and NA1-ERICNOR-1-DB18.
Mailbox Metrics Post Rebuilds:
Mailbox Database Rebuild Metrics:
The various rebuild metrics and pre-rebuild metrics are then added to the historical data collection set so that they can contribute to the historical averages for future rebuild exercises.
Conclusion
In the second post of the series, I covered the stages of the Search Rebuild Framework. My next and final post will cover the observed averages we have seen within our deployments at Microsoft.
Eric Norberg
Service Engineer
Office 365-Dedicated