When you're migrating on-premises mailboxes to Office 365, many factors can affect overall mailbox migration speed and performance. This post will help you investigate and correct the possible causes by using the AnalyzeMoveRequestStats.ps1 script, which analyzes the performance of a batch of move requests and helps identify the reasons for slower performance.
The AnalyzeMoveRequestStats script provides important performance statistics from a given set of move request statistics. It also generates two files: one for the failure list, and one for individual move statistics.
Step 1: Download the script (see attachment on this post) and import it using the following syntax:
> . .\AnalyzeMoveRequestStats.ps1
Step 2: Select the move requests that you want to analyze
In this example, we’re retrieving all currently executing requests that are not in a queued state.
> $moves = Get-MoveRequest | ?{$_.Status -ne 'queued'}
Step 3: Get the move reports for each of the move requests you want to analyze
Note: This may take a few minutes, especially if you’re analyzing a large number of moves.
> $stats = $moves | Get-MoveRequestStatistics -IncludeReport
Step 4: Run the ProcessStats function to generate the statistics
> ProcessStats -stats $stats -name ProcessedStats1
The output should look similar to the following:
StartTime: 2/18/2014 19:57
EndTime: 3/3/2014 17:15
MigrationDuration: 12 day(s) 19:10:55
MailboxCount: 50
TotalGBTransferred: 2.42
PercentComplete: 95
MaxPerMoveTransferRateGBPerHour: 1.11
MinPerMoveTransferRateGBPerHour: 0.43
AvgPerMoveTransferRateGBPerHour: 0.66
MoveEfficiencyPercent: 86.36
AverageSourceLatency: 123.55
AverageDestinationLatency:
IdleDuration: 1.16%
SourceSideDuration: 78.93%
DestinationSideDuration: 19.30%
WordBreakingDuration: 9.63%
TransientFailureDurations: 0.00%
OverallStallDurations: 4.55%
ContentIndexingStalls: 1.23%
HighAvailabilityStalls: 0.00%
TargetCPUStalls: 3.32%
SourceCPUStalls: 0.00%
MailboxLockedStall: 0.00%
ProxyUnknownStall: 0.00%
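If you capture that output as text (for example, by copying it from the console), a small helper can turn it into name/value pairs that are easier to compare between batches. The sketch below is not part of AnalyzeMoveRequestStats.ps1: ConvertFrom-MoveStatsText is a hypothetical name, and it assumes one "Name: value" pair per line, as in the sample above.

```powershell
# Hypothetical helper (not part of AnalyzeMoveRequestStats.ps1).
# Parses "Name: value" lines into an ordered dictionary.
function ConvertFrom-MoveStatsText {
    param([string[]]$Lines)

    $style   = [System.Globalization.NumberStyles]::Float
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    $result  = [ordered]@{}

    foreach ($line in $Lines) {
        # Split on the first colon only, so timestamps such as "19:57" survive.
        $name, $value = $line -split ':', 2
        if ([string]::IsNullOrWhiteSpace($name) -or $null -eq $value) { continue }
        $value = $value.Trim()

        # Strip a trailing percent sign and try to read the rest as a number.
        $number = 0.0
        if ([double]::TryParse($value.TrimEnd('%'), $style, $culture, [ref]$number)) {
            $result[$name.Trim()] = $number
        } else {
            # Keep dates, durations, and blank values as text.
            $result[$name.Trim()] = $value
        }
    }
    return $result
}

# Usage with a few lines from the sample output
$sample = @(
    'StartTime: 2/18/2014 19:57',
    'MoveEfficiencyPercent: 86.36',
    'SourceSideDuration: 78.93%'
)
$parsed = ConvertFrom-MoveStatsText -Lines $sample
$parsed['SourceSideDuration']
```

Percentages come back as plain numbers, so they can be compared directly against the ranges described in the next section.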
How to read the results
The first step in understanding the results is to understand the definitions of the report items:
Report Item | Definition
StartTime | Timestamp of the first injected request.
EndTime | Timestamp of the last completed request. If there isn’t a completed/autosuspended move, this is set to the current time.
MigrationDuration | EndTime – StartTime.
MailboxCount | Number of mailboxes.
TotalGBTransferred | Total amount of data transferred.
PercentComplete | Completion percentage.
MaxPerMoveTransferRateGBPerHour | Maximum per-mailbox transfer rate.
MinPerMoveTransferRateGBPerHour | Minimum per-mailbox transfer rate.
AvgPerMoveTransferRateGBPerHour | Average per-mailbox transfer rate. For onboarding to Office 365, any value greater than 0.5 GB/h represents a healthy move rate. The normal range is 0.3 - 1 GB/h.
MoveEfficiencyPercent | Transfer size is always greater than the source mailbox size due to transient failures and other factors. This percentage shows how close these numbers are and is calculated as SourceMailboxSize/TotalBytesTransferred. A healthy range is 75-100%.
AverageSourceLatency | The duration calculated by making no-op WCF web service calls to the source MRSProxy service. It's not the same as a network ping; 100 ms or less is desirable for better throughput.
AverageDestinationLatency | Similar to AverageSourceLatency, but applies to off-boarding from Office 365. This value isn’t applicable in this example scenario.
IdleDuration | Amount of time that the MRSProxy service request waits in the MRSProxy service's in-memory queue due to limited resource availability.
SourceSideDuration | Amount of time spent on the source side, which is the on-premises MRSProxy service for onboarding and the Office 365 MRSProxy service for off-boarding. Higher average latency and a higher transient failure rate will increase this value. A healthy range for onboarding is 60-80%.
DestinationSideDuration | Amount of time spent on the destination side, which is the Office 365 MRSProxy service for onboarding and the on-premises MRSProxy service for off-boarding. Target stalls such as CPU, ContentIndexing, and HighAvailability will increase this value. A healthy range for onboarding is 20-40%.
WordBreakingDuration | Amount of time spent separating words for content indexing. A healthy range is 0-15%.
TransientFailureDurations | Amount of time spent in transient failures, such as intermittent connectivity issues between MRS and the MRSProxy services. A healthy range is 0-5%.
OverallStallDurations | Amount of time spent waiting for system resources to become available, such as CPU, CI (ContentIndexing), and HA (HighAvailability). A healthy range is 0-15%.
ContentIndexingStalls | Amount of time spent waiting for Content Indexing to catch up.
HighAvailabilityStalls | Amount of time spent waiting for High Availability (replication of the data to passive databases) to catch up.
TargetCPUStalls | Amount of time spent waiting for availability of the CPU resource on the destination side.
SourceCPUStalls | Amount of time spent waiting for availability of the CPU resource on the source side.
MailboxLockedStall | Amount of time spent waiting for mailboxes to be unlocked. In some cases, such as connectivity issues, the source mailbox can be locked for some time.
ProxyUnknownStall | Amount of time spent waiting for availability of remote on-premises resources such as CPU. The resource can be identified by looking at the generated failures log file.
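As a quick way to apply these definitions, the following sketch compares a set of values against the healthy ranges quoted in the table. It is an illustration only: Get-OutOfRangeMoveStat and $HealthyRanges are hypothetical names that don't exist in AnalyzeMoveRequestStats.ps1, and the input is assumed to be a hashtable of numeric values that you have typed in or parsed yourself.

```powershell
# Ranges quoted in the definitions above. The variable and function names
# are illustrative and are not part of AnalyzeMoveRequestStats.ps1.
$HealthyRanges = [ordered]@{
    AvgPerMoveTransferRateGBPerHour = @{ Min = 0.3; Max = [double]::PositiveInfinity }
    MoveEfficiencyPercent           = @{ Min = 75;  Max = 100 }
    AverageSourceLatency            = @{ Min = 0;   Max = 100 }   # milliseconds
    WordBreakingDuration            = @{ Min = 0;   Max = 15 }
    TransientFailureDurations       = @{ Min = 0;   Max = 5 }
    OverallStallDurations           = @{ Min = 0;   Max = 15 }
}

function Get-OutOfRangeMoveStat {
    param(
        [System.Collections.IDictionary]$Stats,
        [System.Collections.IDictionary]$Ranges = $HealthyRanges
    )
    foreach ($name in $Ranges.Keys) {
        # Skip report items that were not supplied.
        if (-not $Stats.Contains($name)) { continue }

        $value = [double]$Stats[$name]
        $range = $Ranges[$name]
        if ($value -lt $range.Min -or $value -gt $range.Max) {
            # Emit one object per value that falls outside its range.
            [pscustomobject]@{
                Name  = $name
                Value = $value
                Min   = $range.Min
                Max   = $range.Max
            }
        }
    }
}

# Usage with values from the sample output
$sampleStats = @{
    AvgPerMoveTransferRateGBPerHour = 0.66
    MoveEfficiencyPercent           = 86.36
    AverageSourceLatency            = 123.55
    WordBreakingDuration            = 9.63
    TransientFailureDurations       = 0.00
    OverallStallDurations           = 4.55
}
Get-OutOfRangeMoveStat -Stats $sampleStats
```

With the sample values, only AverageSourceLatency falls outside its range, which points toward the network latency discussion later in this post.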
Next, you need to identify which side of the migration is slower by looking at the SourceSideDuration and DestinationSideDuration values.
Note: The SourceSideDuration value + DestinationSideDuration value is usually, but not always, equal to 100%.
If you see that the SourceSideDuration value is greater than the normal range of 60-80%, this means the source side is the bottleneck. If the DestinationSideDuration value is greater than the normal range of 20-40%, this means the destination side of the migration is the bottleneck.
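That rule can be written down as a small function. This is a minimal sketch for onboarding scenarios, using the upper bounds of the normal ranges given above (80% and 40%). Get-MoveBottleneckSide is a hypothetical name and is not defined by the script.

```powershell
# Hypothetical helper: applies the onboarding thresholds described above.
function Get-MoveBottleneckSide {
    param(
        [double]$SourceSideDuration,       # percent, as reported by the script
        [double]$DestinationSideDuration   # percent, as reported by the script
    )
    $sourceHigh      = $SourceSideDuration -gt 80
    $destinationHigh = $DestinationSideDuration -gt 40

    # The two values usually, but not always, add up to 100%,
    # so both checks are evaluated independently.
    if ($sourceHigh -and $destinationHigh) { return 'Both' }
    if ($sourceHigh)      { return 'Source' }
    if ($destinationHigh) { return 'Destination' }
    return 'None'
}

# Usage with the values from the sample output
Get-MoveBottleneckSide -SourceSideDuration 78.93 -DestinationSideDuration 19.30
```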
Causes of source side slowness in onboarding scenarios
There are several possible reasons the source side of the migration in an onboarding scenario may be causing slower than normal performance.
High transient failures
The most common reason for transient failures is a connectivity issue with the on-premises MRSProxy web service on your Mailbox servers. Check the TransientFailureDurations and MailboxLockedStall values in the output, and also check the failure log generated by this script to verify any failures. The source mailbox may also get locked when a transient failure occurs, which lowers migration performance.
Misconfigured network load balancers
Another common reason for connectivity issues is a misconfigured load balancer. If you're load balancing your servers, the load balancer needs to be configured so that all calls for a specific migration request are directed to the same server hosting the MRSProxy service instance.
Some load balancers use the ExchangeCookie to associate all the migration requests with the same Mailbox server where the MRSProxy service is hosted.
If your load balancers are not configured correctly, migration calls may be directed to the “wrong” MRSProxy service instance and will fail. This causes the source mailbox to be locked for some time and lowers migration performance.
High network latency
The Office 365 MRSProxy service makes periodic dummy web service calls to the on-premises MRSProxy service and collects statistics from these calls. The AverageSourceLatency value represents the average duration of these calls. If you see high values for this parameter (greater than 100 ms), it can be caused by either of the following:
Network latency between the Office 365 and on-premises MRSProxy services is high.
In this case, you can try to reduce the impact of network latency in the following ways:
- Migrate mailboxes from servers closer to Office 365 datacenters. This is usually not feasible, but can be preferred if the migration project is time critical.
- Delete empty mailbox folders or consolidate mailbox folders. High network latency affects the move rate more if there are too many folders.
- Increase the export buffer size. This reduces the number of migration calls, especially for larger mailboxes, and reduces the time spent in network latency. You can increase the export buffer size by adding the ExportBufferSizeOverrideKB parameter in the MSExchangeMailboxReplication.exe.config file.
Example: ExportBufferSizeOverrideKB="7500"
Important: You need to have Exchange 2013 SP1 installed on your Client Access server to increase the export buffer size. This will also disable cross forest downgrade moves between Exchange 2013 and Exchange 2010 servers.
Source servers are too busy and not very responsive to the web service calls
In this case, you can try to release some of the system resources (CPU, memory, disk I/O, etc.) on the Mailbox and Client Access servers.
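To get a feel for why a larger export buffer helps when latency is high, here is a deliberately simplified back-of-the-envelope model. It assumes that each migration call transfers at most one buffer of data and pays the average latency once. The real MRS call pattern isn't described in this post, so treat the numbers as relative rather than absolute. Get-EstimatedLatencyOverhead is a hypothetical helper, and the buffer sizes in the usage example are illustrative values only.

```powershell
# Simplified model (an assumption, not how MRS is documented to behave):
# one call per full or partial buffer, one latency hit per call.
function Get-EstimatedLatencyOverhead {
    param(
        [double]$MailboxSizeGB,
        [double]$ExportBufferKB,
        [double]$LatencyMs
    )
    # 1 GB = 1MB (1,048,576) KB, so this converts the mailbox size to KB.
    $calls = [math]::Ceiling(($MailboxSizeGB * 1MB) / $ExportBufferKB)

    [pscustomobject]@{
        Calls          = $calls
        LatencyMinutes = [math]::Round($calls * $LatencyMs / 60000, 1)
    }
}

# Usage: compare two illustrative buffer sizes for a 5 GB mailbox
# at the latency seen in the sample output.
Get-EstimatedLatencyOverhead -MailboxSizeGB 5 -ExportBufferKB 1024 -LatencyMs 123.55
Get-EstimatedLatencyOverhead -MailboxSizeGB 5 -ExportBufferKB 7500 -LatencyMs 123.55
```

Under this model the time lost to latency scales with the number of calls, so a larger buffer lowers it roughly in proportion.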
Scale issues
The resource consumption on the on-premises Mailbox or Client Access servers may be high if you're not load balancing the migration requests or you're running other services on the same servers. You can try distributing the source mailboxes to multiple Mailbox servers and moving mailboxes to different databases located on separate physical hard drives.
Causes of destination side slowness in onboarding scenarios
There are several possible reasons the destination side of the migration in an onboarding scenario may be causing slower than normal performance. Since the destination side of the migration is Office 365, there are limited options for you to try to resolve bottlenecks. The best solution is to remove the migration requests and re-insert them so that migration requests are assigned to less busy Office 365 servers.
Office 365 system resources: This is likely due to insufficient system resources in Office 365. Office 365 destination servers may be too busy with handling other requests associated with normal support of services for your organization.
Stalls due to word breaking: Any mailbox content that is migrated to Office 365 is separated into individual words so that it can be indexed later on. This is performed by the Office 365 search service and coordinated by the MRS service. Check the WordBreakingDuration value to see how much time is spent in this process. If it's more than 15%, it usually indicates that the content indexing service running on the Office 365 target server is busy.
Stall due to Content Indexing: The content indexing service on the Office 365 servers is too busy.
Stall due to High Availability: The high availability service, which is responsible for copying the data to multiple Office 365 servers, is too busy.
Stall due to CPU: The Office 365 server's CPU consumption is too high.
Karahan Celikel and the Migration Team
You Had Me at EHLO.