Staging Folder Guidelines for DFS Replication

Published Apr 10 2019 01:09 AM
First published on TECHNET on Mar 20, 2006
Hi, this is Shobana Balakrishnan here to talk about staging folders in DFS Replication. Staging folders are used to (a) isolate files from changes on the file system, and (b) amortize the cost of compression and of computing RDC hashes across multiple partners. With the current heuristic, staging is beneficial for reasonably large files (greater than 64 KB). If DFS Replication didn’t stage the file downstream, then (a) it would not be able to use cross-file RDC, since cross-file RDC prefers files in staging whose hashes are pre-computed, and (b) it would need to do in-memory decompression, which does not scale to large files.

Here is some background on current staging space management. There are three values that are important to staging space management.

  1. Staging size in MB (configured per-replicated folder in AD)
  2. Staging low watermark percentage (configured per-server via WMI, applies to all replicated folders on the server)
  3. Staging high watermark percentage (configured per-server via WMI, applies to all replicated folders on the server)

DFS Replication will do roughly the following when trying to stage a file:

  1. Request a reservation for staging space for the file based on an estimate of the file size.
  2. If the currently used staging space is less than the configured staging size, the file is allowed to stage regardless of the reservation amount. This allows large files to replicate and not get stuck with the familiar “huge file” replication blocker on FRS. The reservation amount is accounted for in the used staging space.
  3. After staging completes, DFS Replication fixes up the reservation amount by using the actual used amount. Note that due to compression, we could have different file sizes.
  4. If the used staging space is higher than the high watermark, staging space cleanup is triggered. Cleanup continues until usage drops to the low watermark or there are no more files that are candidates for cleanup, i.e., all files in staging are actively being used. Note that cleanup is scoped to each replicated folder individually.

In summary, the configured staging size and the currently used/reserved staging space are used as a gate to allow or deny new staging requests. The watermarks are used for staging cleanup.
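The gate and cleanup behavior described above can be sketched as a small simulation. This is a simplified model, not DFS Replication's actual implementation: the class and method names are hypothetical, the 60%/90% watermark values are illustrative assumptions, and the "oldest file first" cleanup order is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class StagedFile:
    name: str
    size_mb: float
    in_use: bool = True  # files being uploaded/downloaded cannot be cleaned up


@dataclass
class StagingArea:
    quota_mb: float                   # configured staging size
    low_watermark_pct: float = 60     # assumed value, for illustration only
    high_watermark_pct: float = 90    # assumed value, for illustration only
    files: list = field(default_factory=list)

    @property
    def used_mb(self) -> float:
        return sum(f.size_mb for f in self.files)

    def try_stage(self, name: str, estimated_mb: float, actual_mb: float) -> bool:
        # Steps 1-2: the gate compares current usage with the quota; the size
        # of the new reservation does not matter, so a huge file can still stage.
        if self.used_mb >= self.quota_mb:
            return False
        entry = StagedFile(name, estimated_mb)   # reservation counts as used space
        self.files.append(entry)
        entry.size_mb = actual_mb                # Step 3: fix up with the real size
        self._cleanup_if_needed()                # Step 4
        return True

    def release(self, name: str) -> None:
        # Called when an upload/download completes or aborts.
        for f in self.files:
            if f.name == name:
                f.in_use = False
        self._cleanup_if_needed()

    def _cleanup_if_needed(self) -> None:
        if self.used_mb <= self.quota_mb * self.high_watermark_pct / 100:
            return
        target = self.quota_mb * self.low_watermark_pct / 100
        for f in list(self.files):               # assumed order: oldest first
            if self.used_mb <= target:
                break
            if not f.in_use:
                self.files.remove(f)
```

In this model, a 400 MB file is admitted into a 100 MB staging area as long as usage is below the quota when the request arrives. Further requests are then denied until that file is released and cleaned up.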

Note the following:

  • DFS Replication can be below the high watermark and fail to stage a large file without cleaning up staging. This might happen when there is not adequate disk space to stage the file, but the high watermark has not been reached for that replicated folder, such as when a large number of replicated folders share the same volume and each one’s staging folder has been set at the default 4 GB.
  • If disk space runs low, there is no special cleanup. If DFS Replication doesn’t hit the high watermark, no cleanup is triggered.
  • Multiple staging areas on a volume are not handled as an aggregate. All staging areas can be below the high watermark and still fill up the disk, yet no cleanup will be triggered. This is particularly important when you have multiple replicated folders with staging configured on the same volume.
  • There is no magical formula to set an ideal staging area size, just some rules of thumb.
  • Staging is used for both inbound and outbound replication. When we deny staging requests, replication can become blocked until the used staging files are released (download / upload completes or aborts) and cleaned up.
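Because staging areas on a volume are not managed as an aggregate, it can help to check how much space they could jointly consume before any cleanup starts. The helper below is a hypothetical sketch; the function name and the 90% high watermark are assumptions for illustration.

```python
def staging_shortfall_mb(staging_quotas_mb, volume_free_mb, high_watermark_pct=90):
    """Return how many MB the volume falls short if every staging area on it
    fills to just below its high watermark (the point up to which no cleanup
    is triggered). A result of zero or less means the volume can absorb that load."""
    no_cleanup_usage = sum(staging_quotas_mb) * high_watermark_pct / 100
    return no_cleanup_usage - volume_free_mb


# Ten replicated folders, each with the default 4 GB staging size,
# sharing a volume with 20 GB free.
shortfall = staging_shortfall_mb([4096] * 10, volume_free_mb=20 * 1024)
print(f"Shortfall before any cleanup triggers: {shortfall:.0f} MB")
```

A positive result means the disk can fill up while every replicated folder is still below its own high watermark.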

It is not easy to define an optimal size for staging because staging is a trade-off between disk usage and performance. Generally you want enough space to hold, at a minimum, several of the files that will be replicated at the same time. So if you are replicating all 1-GB files, then you want staging to be at least 8-10 GB. On the other hand, if you are replicating mostly 100-KB files with an occasional 1-GB file, then a staging space of 1.5-2 GB would be OK as a minimum. As you give more space to staging, RDC becomes more efficient because RDC is only done on staged files: if the previous version of a file is still in the staging folder on the receiving member, that member does not need to restage the file before comparing RDC hashes. If the previous version has already been cleaned up, the file must be restaged on the receiving member and its RDC hashes calculated again, which delays replication and uses CPU cycles.

There are several factors that affect the size of staging. Without going into theories, here are some rules of thumb:

  1. It is desirable to set the staging folder to be as large as possible (as available space permits) and comparable to the size of the replicated folder. Hence if the size of the replicated folder is 24.5 GB, then ideally a staging folder of comparable size is desirable. Note that this amortizes the cost of staging and hash calculation over all connections. It is also a best practice to locate the staging folder on a different spindle to prevent disk contention.
  2. If staging cannot be set comparable to the size of the replicated folder, then reduce the size by 20%. Depending on how well the data compresses, staging files will be 30-50% of the original file size.
  3. Note recommendations (1) and (2) are particularly important if all the data is preexisting and DFS Replication must process all content at the same time during initial replication. On the other hand, if the replicated folder is relatively empty and gradually grows over time, the recommendation is to determine the projected size of the replicated folder and size the staging appropriately.
  4. If the size of the staging folder cannot be set proportional to the size of the replicated folder, then increase the size of the staging folder to at least the combined size of the five largest files in the replicated folder.
  5. Also monitor the DFS Replication event log for staging cleanup events (for example, event 4208 followed by 4206, which means that it was not possible to stage a file due to lack of space and no further cleanup was possible), or for frequent cleanup events (4202 followed by 4204). If more than a few such event pairs occur every hour, then increase the size of staging by 50%.
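The rules of thumb above can be expressed as a small calculation. This is one possible reading of them, not an official formula: the function names are hypothetical, and the "more than a few" threshold of three event pairs per hour is an assumption.

```python
def suggest_staging_mb(replicated_folder_mb, file_sizes_mb, available_mb):
    """Pick a staging size following rules 1, 2, and 4."""
    # Rule 1: comparable to the replicated folder if space allows.
    if available_mb >= replicated_folder_mb:
        return replicated_folder_mb
    # Rule 2: otherwise reduce that size by 20%.
    reduced = replicated_folder_mb * 80 / 100
    if available_mb >= reduced:
        return reduced
    # Rule 4: otherwise at least the combined size of the five largest files.
    return sum(sorted(file_sizes_mb, reverse=True)[:5])


def grow_if_starved(current_mb, event_pairs_per_hour, threshold=3):
    """Rule 5: grow staging by 50% when cleanup event pairs are frequent.
    The threshold is an assumed stand-in for 'more than a few'."""
    if event_pairs_per_hour > threshold:
        return current_mb * 150 / 100
    return current_mb


files = [100, 2000, 50, 1000, 700, 300, 10]   # file sizes in MB
print(suggest_staging_mb(10000, files, available_mb=5000))
print(grow_if_starved(4096, event_pairs_per_hour=5))
```

Rule 3 (sizing for the projected rather than the current folder size) is covered by passing the projected size as `replicated_folder_mb`.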


Last updated: Apr 10 2019 01:10 AM