In Part 1 and Part 2 of this series we looked at the fundamentals of Exchange backups using VSS, and the flow of an active DAG database backup.
In Part 3 we break down how a passive DAG database copy undergoes a full backup. The Exchange Writer responsible for passive copy backups doesn't run in the Information Store Service, but rather as part of the MS Exchange Replication Service. Among other functions, this service coordinates the backup process between the passive copy node and the active copy server. Similar to the backup of an active database described in Part 2, this post describes the backup of a passive database copy of DB1, hosted on server ADA-MBX1. The active mounted database copy is on ADA-MBX2, and again, a non-persistent copy-on-writer (COW) snapshot is utilized by the backup solution:
(please click thumbnails for full size version of graphics in this post)
The first steps to back up a passive database copy are about the same as for an active one. The backup application gets the metadata for DB1 from the Exchange Writer, but again, the writer is running in the MS Exchange Replication Service. A new writer instance GUID is generated which will persist throughout the job, as with an active database backup.
Event 2021 indicates that the backup application, or VSS requestor, has engaged the Exchange Writer. It will appear numerous times throughout the backup as different components are read from metadata, such as log and database file locations.
Events 2110 and 2023 indicate that the backup application has requested a particular set of components to back up, and the backup type.
The replication service for the passive copy’s server signals the active copy server that a backup is in progress. Events 910 and 210 on the active copy server, as well as 960 on the passive copy server, signify two things: First, they establish which server backing up a passive copy of the database; Second, the STORE service on the active copy server has marked the database with “backup in progress” in memory and acknowledges that the surrogate backup will proceed. Once this occurs it is not possible to backup the database again until either the current surrogate backup completes, or the “backup in progress” status is otherwise cleared.
Events 2025 and 2027 are generated when the replication writer prevents the replication service from writing logs copied from the active copy server to the local disk. Replay of logs also stops, thereby keeping the contents of the database files unchanged. At this point writes of data for the database getting backed up are “frozen”. VSS can now create the snapshots in shadow storage for each disk specified in the metadata.
VSS creates snapshots of disks D: and E:. Once these complete it signals the Exchange Writer, which in turns allows the replication service to resume log copy and replay. Events 2029 and 2035 are generated when the “thaw” is completed and normal disk writes are allowed to continue.
Once the snapshots are created the backup application can copy blocks of data through VSS, which transfers blocks of data from shadow storage if they’ve been preserved due to a change, or from the actual disk volume if they haven’t. The replication service writer waits for the signal that the transfer of data is complete. This flow of data is represented by the purple arrows, which in this case indicates data getting copied out of the snapshots in storage, through I/O of the Exchange server, and on to the backup server.
When the files necessary for backing up DB1 are safely copied to backup media, the backup application signals VSS that the job is complete. VSS in turn signals the replication writer, and Exchange generates events 963 and 2046 on the passive copy server. The replication service then signals the Information Store service on the active copy server that the job is done, and that log truncation can proceed if all necessary conditions are met. The active copy node generates events 913 and 213 signaling that the surrogate backup is done, and that the database header will be updated with the date and time of the backup.
Events 2033 and 2037 signal the end of the backup. The active copy node flushes and rolls the current transaction log containing database header updates. That log is then shipped and made eligible for replay according to schedule so that the passive database copy is marked with the new header information at the earliest available time. Log truncation also proceeds if possible. In this case the snapshots are destroyed, and normal operations continue.
For more on the subject of this series here are some more great references: