Blog Post

SQL Server Support Blog
5 MIN READ

Tracking database recovery progress using information from DMV

mssql-support's avatar
mssql-support
Icon for Microsoft rankMicrosoft
Jan 15, 2019
First published on MSDN on Dec 29, 2010
Technorati Tags: Engine , SQL Server 2008

You must be very familiar with the database recovery related messages printed to the SQL Server Error log. These come in very handy when troubleshooting issues that are related to long recovery. These messages provide information about the stage of the recovery process and approximate time for completion.

2010-12-29 12:02:10.43 spid25s     Starting up database 'testdb'.
2010-12-29 12:02:31.23 spid25s     Recovery of database 'testdb' (11) is 0% complete (approximately 1725 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2010-12-29 12:03:08.94 spid25s     Recovery of database 'testdb' (11) is 1% complete (approximately 1887 seconds remain). Phase 1 of 3. This is an informational message only. No user action is required.
2010-12-29 12:03:08.95 spid25s     Recovery of database 'testdb' (11) is 1% complete (approximately 1887 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2010-12-29 12:04:57.97 spid25s     Recovery of database 'testdb' (11) is 43% complete (approximately 192 seconds remain). Phase 2 of 3. This is an informational message only. No user action is required.
2010-12-29 12:04:57.97 spid25s     580056 transactions rolled forward in database 'testdb' (11). This is an informational message only. No user action is required.
2010-12-29 12:04:57.97 spid25s     Recovery of database 'testdb' (11) is 43% complete (approximately 192 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2010-12-29 12:06:32.05 spid25s     Recovery of database 'testdb' (11) is 99% complete (approximately 2 seconds remain). Phase 3 of 3. This is an informational message only. No user action is required.
2010-12-29 12:06:35.09 spid25s     1 transactions rolled back in database 'testdb' (11). This is an informational message only. No user action is required.
2010-12-29 12:06:35.09 spid25s     Recovery is writing a checkpoint in database 'testdb' (11). This is an informational message only. No user action is required.
2010-12-29 12:06:35.44 spid25s     Recovery completed for database testdb (database ID 11) in 244 second(s) (analysis 37849 ms, redo 109038 ms, undo 97146 ms.) This is an informational message only. No user action is required.

Starting with SQL Server 2008, you do not need to repeatedly open/refresh the SQL Server error log or execute the stored procedure sp_readerrorlog to get up to date information about the progress of the database recovery. Most of the information is readily available in the dynamic management views [DMV]. The 2 DMV’s that offer insights into the progress of database recovery are: sys.dm_exec_requests and sys.dm_tran_database_transactions. The information presented in these DMV’s vary depending upon the situation: recovery of databases during server startup, recovery of database after a attach operation, recovery of database after a restore operation.

Here is a view of sys.dm_exec_requests showing recovery related information: [I repeatedly queried this DMV and loaded it into a temporary table for later analysis]

The key information here is the command type [ DB STARTUP ] and the session_id that indicates this is a system task performing the startup recovery. The percent_complete shows the same value that the error log messages indicate about the progress within the stage of recovery. You can use the information from other columns [wait information] to understand if the recovery is taking much longer due to an I/O issue or other problems.

Since the redo and undo portions can actually happen on different tasks in fast recovery scenarios in Enterprise Edition, you could actually see 2 different session_id’s for the same database recovery process.

In the case of attaching and restoring the database, this information will be reflected under the same session_id as the user session which executes those commands.

The DMV sys.dm_tran_database_transactions exposes information that can be useful to track the progress of the undo stage of the recovery. Here is a snapshot that corresponds to the above mentioned progress information from sys.dm_exec_requests.

The key information here is how the database_transaction_log_bytes_reserved keeps coming down as undo progresses. Also in the case where there are several transactions to undo, you will be able to see their progress using the database_transaction_next_undo_lsn.

Next time you encounter a really long recovery problem, try using some of these information points to understand if the recovery is progressing very slow or there is a lot of work to be done or if it completely stuck.

Suresh B. Kandoth

SQL Server Escalation Services

Other references:

http://blogs.msdn.com/b/psssql/archive/2008/01/23/how-it-works-what-is-restore-backup-doing.aspx

http://blogs.msdn.com/b/psssql/archive/2009/05/21/how-a-log-file-structure-can-affect-database-recovery-time.aspx

Here are some sample scripts I used to gather and report out the information I showed earlier:

-- create the tables to store the information you collect about recovery

USE <db_name>

GO

DROP TABLE [dbo].[tbl_recovery_tracking]

GO

DROP TABLE [dbo].[tbl_dm_tran_database_transactions]

GO

CREATE TABLE [dbo].[tbl_recovery_tracking](

[runtime] [datetime] NOT NULL,

[command] [nvarchar](256) NOT NULL,

[session_id] [smallint] NOT NULL,

[database_id] [smallint] NOT NULL,

[total_elapsed_time] [int] NOT NULL,

[percent_complete] [real] NOT NULL,

[estimated_completion_time] [bigint] NOT NULL,

[wait_resource] [nvarchar](256) NOT NULL,

[wait_time] [int] NOT NULL,

[wait_type] [nvarchar](60) NULL,

[blocking_session_id] [smallint] NULL,

[reads] [bigint] NOT NULL,

[writes] [bigint] NOT NULL,

[cpu_time] [int] NOT NULL

) ON [PRIMARY]

GO

CREATE TABLE [dbo].[tbl_dm_tran_database_transactions](

[runtime] [datetime] NOT NULL,

[transaction_id] [bigint] NOT NULL,

[database_id] [int] NOT NULL,

[database_transaction_log_record_count] [bigint] NOT NULL,

[database_transaction_log_bytes_used] [bigint] NOT NULL,

[database_transaction_log_bytes_reserved] [bigint] NOT NULL,

[database_transaction_next_undo_lsn] [numeric](25, 0) NULL

) ON [PRIMARY]

GO

-- collect the information in a loop

WHILE 1 = 1

BEGIN

INSERT INTO [dbo].[tbl_recovery_tracking]

SELECT GETDATE() as runtime, command,

session_id, database_id, total_elapsed_time,

percent_complete, estimated_completion_time,

wait_resource, wait_time, wait_type, blocking_session_id,

reads, writes, cpu_time

FROM sys.dm_exec_requests

WHERE command = 'DB STARTUP' -- may need to change this if troubleshooting recovery as part of attach database or restore

INSERT INTO tbl_dm_tran_database_transactions

SELECT GETDATE() as runtime,

transaction_id, database_id,

database_transaction_log_record_count, database_transaction_log_bytes_used,

database_transaction_log_bytes_reserved, database_transaction_next_undo_lsn

FROM sys.dm_tran_database_transactions

WAITFOR DELAY '00:00:01'            -- change this capture interval

END

GO

-- after you collect information for some time, you can analyze the information to understand the progress of recovery

SELECT runtime,   command,

session_id, database_id, total_elapsed_time,

percent_complete, estimated_completion_time,

wait_resource, wait_time, wait_type, blocking_session_id,

reads, writes, cpu_time

FROM [dbo].[tbl_recovery_tracking]

WHERE session_id = 25         -- change this

ORDER BY runtime

GO

SELECT

runtime, transaction_id, database_id,

database_transaction_log_record_count,

database_transaction_log_bytes_used, database_transaction_log_bytes_reserved,

database_transaction_next_undo_lsn

FROM tbl_dm_tran_database_transactions

WHERE database_id = 11 and transaction_id = 1452239         -- change this

ORDER BY runtime

GO

Updated Jan 15, 2019
Version 2.0
No CommentsBe the first to comment