I and Microsoft will
opt for restore of a backup when corruption of the database is encountered. This is the only way to safely restore the database to a known state. All other actions, such as dbcc repair, can remove data from the database and break referential integrity or other business logic. Whenever you embark on a data recovery effort outside of backup you open up the possibility of further data loss and future application problems due to missing data.
I and Microsoft always recommend running with checksum enabled to help identify any data damage. This has provide very valuable to a large set of customers.
I ran into an issue yesterday that opened up an interesting twist to the
setting. The checksum failure appeared to be localized to the actual contents of a character column and not the structure of the page or rows on the page. So the damage was only to an address such as 'Street' was now 'StrAet'. However, as soon as a page is identified as having a checksum failure the entire page is deemed unstable. We needed to recover rows from the page if possible.
When we find row level damage we can often use a query to force an index retrieval of the rows around the damage. Because the entire data page was flagged by the checksum failure we can't access any rows on the page.
The checksum validation is controlled at the database level (PAGEVERIFY) if you set the PAGEVERIFY to NONE a checksums are ignored allowing access to the page.
CAUTION: This could lead to other unwanted behaviors so only attempt such an action under a controlled environment.
Here is a example of how disabling CHECKSUM for a small window may allow data to be retrieved.
I built a sample database
Modified it with a hex editor to simulate damage
Altered the PAGEVERIFY to NONE to avoid the issues.
drop database dbTest
create database dbTest
ALTER DATABASE dbTest set PAGE_VERIFY CHECKSUM
create table tblTest(iID int identity(1,1), strData nchar(50))
insert into tblTest values('First Row')
insert into tblTest values('Second Row
insert into tblTest values('Third Row')
-- Find the first page
select * from sys.system_internals_allocation_units
where container_id = (select hobt_id from sys.partitions
where object_id = object_id('tblTest'))
ALTER DATABASE dbTest set OFFLINE
-- Introduce checksum failure with HEX Editor
ALTER DATABASE dbTest set ONLINE
dbcc checkdb(dbTest) -- This one fails with checksum
ALTER DATABASE dbTest set PAGE_VERIFY NONE
dbcc checkdb(dbTest) -- This one succeeds (data is still damaged but you can retrieve rows with a select)
Notice that I changed the ‘to’ in the second row as highlighted with my hex editor.
Damaged with PAGEVERIFY CHECKSUM enabled as recommended. Notice that all rows in the table are on the damaged page and are considered invalid/lost.
DBCC results for 'tblTest'.
Msg 8928, Level 16, State 1, Line 1
Object ID 2073058421, index ID 0, partition ID 72057594038321152, alloc unit ID 72057594043301888 (type In-row data): Page (1:73) could not be processed. See other errors for details.
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 2073058421, index ID 0, partition ID 72057594038321152, alloc unit ID 72057594043301888 (type In-row data), page (1:73). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12716041 and -4.
There are 0 rows in 0 pages for object "tblTest".
Simply using ALTER DATABASE to PAGEVERIFY NONE the checksum issue is ignored and we check the rows on the page. In this case the data was damaged but the column was intact so no further error. If the length of the data would have been damaged DBCC would have reported a row level error.