As you can guess from the title, I'm planning a long series over the summer to go into the guts of how CHECKDB works (both the consistency checks part and the repair part). And as you can guess from 'CHECKDB', I'm already bored with putting DBCC in front of it all the time and changing the font to Courier New to make it stand out. I don't do that in real life so why on the blog?
I have no idea how long it'll be - easily more than 20, probably less than 50, but there's a ton of info in my head that's desperate to get out (ever seen the scene in the movie Scanners where that guy's head explodes? Well, its not quite that bad but remembering that scene was fun). As part of this I'll need to go into some of the low-level structural details of the database, which will hopefully be interesting too. (How do I know all this stuff? Read my bio). If there's something I need to explain, post a comment with a question in and I'll do a post to answer it. I may even do a post on how to use DBCC PAGE...
So here's the problem statement: CHECKDB needs a consistent view of the database.
Why? Well, usually its running on a live database with all kinds of stuff going on. It needs to read and analyze the whole database but it can't do it instantaneously (this isn't Star Trek) so it has to take steps to ensure that what it reads is transactionally consistent.
Here's an example. Consider a transaction to insert a record into a table that is a heap and has a non-clustered index, with a concurrent CHECKDB that doesn't enforce a consistent view.
The easy way to get the consistent state is through locking, which is what SQL Server 7.0 did. You can still do that in SQL Server 2000 and 2005 using the TABLOCK option. Another way to do it is to put the database into single-user or read-only mode.
However, excessive locking is a drag and taking the database essentially offline tends to irritate users so with SQL Server 2000 we came up with a neat way to get the consistent view and be able to run CHECKDB online - log analysis. In a nutshell, after we've read through all the database, we read the transaction log to make sure we didn't miss anything. Sounds simple, right? Dream on. Here's how it works:
Back in late 2000, it became apparent that with all the new features we were planning for 'Yukon' (we had no idea it would be called SQL Server 2005 back then), including some changes to the transaction log to allow for fast recovery and deferred transactions and stuff like versioning and online index build, the transaction log analysis was a non-starter. While it had given us the holy-grail of online consistency checks, with all the added complications of Yukon it would become impossible to maintain and get right.
But what to use instead? Who would come to my rescue? Turns out that database snapshots would be my savior. (That is their eventual name. I preferred their first name COW databases - Copy-On-Write databases - and my TechEd slide deck has a nice animated cow in homage). The in-depth details of database snapshots are beyond the scope of this post. To put it simply:
By moving to database snapshots we changed to using mainline server code to get our transactionally consistent view. This vastly reduced the complexity of the code and meant that someone else was responsible for finding and fixing its bugs during development :-)
So when CHECKDB starts, the first thing we do is work out whether we'd like to run online - if so we create a hidden database snapshot of the source database (i.e. CHECKDB's target database). That could cause you a problem - depending on your transaction load concurrent with the CHECKDB, the database snapshot can grow in size. As the we create a hidden one, you have no control over where we places the files - we just place them as alternate streams of the files comprising the source database. If you don't have room for this, just create your own database snapshot and check that.
Once the database snapshot is created, we're guaranteed a transactionally consistent view of the database and can merrily run our check algorithms against the database snapshot. Ah, you might say, but that means CHECKDB is checking the database as it was at some point in the past! Yes, I'd say, but that point is the start time of the CHECKDB, just as it was (essentially) with the log analysis mechanism in SQL Server 2000.
There are a few slight gotchas (all documented) with this approach:
And that's that. Now its time for breakfast - no oatmeal hopefully...
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.