I ran into an issue yesterday that brought up questions as to how the checkpoint's max outstanding, I/O target is determined. There is different behavior across the various versions of SQL Server and it would take an entire white paper to fully explain the checkpoint process. I am going to try to boil it down to the most relevant information.
I previously outlined some of the checkpoint behavior in the following white papers.
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx
http://www.microsoft.com/technet/prodtechnol/sql/2005/iobasics.mspx
Checkpoint sweeps the hashed data buffers. When it finds a dirty buffer it calls the routine (WriteMultiple). WriteMultiple does hash lookups for adjacent, dirty pages. The outstanding page count is incremented when a page write request is made. When the write completes the value is decrement.
The I/Os are written asynchronously. Checkpoint limits the number of I/Os it can post by comparing the number of outstanding requests with the target. When the target is reached checkpoint waits. Once an I/O completes Checkpoint can post another I/O.
Some high level pseudo-code helps.
Void FlushCache ()
{
bufferPos = 0;
while(bufferPos < maxBuffers)
{
if(buf[bufferPos] is dirty)
{
pagesInMotion = WriteMultiple(buf[bufferPos]);
}
SafeAdd(totalCheckpointPagesOutstanding, pagesInMotion);
while (totalCheckpointPagesOutstanding >= maxOutstandingPages)
{
WaitFor..(5 milli-seconds)
}
bufferPos++;
}
}
Void CheckpointIoCompletion (….)
{
Check success of WriteOperation
SafeDecrement(totalCheckpointPagesOutstanding);
}
Max Outstanding I/O Target
Version Max Outstanding I/Os for Checkpoint SQL Server 2000 SP3 100 SQL Server 2000 SP4 64 * Logical Schedulers SQL Server 2005 *Formula
As I stated there are other parts to FlushCache. For example SQL Server 2000 SP3 and SP4 attempt to maintain a rate that meets the configured recovery interval. This may include additional sleeps.
See: KB 906121: Checkpoint resumes the behavior that it exhibited before you installed SQL Server 2000 SP3 when you enable trace flag 828 on a computer that is running SQL Server 2000 SP4
http://support.microsoft.com/kb/906121
Generally speaking the loop will continue without additional waits. This means that the I/O load during SQL Server 2000 checkpoints is held steady. As soon as one of the I/Os complete Checkpoint can issue a new request.
The reason I specifically point this out is that I have seen systems where some of the I/Os become stalled. Lets say the first 50 I/Os are stalled and the last 50 complete. Checkpoint puts out 50 more I/O requests that can again complete. This can make the performance counters for average disk sec/transfer and disk queue length more challenging to interpret.
*Formula (SQL 2005)
Checkpoint (FlushCache) was significantly upgraded for SQL Server 2005. It can honor a target (see manual checkpoint) and various other parameters can affect Checkpoint. I am going to describe one of these behaviors.
The starting point for max outstanding I/O level is loosely calculated as:
Each time SQL Server 2005 checkpoint executes the calculated target establishes the max outstanding I/O level. Depending on the type of checkpoint this value is adjusted as the I/Os complete.
SQL Server 2005 checkpoint adds duration tracking to the I/O requests. As each I/O completes the duration is used to adjust a running I/O average. Here is some pseudo-code showing a very high level running average calculation. A running average is used to accommodate the 'immediate' state of the I/O path so Checkpoint can react quickly to the current state and not the historical states.
Void CheckpointIoCompletion(….)
{Check success of WriteOperation
duration = Current Time - I/O Start Time
runningAvg = (duration + (runningAvg * 7)) / 8;
SafeDecrement(totalCheckpointPagesOutstanding);}
Using this running average, Checkpoint adjusts the number of allowed max outstanding I/Os with a goal to keep I/O response time below 20ms.
Bob Dorr
SQL Server Senior Escalation Engineer
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.