I started looking at the plan differences between the update and the same select criteria. I didn't see anything that popped out as significantly different for obtaining data on the select side of the input and the update side of the plan looked reasonable.
I then looked at the plan differences between the 2GB server memory setting and a larger memory setting - No differences, the same plan was being used. - Hmmm - interesting.
I looked a the statistics time and I/O outputs to see if something significant could be uncovered. The I/O was about the same but there was a significant difference in the CPU usage between the update and the select.
Back to the plan for the update. I was looking to see if it was possible the update portion of the plan could drive CPU if we had fetched the pages into memory. Clearly I can come up with an update that touches a small number of pages, gets them locked into buffer pool memory and then updates the same rows many times; driving CPU and not physical I/O. - This was not the case for the scenario presented. I had to update millions of rows to reproduce the problem.
Started tracing the activity to see what other things were going on. What I saw was lock escalation taking place when the query ran faster, under the 2GB SQL Server max server memory setting.
Now I had a pretty good idea that locking played a role in all of this. I then enabled trace flag ( -T1211 - use with caution ) to disable lock escalation and I could cause the same issue on the SQL Server's max server memory setting to 2GB installation.
Using debugging tools I captured the execution of the scenario and looked at those code paths using the most CPU resources. What I found was a code path related to creation and destruction of a lock class (but unfortunately there are no XEvents or Trace events in this area.)
Note: I did file work items with the development team to expose this activity.
At a high level , a lock class is a container of locks for a given part of a plan. It is often used to protect a plan from changing data as the data passes from one portion of the plan to the next. (Think hash, sort, spool … type of operations).
Let's discuss the following example:
update t set t.strData = CAST(t.iID as varchar(10)) from tblTest t join tblTest t2 on t2.iID = t.iID
||--Table Update(OBJECT:([tempdb].[dbo].[tblTest] AS [t]), SET:([tempdb].[dbo].[tblTest].[strData] as [t].[strData] = [Expr1006]))|
|--Top(ROWCOUNT est 0)
|--Compute Scalar(DEFINE:([Expr1006]=CONVERT(varchar(10),[tempdb].[dbo].[tblTest].[iID] as [t].[iID],0)))
|--Nested Loops(Left Semi Join, WHERE:([tempdb].[dbo].[tblTest].[iID] as [t].[iID]=[tempdb].[dbo].[tblTest].[iID] as [t2].[iID]))
|--Table Scan(OBJECT:([tempdb].[dbo].[tblTest] AS [t]))
|--Table Scan(OBJECT:([tempdb].[dbo].[tblTest] AS [t2]))
Clearly I can see that lock escalation will reduce the number of locks and reduce this work. One workaround but probably not what most folks want to do in a production environment.
My first thought was how many locks is the update requiring that would cause us to do a lot of work during release? In studying the lock acquired and released events for the update statement I found that it was only a handful. So again, why the large CPU burn? - The problem was the update was inside a cursor loop that executed millions of times and all of this was under a single transaction.
Another workaround I found was to use smaller transactions, but more of them. I found this odd as I am still doing the same amount of work just in smaller chunks. Smaller chunks would help avoid lock escalations and I thought it would make it worse based on previous testing.
What is happening is the lock class has a bug. It is not properly releasing just the locks it acquired. It is running the entire lock list for the transaction. Because the locks acquired before the update don't have a lock class association there is nothing to do for those locks. The locks are properly maintained, the SQL Server is just running the entire lock list instead of the portion associated with the lock class.
In this reproduction case there are 400,000 locks acquired before the cursor execution driving the updates. This means that each update will run ALL 400,000 lock structures and find nothing to do. As the next update occurs and SQL Server does it all over again, burning the unnecessary CPU. Based on this behavior; when I dropped the size of the transaction I reduced the number of locks and as such the number of CPU cycles.
Note: Bug filed with the SQL Server development team. Internal testing, with the bug fix, shows the query taking 140 minutes consistently runs in 32 minutes without any T-SQL code changes. This is scheduled to be released in an upcoming release (~ Nov 2012).
Is This My Problem? Am I a Candidate for the Fix?
There are some things you can do to see if your statements are encountering this problem.
The easiest way to test is to ' set transaction isolation level repeatable read ' at the session level and then run the set of queries in question. Repeatable read may use more lock memory but it also acts as a broad ' lock class ' for this test. I have also used snapshot isolation with near repeatable read results.
If the query has some of the before mentioned operations, runs significantly faster and uses far less CPU there is a good chance it is a candidate for the fix.
Without the fix you can use the transaction isolation levels, smaller transactions, or even locking hints to help control the behavior.
Bob Dorr - Principal SQL Server Escalation Engineer
assistance provide by Keith Elmore - - Principal SQL Server Escalation Engineer
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.