In order to explain what the ‘Low Priority Wait’ feature does, let’s first go into the background and describe the problem we wanted to solve with this functionality.
SQL Server 2005 introduced online index maintenance and partitioned tables. Although switching partitions of a partitioned table out of or into a table is considered an ‘online’ operation as well, it requires an exclusive Schema Modification lock (SCH_M lock). A SCH_M lock against a table is usually taken by DDL operations that structurally change the table. From an architecture and design point of view, SQL Server should hold the SCH_M lock for small time periods only, especially when using the online options offered by some of the DDL commands of SQL Server. Typical cases where a SCH_M lock is taken on a table are building or rebuilding an index, adding or altering a column, and switching a partition into or out of the table.
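To make this concrete, here are a few examples of DDL commands that take a SCH_M lock on a table (the table and index names are made up for illustration):

```sql
-- Add a column: a structural change, takes a SCH_M lock on the table
ALTER TABLE dbo.SalesFact ADD ArchiveFlag char(1) NULL;

-- Online index rebuild: runs mostly online, but needs a short SCH_M lock
-- at the end of the operation to swap in the newly built index structure
ALTER INDEX PK_SalesFact ON dbo.SalesFact REBUILD WITH (ONLINE = ON);

-- Partition switch: an 'online' metadata-only operation, still requires SCH_M
ALTER TABLE dbo.SalesFact SWITCH PARTITION 1 TO dbo.SalesFact_Archive;
```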
On the other side, we have the so-called Schema Stability lock (SCH_S), which is used by T-SQL queries that read or modify data in tables. The SCH_S lock on a table is granted to a query when it accesses the table, and the transaction isolation level of the query does not matter. Even queries accessing the table with uncommitted read isolation level get a SCH_S lock granted on it. As soon as the query has finished accessing the table, the SCH_S lock is released. The purpose of the SCH_S lock is to make sure that while the query is reading or modifying data in the table, the structure of the table schema is not changed underneath it. Hence DDL operations which change the structure of the table schema, or perform other changes that might require changes to essential metadata about a B-Tree, need to be synchronized with queries accessing the table/B-Tree to read or modify data. For this purpose we have the SCH_M lock and the SCH_S lock: a SCH_S lock can’t be granted on a table where another transaction holds a SCH_M lock, and the same is true the other way round.
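While a query is running, these object-level schema locks can be observed through the sys.dm_tran_locks DMV. A small monitoring query along these lines, run in a second session, shows the SCH_S and SCH_M locks on tables of the current database:

```sql
-- Show schema locks held or requested on table-level (OBJECT) resources
SELECT request_session_id,
       OBJECT_NAME(resource_associated_entity_id) AS table_name,
       request_mode,      -- 'Sch-S' or 'Sch-M'
       request_status     -- 'GRANT' or 'WAIT'
FROM sys.dm_tran_locks
WHERE resource_type = 'OBJECT'
  AND request_mode IN ('Sch-S', 'Sch-M');
```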
So the basic idea of how things should work when, e.g., issuing a DDL statement to build a new index online is:

1. The DDL command requests the SCH_M lock on the table.
2. If no other locks are held on the table, the SCH_M lock is granted and the DDL command proceeds.
3. If queries are still holding locks on the table, the SCH_M lock request goes into the ‘lock wait list’.
4. New queries that want to access the table queue their SCH_S lock requests in the ‘lock wait list’ behind the waiting SCH_M request.
5. Once the queries that were already running against the table finish, the SCH_M lock is granted, the DDL command performs its short schema modification and releases the lock, and the queued SCH_S requests get granted.
So if a SCH_M lock cannot be granted immediately, it ends up in the ‘lock wait list’ and thereby also blocks new queries from accessing the table, since those new queries can’t get the necessary SCH_S lock granted (their requests queue up behind the SCH_M request in the ‘lock wait list’). With this construct, the DDL operation should succeed in a foreseeable and halfway predictable manner, with only a small and short impact on the workload, assuming that we are not seeing queries which run for something like 30 minutes against that one table.
However, there was a little undesired issue with this scenario that we encountered especially in SAP NetWeaver landscapes. It circled around the fact that more than 99% of all reads of SAP NetWeaver applications use the uncommitted read transaction isolation level, combined with the fact that in 1998 we put an optimization into SQL Server that lets queries in uncommitted read isolation level simply bypass the lock wait list. That was a correct decision at that point in time, when most of the scenarios requesting a SCH_M lock were offline scenarios anyway, like index build and rebuild, adding a column, etc. The idea was that uncommitted read queries ignore locks anyway when reading, so why should these queries bother with the ‘lock wait list’ at all?
This means that steps #4 and #5 of the sequence above did not take place for queries issued in uncommitted read transaction isolation level. Instead, those queries, as issued from the application side, immediately got their SCH_S lock granted.
For the online scenarios we introduced in the meantime, this broke one of the key elements that gives the SCH_M lock a chance to get granted. Even worse, what could happen now was:
The mass of reading queries executed in uncommitted read isolation level by SAP NetWeaver applications got their SCH_S locks granted, since SQL Server ignored the fact that a SCH_M lock was waiting to be granted in the ‘lock wait list’. This means the SCH_M lock request of the online index build (same example as above) could wait and wait to get the lock granted, since SQL Server steadily granted new SCH_S locks to queries in uncommitted read isolation level. Queries by SAP NetWeaver which wanted to modify data, however, are issued in the context of the committed read transaction isolation level. As a result, these queries did not bypass the ‘lock wait list’ and got stuck behind the waiting SCH_M request. The overall result we experienced with customers was that in several cases systems got blocked, since modifications against one table which couldn’t proceed occupied the majority or all work processes of a specific type.
Based on more and more of the situations described above showing up, a solution was implemented in SQL Server 2012 which avoids the scenario where uncommitted read queries bypass the ‘lock wait list’. The solution only changed the way uncommitted read statements deal with waiting SCH_M locks; it does not change the behavior of queries in any other isolation level. The implemented solution looks like this:
· As long as there are no SCH_M lock requests waiting in the ‘lock wait list’, the ‘lock wait list’ is bypassed by statements issued in uncommitted read transaction isolation level.
· If there is a SCH_M lock request in the ‘lock wait list’, a query in uncommitted read transaction isolation level does not bypass the ‘lock wait list’; instead its SCH_S lock request goes into the ‘lock wait list’, in order behind the SCH_M request already waiting in the same list. As a result, the grant of the SCH_S request for such a query depends on the grant and release of the SCH_M lock request that entered the ‘lock wait list’ earlier.
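When such a situation occurs, it shows up in the wait statistics: the DDL session waits with wait type LCK_M_SCH_M, while the queued readers wait with LCK_M_SCH_S. A query like the following against the standard request DMV makes the blocking chain visible:

```sql
-- Sessions currently waiting on schema locks, and who is blocking them
SELECT session_id,
       blocking_session_id,
       wait_type,          -- LCK_M_SCH_M for the DDL, LCK_M_SCH_S for readers
       wait_time,          -- milliseconds spent waiting so far
       command
FROM sys.dm_exec_requests
WHERE wait_type IN ('LCK_M_SCH_M', 'LCK_M_SCH_S');
```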
This looked good and acceptable at first, until the first SAP customer scenario running with this new behavior hit a problem: now even the mass of uncommitted read queries could get stuck in the ‘lock wait list’ behind a waiting SCH_M request, with a similar blocking effect on the SAP workload as before.
The solution specifically for SAP NetWeaver was to switch SQL Server 2012 back to the old behavior to avoid such a scenario. This is done with trace flag 617, which is added by SWPM in the post-upgrade phase of an upgrade to SQL Server 2012.
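Trace flag 617 can also be enabled manually. A sketch of the two usual ways: the DBCC variant applies only until the next instance restart, so for a permanent setting the flag is added as the -T617 startup parameter of the SQL Server service (which is presumably how SWPM sets it):

```sql
-- Enable trace flag 617 globally for the running instance
-- (lost after a restart; for a permanent setting add -T617
--  to the SQL Server startup parameters instead)
DBCC TRACEON (617, -1);

-- Verify that the flag is active
DBCC TRACESTATUS (617);
```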
The idea behind the SQL Server 2014 solution is to sacrifice the determinism and predictability of a DDL command in favor of having no impact on the workload. This means, in contrast to the behavior in SQL Server 2012 or earlier for committed read queries, we would not hold up queries executing against a table, despite the fact that there is a SCH_M lock request in the ‘lock wait list’ for the same table. The idea is that the SCH_M lock request sits in the ‘lock wait list’ as a lower-priority wait and gets granted if for a split second no query is working on the table and no other locks are still granted on content of the table. The assumption is that if one waits long enough, there will always be such a split second without anything happening on the table.
If one wants to use such a lower-priority lock, one needs to express this in the syntax of the ALTER INDEX or ALTER TABLE command. The low lock priority works for online index build/rebuild scenarios and for partition switch executions. Unfortunately, it does not work for DBCC SHRINKFILE.
The low priority wait definition has three elements to it:
· WAIT_AT_LOW_PRIORITY, the clause that requests the SCH_M lock at low priority in the first place.
· MAX_DURATION, the maximum time (in minutes) the command waits at low priority for the SCH_M lock.
· ABORT_AFTER_WAIT, which defines what happens when MAX_DURATION expires without the lock being granted: with NONE the command continues to wait, now at normal lock priority; with SELF the DDL command aborts itself; with BLOCKERS the sessions holding the locks that block the DDL command are killed.
Or, as it is so nicely explained in this article:
http://msdn.microsoft.com/en-us/library/ms188388(v=sql.120).aspx
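In short, the grammar of the clause as documented there is:

```sql
<low_priority_lock_wait> ::=
{
    WAIT_AT_LOW_PRIORITY ( MAX_DURATION = <time> [ MINUTES ],
                           ABORT_AFTER_WAIT = { NONE | SELF | BLOCKERS } )
}
```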
This means a command to rebuild an index online with low priority wait could look like:
ALTER INDEX [FALGFLEXT~0] ON [FALGFLEXT] REBUILD WITH (ONLINE = ON (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 60 MINUTES, ABORT_AFTER_WAIT = SELF)))
In this case the command would wait for up to 60 minutes at low wait priority to catch a situation where no locks are held on the table which could prevent the SCH_M lock request from being granted. If the command cannot get executed within those 60 minutes, it expires and aborts itself (ABORT_AFTER_WAIT = SELF).
This example demonstrates how low priority waits can be used to circumvent some of the problems between a high DML workload on a table and the need to perform DDL operations against it.
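For partition switching, the same clause is attached to the ALTER TABLE command. A sketch with made-up table names, this time using ABORT_AFTER_WAIT = BLOCKERS:

```sql
-- Switch partition 1 of the (hypothetical) source table into an archive table.
-- After 5 minutes of low priority waiting, the sessions blocking the switch
-- get killed so the SCH_M lock can be granted and the switch can proceed.
ALTER TABLE [dbo].[SalesFact]
    SWITCH PARTITION 1 TO [dbo].[SalesFact_Archive]
    WITH (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 5 MINUTES, ABORT_AFTER_WAIT = BLOCKERS));
```

BLOCKERS obviously is the most aggressive choice, since it sacrifices the running queries and modifications in favor of the DDL command; SELF or NONE are the gentler options for a busy system.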
So far the new functionality has not been introduced into the SAP Data Dictionary coding or SAP BW coding. Currently we also have no plans to implement it, since we don’t yet see a real need for using it within NetWeaver or BW. Especially in SAP BW, partition switching so far went without impact; also, creating indexes out of the SAP DDIC so far hardly caused interruptions.
But it certainly is a feature a lot of customers can use for administrative purposes.
At this point we will interrupt our series on new features of SQL Server 2014 and continue with another blog that, content-wise, fully fits into what is described here. So in the next blog you will see a scenario where a SCH_M lock request can cause a problem, and a way around it.