Blog Post

Core Infrastructure and Security Blog
4 MIN READ

APS Blocked Partition Switch

SQLPFE's avatar
SQLPFE
Copper Contributor
Mar 15, 2019

First published on MSDN on Jan 30, 2017

In SQL Server, when you perform a partition switch, a schema lock is acquired briefly to do the operation. This can get blocked by read operations that have a schema-s lock on the table. APS works a little differently as a lot of the locking is controlled within the PDW code before the request even gets sent to the SQL Server. The same result happens, though, and you cannot always use the same techniques as you would in SQL Server to mitigate it (blocking detection, managed lock priority, etc). You need to look for the process to be in a queued status within PDW. You can see this easily in the sys.dm_pdw_lock_waits DMV.

To show demonstrate, I created a partitioned table from the FactInternetSales table in AdventureworksPDW2012 as well as an empty partition aligned table to swap partitions:

CREATE TABLE  FactInternetSale_PartitionSwapTest
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH(ProductKey),
    PARTITION
    (
        OrderDateKey RANGE RIGHT FOR VALUES
        (
        20000101,20010101,20020101,20030101,20040101,20050101,20060101,20070101,20080101,20090101,
        20100101,20110101,20120101,20130101,20140101,20150101,20160101,20170101,20180101,20190101,
        20200101,20210101,20220101,20230101,20240101,20250101,20260101,20270101,20280101,20290101
        )
    )
)
AS SELECT *
 FROM FactInternetSales;
 
 
 
 --create shadw table for partition swap
 
 CREATE TABLE [dbo].[FactInternetSale_PartitionSwapTest_AUX] (
    [ProductKey] int NOT NULL, 
    [OrderDateKey] int NOT NULL, 
    [DueDateKey] int NOT NULL, 
    [ShipDateKey] int NOT NULL, 
    [CustomerKey] int NOT NULL, 
    [PromotionKey] int NOT NULL, 
    [CurrencyKey] int NOT NULL, 
    [SalesTerritoryKey] int NOT NULL, 
    [SalesOrderNumber] nvarchar(20) COLLATE Latin1_General_100_CI_AS_KS_WS NOT NULL, 
    [SalesOrderLineNumber] tinyint NOT NULL, 
    [RevisionNumber] tinyint NOT NULL, 
    [OrderQuantity] smallint NOT NULL, 
    [UnitPrice] money NOT NULL, 
    [ExtendedAmount] money NOT NULL, 
    [UnitPriceDiscountPct] float NOT NULL, 
    [DiscountAmount] float NOT NULL, 
    [ProductStandardCost] money NOT NULL, 
    [TotalProductCost] money NOT NULL, 
    [SalesAmount] money NOT NULL, 
    [TaxAmt] money NOT NULL, 
    [Freight] money NOT NULL, 
    [CarrierTrackingNumber] nvarchar(25) COLLATE Latin1_General_100_CI_AS_KS_WS NULL, 
    [CustomerPONumber] nvarchar(25) COLLATE Latin1_General_100_CI_AS_KS_WS NULL
)
WITH (CLUSTERED COLUMNSTORE INDEX, DISTRIBUTION = HASH([ProductKey]),  PARTITION ([OrderDateKey] RANGE RIGHT FOR VALUES (20000101, 20010101, 20020101, 
20030101, 20040101, 20050101, 20060101, 20070101, 20080101, 20090101, 20100101, 20110101, 20120101, 20130101, 20140101, 20150101, 20160101, 20170101,
 20180101, 20190101, 20200101, 20210101, 20220101, 20230101, 20240101, 20250101, 20260101, 20270101, 20280101, 20290101)));

To create the blocking situation, in another session, I ran the following to keep the select active:

BEGIN TRANSACTION 
SELECT *
FROM FactInternetSale_PartitionSwapTest

Then, in another session, run the ALTER TABLE Statement. It will just run indefinitely:

ALTER TABLE FactInternetSale_PartitionSwapTest
	SWITCH PARTITION 26 to FactInternetSale_PartitionSwapTest_AUX
	PARTITION 26

You can see that it is in a queue'd status by querying the sys.dm_pdw_waits DMV:

SELECT * FROM sys.dm_pdw_waits WHERE state='queued'

How can I tell what is causing it to be queued? You can join back to the DMV in order to get the blocker and waiter information. In this query I also filtered on the waiter containing SWITCH within the command to capture only queued sessions that are performing partition switch operations:

SELECT Q.object_name ObjectName
	,Q.session_id AS QueuedSession
	,Q.request_id QueuedQID
	,datediff(ms, Q.request_time, getdate()) AS WaitTimeMS
	,QR.command AS QueuedCommand
	,B.session_id AS BlockerSession
	,B.request_id BlockerQID
	,B.type AS BlockerLockType
	,BR.Total_elapsed_time
	,BR.Start_time
	,BR.End_time
	,BR.command AS BlockerCommand
FROM sys.dm_pdw_lock_waits Q
INNER JOIN sys.dm_pdw_lock_waits B ON Q.object_name = B.object_name
INNER JOIN sys.dm_pdw_exec_requests QR ON Q.request_id = QR.request_id
INNER JOIN sys.dm_pdw_exec_requests BR ON B.request_id = BR.request_id
WHERE Q.STATE = 'Queued'
	AND B.STATE = 'Granted'
	AND Q.Type = 'Exclusive'
	AND QR.command LIKE '%SWITCH%'

Now, if you know that you want to perform some action, you can programmatically do that. In this case, I am going to automatically kill any sessions that have been running over 5 minutes that are blocking my partition swap:

CREATE TABLE #BlockedXLocks
	WITH (
			DISTRIBUTION = ROUND_ROBIN
			,LOCATION = USER_DB
			) AS

SELECT Q.object_name ObjectName
	,Q.session_id AS QueuedSession
	,Q.request_id QueuedQID
	,datediff(ms, Q.request_time, getdate()) AS WaitTimeMS
	,QR.command AS QueuedCommand
	,B.session_id AS BlockerSession
	,B.request_id BlockerQID
	,B.type AS BlockerLockType
	,BR.Total_elapsed_time
	,BR.Start_time
	,BR.End_time
	,BR.command AS BlockerCommand
FROM sys.dm_pdw_lock_waits Q
INNER JOIN sys.dm_pdw_lock_waits B ON Q.object_name = B.object_name
INNER JOIN sys.dm_pdw_exec_requests QR ON Q.request_id = QR.request_id
INNER JOIN sys.dm_pdw_exec_requests BR ON B.request_id = BR.request_id
WHERE Q.STATE = 'Queued'
	AND B.STATE = 'Granted'
	AND Q.Type = 'Exclusive'
	AND QR.command LIKE '%SWITCH%'
	--AND CRITERIA TO KILL - WAIT Time, etc) 
	--and BR.Total_elapsed_time > 300000 -- the select has been running for more than 5 min BUT this does NOT work in a case of a transaction
	AND datediff(mi, BR.Start_Time, getdate()) >= 5

WHILE (
		(
			SELECT count(1)
			FROM #BlockedXLocks
			) > 0
		)
BEGIN
	DECLARE @BlockerSID NVARCHAR(10) = (
			SELECT TOP 1 BlockerSession
			FROM #BlockedXLocks
			)
	DECLARE @sql_code NVARCHAR(1000) = ('kill ''' + @BlockerSID + '''')

	--print @sql_code
	EXEC sp_executeSQL @sql_code

	DELETE
	FROM #BlockedXLocks
	WHERE BlockerSession = @BlockerSID
END

Now you have a way to keep long running reports from blocking your load processes that are trying to switch in new partitions into your fact table. Typically, the threshold would be longer than 5 minutes, or you may want to filter on a certain user account (non service account possibly), etc. There are a ton of options to help you decide what is a session that should be killed and what shouldn't.

Updated Apr 28, 2020
Version 3.0

1 Comment

  • dscaravaggi's avatar
    dscaravaggi
    Copper Contributor

    I'm using plain sql2019 not PDW but the result is equal a Sch-M Lock is raised shortly, even if no DDL is affecting the partition switch, I had to surround try/error logic with:

    WITH (WAIT_AT_LOW_PRIORITY (MAX_DURATION = 5 MINUTES,ABORT_AFTER_WAIT = SELF))

    why not ask for a change in next CU/release ? 

    1. evalualte changes in partitions

    2. raise Sch-M only if DDL changes are required