Performance improvement by orders of magnitude when merging partitions in SQL Server 2008R2
Published Mar 23 2019 11:25 AM 901 Views
Microsoft
First published on MSDN on Feb 03, 2010

I am publishing this Blog on behalf of Hermann Daeubler, our SAP expert


Scenario:


Let’s assume we have a table consisting of five partitions and the one in the middle needs to  be dropped. In SQL Server we need the following steps to accomplish this task


a)      Create a non-partitioned target table with the same structure and indexes as the partitioned table


b)       Use the switch command to move the data from the middle partition to the new target table. Now the original partition is empty but still visible and the old boundary value isn’t removed


c)       Complete the task by altering the partition function by merging the empty partition


Problem:
The partition switch command is a metadata-only operation and is frequently used by customers
to add (assuming staging table has already been populated) or remove partition data almost instantaneously. This is a significant improvement over non-partitioned tables where these operations can take time proportional to the size of the data.

However, there are issues with partition merging where depending on the hardware and the size
of the partitions, the step “c)” can still take much longer than expected especially when merging partitions in the middle of a table.


For details on partitions, please refer to
http://msdn.microsoft.com/en-us/library/dd578580.aspx
http://msdn.microsoft.com/en-us/library/ms190787.aspx



Why partition merge is slow?


It is a common scenario to put partitions on different file groups. The decision was made to make the
partition function merge a deterministic function. This means that there is a fixed rule which defines
on which file group the merged partition will end up. As a result the merge will be always in the same
direction corresponding to the type (right or left). This approach has the downside that there is no
optimization related to the size of the partitions. In a scenario where the source partition has 10 million rows and the destination partition has only 1 million rows, SQL Server would therefore move
the 10 million rows of the source partition rather than moving the 1 million rows from the destination partition. The same would happen if the destination partition is completely empty as in
the sample above where one would like to drop a partition.



There is a workaround for this when the partitions to be merged are mapped to the same filegroup. In the example above where the intention is to drop a partition it is clear that the destination partition is always empty.  The workaround is to


·         Switch both partitions out – the source as well as the destination partition


·         Merge the partition function


·         Switch back in the source partition.


But this workaround has a side-effect. It temporarily removes data from the table and an application using this approach has to provide a proper cleanup or rollback in case something goes wrong


Solution with the changes in SQL Server 2008R2:


CU6 for SQL Server 2008 SP1 includes a fix for this potential issue if all partitions reside on the same
file group. It checks which partition has less rows and moves the data from this one during a partition
function merge. This also covers the special case where one partition is empty.
In case the performance of dropping partitions in the middle of a table is essential for an application
it’s necessary to place all partitions on one single file group to take advantage of the fix in CU6.


Results :


The screenshots below and the SQL Script further down use a simple example to show how the
workaround basically could be implemented. It also shows the difference after installing CU6 for
SQL Server 2008 SP1.



Figure 1 above shows that a test table was created with five partitions and five million rows in each
partition.



Figure 2 above shows that the standard way to get rid of a partition took over 2 minutes on the test system.
There were obviously millions of logical reads involved due to the move of the data during
the partition function merge




Figure 3 above shows that afterwards boundary value 3 is missing as expected and four partitions are left




Figure 4 above shows the result of the workaround. Due to the additional switch more tables are involved.
But overall we see just a few logical reads and the whole process is done within a second
compared to over 2 minutes before



Figure 5 above shows the output of the standard method after installing CU6 for SQL Server 2008 SP1.
There is only one switch necessary and it finishes like the workaround before in a second.



To run the SQL Script below, you need to follow the following steps in this sequence


·         First create the four procedures


·         Run the procedures one after the other as seen at the end


·          Create the test table as it’s necessary to specify how many rows should be generated.   It’s a
very basic algorithm which simply doubles the number of records 10 times.  A parameter of
value 25000 results in 25 million rows – five million rows per partition.



-- Procedure to create a partitioned test table


CREATE PROCEDURE create_partition_function_merge_test_tables ( @nr_rows INT )
AS
BEGIN


DECLARE @counter int


SET NOCOUNT ON


IF EXISTS( select * from sys.objects
WHERE name ='partition_function_merge_test_table' and type = 'U' )
DROP TABLE partition_function_merge_test_table


IF EXISTS( select * from sys.partition_schemes
WHERE name = 'PARTITION_FUNCTION_MERGE_TEST_SCHEME' )
DROP PARTITION SCHEME [PARTITION_FUNCTION_MERGE_TEST_SCHEME]


IF EXISTS( select * from sys.partition_functions
WHERE name = 'PARTITION_FUNCTION_MERGE_TEST_FUNCTION' )
DROP PARTITION FUNCTION [PARTITION_FUNCTION_MERGE_TEST_FUNCTION]



CREATE PARTITION FUNCTION [PARTITION_FUNCTION_MERGE_TEST_FUNCTION] ( int )
AS
RANGE RIGHT FOR VALUES ( 1,2,3,4,5 )


CREATE PARTITION SCHEME [PARTITION_FUNCTION_MERGE_TEST_SCHEME]
AS PARTITION [PARTITION_FUNCTION_MERGE_TEST_FUNCTION]
ALL TO ( [PRIMARY] )



CREATE TABLE partition_function_merge_test_table
(
key_partkey  int NOT NULL,
key_col1     int,
key_col2     int,
data_col1    char(15),
data_col2    char(15),

data_col3    char(15)
) ON [PARTITION_FUNCTION_MERGE_TEST_SCHEME] ( key_partkey )


CREATE CLUSTERED INDEX PFM_IDX1 on partition_function_merge_test_table ( key_partkey )


SET @counter = @nr_rows


WHILE ( @counter > 0 )
BEGIN


INSERT INTO partition_function_merge_test_table VALUES
( @counter%5,@counter,@counter,
'Testdata Col1','Testdata Col2','Testdata Col3' )


SET @counter = @counter – 1
END



SET @counter = 10


WHILE ( @counter > 0 )


BEGIN


INSERT INTO partition_function_merge_test_table
SELECT * from partition_function_merge_test_table


SET @counter = @counter – 1
END



CREATE INDEX PFM_IDX2 on partition_function_merge_test_table ( key_col1 )
CREATE INDEX PFM_IDX3 on partition_function_merge_test_table ( key_col2 )
CHECKPOINT


END
GO


-- Procedure to drop the partition in the middle via switch and merge command


CREATE PROCEDURE drop_partition_3_slow
AS
BEGIN


IF ( OBJECT_ID('temp_partition') IS NOT NULL )
BEGIN
DROP TABLE temp_partition;
END



CREATE TABLE temp_partition
(   key_partkey  int NOT NULL,
key_col1     int,
key_col2     int,
data_col1    char(15),
data_col2    char(15),
data_col3    char(15)
)



CREATE CLUSTERED INDEX PFM_IDX1 on temp_partition ( key_partkey )
CREATE INDEX PFM_IDX2 on temp_partition ( key_col1 )
CREATE INDEX PFM_IDX3 on temp_partition ( key_col2 )


SET STATISTICS IO ON


ALTER TABLE partition_function_merge_test_table switch PARTITION 3 TO temp_partition
ALTER PARTITION FUNCTION PARTITION_FUNCTION_MERGE_TEST_FUNCTION ()
MERGE RANGE (3)


SET STATISTICS IO OFF


END
GO



-- Procedure to drop the partition in the middle via switch and merge command.
-- This time it will also switch out the "neighbor" before the merge and switch
-- it back in after the merge. This will avoid an unnecessary copy of the rows
-- in the neighbor partition


CREATE PROCEDURE drop_partition_3_fast
AS
BEGIN


IF ( OBJECT_ID('temp_partition') IS NOT NULL )
BEGIN
DROP TABLE temp_partition;
END


IF ( OBJECT_ID('temp_partition2') IS NOT NULL )
BEGIN
DROP TABLE temp_partition2;
END



CREATE TABLE temp_partition
(   key_partkey  int NOT NULL,
key_col1     int,
key_col2     int,
data_col1    char(15),
data_col2    char(15),
data_col3    char(15)
)


CREATE CLUSTERED INDEX PFM_IDX1 on temp_partition ( key_partkey )
CREATE INDEX PFM_IDX2 on temp_partition ( key_col1 )
CREATE INDEX PFM_IDX3 on temp_partition ( key_col2 )



CREATE TABLE temp_partition2
(   key_partkey  int NOT NULL,
key_col1     int,
key_col2     int,
data_col1    char(15),
data_col2    char(15),
data_col3    char(15)
)


CREATE CLUSTERED INDEX PFM_IDX1 on temp_partition2 ( key_partkey )
CREATE INDEX PFM_IDX2 on temp_partition2 ( key_col1 )
CREATE INDEX PFM_IDX3 on temp_partition2 ( key_col2 )


SET STATISTICS IO ON


ALTER TABLE partition_function_merge_test_table switch PARTITION 3 TO temp_partition
ALTER TABLE partition_function_merge_test_table switch PARTITION 4 TO temp_partition2


ALTER PARTITION FUNCTION PARTITION_FUNCTION_MERGE_TEST_FUNCTION ()
MERGE RANGE (3)


ALTER TABLE temp_partition2 ADD CONSTRAINT partCheck_constr CHECK ( key_partkey = 3 )
ALTER TABLE temp_partition2 switch TO partition_function_merge_test_table PARTITION 3


SET STATISTICS IO OFF


END
GO


-- Procedure to list the partitions of the test table


CREATE PROCEDURE list_partitions
AS
BEGIN



DECLARE @nr_partitions int
DECLARE @counter int
DECLARE @partrows int
DECLARE @boundary_value char(10)


SET @nr_partitions = ( SELECT COUNT(*) FROM sys.partition_range_values prv,
sys.partition_functions pf  WHERE pf.function_id = prv.function_id  AND
pf.name = 'PARTITION_FUNCTION_MERGE_TEST_FUNCTION' )


PRINT 'Partitions for table partition_function_merge_test_table :'


PRINT ''


PRINT 'nr | boundary value | # rows in partition'


SET @counter = 1


WHILE ( @counter <= @nr_partitions )
BEGIN


SET @boundary_value = CONVERT(CHAR(10),


(SELECT value FROM sys.partition_range_values prv, sys.partition_functions pf


WHERE pf.function_id = prv.function_id AND pf.name = 'PARTITION_FUNCTION_MERGE_TEST_FUNCTION' AND


boundary_id = @counter ))



SET @partrows=(SELECT p.rows FROM sys.partitions p, sys.objects o
WHERE p.object_id = o.object_id AND o.name = 'partition_function_merge_test_table' AND
p.partition_number = @counter AND index_id IN (0,1) )


PRINT ' ' + rtrim(convert(char(5),@counter)) + ' |' +
'              ' + rtrim(@boundary_value) + ' |' +
'           ' + convert(char(20), @partrows)


SET @counter = @counter + 1


END
END
GO



-- Repro to show the performance impact of different ways to drop a partition.
-- The procedure to drop the partition will turn "statistics io" on to prove
-- by the number of logical reads on the test table that in the first case
-- data will be moved whereas in the second case it won't.
-- This will change by installing CU6 for SQL Server 2008 SP1 which fixes this
-- behavior. While the runtime of the workaround was always the same the standard
-- approach to drop a partition in the middle of a table improved from over 2 minutes
-- to 1 second after installing CU6 for SQL Server 2008 SP1 – as shown on the
-- screenshots above. The difference is of course dependent on the hardware
-- configuration and might be even more or also less on other systems.



execute create_partition_function_merge_test_tables 25000
execute list_partitions
execute drop_partition_3_slow
execute list_partitions


execute create_partition_function_merge_test_tables 25000
execute list_partitions
execute drop_partition_3_fast
execute list_partitions


Version history
Last update:
‎Mar 23 2019 11:25 AM
Updated by: