This article describes the index rebuild process as well as index maintenance for clustered Columnstore indexes and is directed towards SQL 2014. In SQL 2016, there have been several index maintenance enhancements that will be covered in a separate post.
Overview of Columnstore index build or rebuild process
Building (or rebuilding) a Columnstore index can be a resource intensive operation. Index creation time can be 1.5 times longer than a regular b-tree and resource consumption as well as duration depends on a few factors other than physical resources that include
Plan for Index Rebuild of a Non-Partitioned Table with 6 billion rows:
Plan for Index Rebuild of a partitioned table with 6 billion rows spread across 22 partitions
The Columnstore index is built in 2 phases indicated by the plans above:
From the first segment build, we know how much memory was needed. Based on this knowledge we activate only N number of threads whose total memory grant will fit into the memory that was granted earlier. This number of threads activated is represented in the plan as “Effective Degree of Parallelism” and can be checked against the actual degree of parallelism as seen in the diagram.
The Memory Grant information is also available from the Query Plan.
During the life of the index build, additional memory can be granted within resource governor limits and low memory conditions are checked as well. If a Low memory condition is detected, the segment will be trimmed before it reaches the 1 million row mark.
A segment can be trimmed or closed before the 1 million mark from an index build perspective if
Columnstore Index Build Extended Events
There are a couple Extended Events that can help in diagnosing index build and segment quality related issues.
a. Column_store_index_build_throttle - indicates if the DOP has been throttled.
b. Column_store_index_build_low_memory – tells us if a segment is trimmed due to low memory condition
c. Column_store_index_build_process_segment – gives us the segment that was trimmed and the trim_reason. This list isn’t exhaustive, 1 = Low Memory, 2 = Dictionary full
Columnstore Index build test cases:
The following table depicts Index build results on a non-partitioned table with 6 billion rows. While observing the numbers we can see that the higher the DOP the higher the CPU. Also the higher the DOP, given more threads are spawned, the memory grant goes up.
Note: There wasn’t a noticeable difference when the same tests were performed on a partitioned table. The benefits of having Columnstore index on a partitioned table include being able to rebuild an index at the partition level, partition elimination in conjunction with segment elimination and ability to archive data efficiently.
This is a 6 billion row table that was tested, SQL Server has 60 cores.
4/10/15 8:02 AM
4/10/15 8:27 AM
4/10/15 8:28 AM
4/10/15 9:07 AM
4/10/15 9:08 AM
4/10/15 10:08 AM
4/10/15 10:09 AM
4/10/15 11:53 AM
If you look at the perfmon charts below and the timings, Rebuild index for a Columnstore index does not scale linearly. This is because the Global dictionary is built in serial as discussed, and is a constant irrespective of MAXDOP for the index build. As you see in the perfmon images below, the dictionary build time takes a larger percentage of the overall time the higher the degree of parallelism.
MAXDOP 32: Building the Global dictionary takes around 20% of the total index build time with MAXDOP 32.
MAXDOP 64: Building the Global dictionary takes around 29% of the total index build time when maxdop is 64.
Columnstore Index Maintenance
There are 2 operations from a Columnstore Index maintenance perspective
INDEX REORGANIZE : This manually moves closed ROWGROUPS into compressed columnar storage and this is done online. You do not have to do this, the tuple mover will ultimately move a closed rowgroup into columnar storage but the tuple mover is single threaded so issuing an index reorganize is a way to manually invoke compression on a closed row group.
INDEX REBUILD: This can be done at a partition Level for a partitioned table, and is an offline index build and at this point there isn’t an ONLINE equivalent. The Rebuild reads and recompresses all the data in the specified partition or the entire index. In an updatable Columnstore, deletes of data that reside in a compressed rowgroup are logical deletes. You can encounter situations where majority of the data in a rowgroup could be deleted. In such cases in order to reclaim that space, you have to REBUILD the index for the partition in question. Another reason one would want to rebuild an index on the partition is to improve rowgroup quality if you have a large number of rowgroups each having a small number of rows. In this case rebuilding an index can consolidate smaller row groups into larger ones which could help both from the compression perspective as well as from the query performance perspective.
Here is a sample script which can help identify partitions that are good candidates to be rebuilt based on some thresholds defined in the script.
Summarizing some of the best practices:
Program Manager SQL/Azure CAT
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.