The type of parallel index build plan in SQL server depends on whether or not we have a histogram available with necessary statistics. Therefore, there are two broad categories of parallel index plans:
Histogram available (parallel sort and build):
| \ \
Builder… Build… Build… (write data to the in-build index)
| | |
Sort… Sort… Sort … (order by index key)
| / /
Scan (read data from source)
This type of Parallel Index build is getting chosen when we have statistics available (hence range partition information is available and can be used to identify data distribution).
How does scan happen in this case? We must have some statistics on the leading key column, so if we don’t have stats we will go ahead and generate sample statistics to determine whether and how to parallelize the index build operation. In some situations, however, we are not able to build sample stats, such as indexed view ("No stats plan"), and then different index build plan will be generated. Using the statistics and histogram we can identify data distribution (divide data in several buckets), so we can load balance the workload among workers in parallel plan, it also help us to make DOP (degree of parallelism) decision to achieve high utilization of system resource. Using the row count estimates from the histogram for each bucket in the distribution the workload is split into N ranges (N = DOP), one for each worker (this is an attempt to load balance the work among all workers).
Using range partition scan to scan data, each worker receives data belonging to its range and builds its own sort table and b-tree based on sort table, so each worker will have its own sort table and all the b-trees are disjoint. The coordinator thread will then stitch all the b-tree’s together at the end of index build operation and we build full statistics on the new b-tree and are done.
Parallel Index Build with histogram available can give us the best performance.
On the downside it is more memory consuming and will fail if there is not enough memory (because each worker creates #DOP sort tables). We can play with MAXDOP option to reduced to max number of DOP used in the index build and as a result – min memory required for the build. You can run sp_configure to figure out what is the default setting for ‘max degree of parallelism’ on the server. Max degree of parallelism = 0, means ‘uses the actual number of available CPUs depending on the current system workload’. You can explicitly limit the number of processors to use in parallel plan execution.
Create index idx_t on t(c1, c2)
WITH (MAXDOP = 2)
-- limit # of processor to use for index build to 2
Non stats plan (no histogram) index build plan