While the table is partitioned, we may want to change the way it is partitioned when building the new index. For example, by using the same partition function and scheme, the new index can be partitioned on different columns than the original table.
Create table t (c1 int, c2 int) on ps(c2)
Create clustered Index idx_t on t(c1) on ps(c1)
The serial plan looks like follows.
NL (Nested Loop)
CTS is the Constant Table Scan. It scans each partition one by one and provides partition ID to the inner side (the lower part in graphics showplan) of NL. The inner side of NL scans the corresponding partition and sends data to the sort iterator. From there, it is exactly the same as source non-partitioned scenario. Not surprisingly, the memory and disk space requirement is the same too.
In the case of parallel plan, we have
X (Distribute Streams)
X (Repartition Streams)
The operator above CTS is a Gather Streams operator, meaning it has one producer and many consumers. There is no parallelism below this operator. Between the Gather Streams and the Repartition Streams, each worker is assigned (Number of Source's Partitions)/(Degree of Parallelism) number of source partitions. The source is only scanned once in total.
The Repartition Streams operator splits the query plan into two parallelism sections. Between the top-level Distribute Streams and the Repartition Streams, we have a different set of workers than the worker set below Repartition Streams. Each worker is assigned (Number of Target Index Partitions)/(Degree of Parallelism) target partitions. The target index partition information is pushed down to the Repartition Streams which redirects data to different sort tables based on target partition location. The rest, i.e. how the sort and index building works, is the same as the parallel plan in the source non-partitioned case. Again, the memory and disk space requirement is the same as in the source non-partitioned case.