A compressed rowgroup is considered as fragmented when any of the following two conditions is met
Less than 1 million rows but the
( please refer to
) is other than DICTIONARY_SIZE. If the number of rows in the compressed rowgroup is lower that 1 million because it has reached the maximum dictionary size, then nothing can be done to increase the number of rows.
It has nonzero deleted rows that exceeds a minimum threshold.
When REORGANNIZE command runs, it applies the following policy for each of the eligible compressed rowgroups
Combine one or more compressed rowgroups such that total number of rows <= 1,024,576. For example, if you bulk import 5 batches of size 102400, you will get 5 compressed rowgroups. Now if you run REORGANIZE command, these rowgroups will get merged into 1 compressed rowgroup of size 512000 rows assuming there were no dictionary size or memory limitation.
When > 102400 rows are marked deleted, a compressed RG is eligible for self-merge. For example, if a compressed row group of 1,048,576 million rows has 110k rows deleted, we can remove the deleted rows and recompress the rowgroup with the remaining rows. It saves on the storage by removing deleted rows
If there is a choice between self-merge or merging two or more rowgroups, the merging of two or more rowgroups is favored. For example, if there are two compressed rowgroups RG1 with 500k rows and RG2 with 1,048,576 rows but 60% of the rows are deleted. In this case, instead of self-merging RG2, the merge policy will combine RG1 and RG2 into one compressed rowgroup
Here are some examples
The compressed rowgroup has 10% of 1048576 rows marked as deleted, it qualifies for self-merge to remove the deleted rows. If only say 50k rows were marked deleted, it won't be eligible for self-merge.
If a compressed rowgroup has < 90% of the 1048576 rows after accounting for deleted rows, it qualifies for the merge. Here are some examples
No (don’t qualify for merge)
No (we can’t exceed 1 million in the merge size)
1 million (200k deleted)
500k (300k deleted)
Yes, 1 million
If you have three rowgroups that qualify for merge, they are considered for merge in sequential order. For example, rowwgroup1 (100k), rowgroup2 (850k) and rowgroup3 (350k) - we will merge the first two qualifying ones.
This merge policy is just a heuristic and we will evolve it as our experience with customer workload grows.