When we are analyzing execution plans, we may come across different types of Spool operators - Table Spools, Row Count Spools, Window Spools or Index Spools – that the Query Optimizer chooses for specific purposes. In this post we are going to briefly look into the Index Spool, how it can sometimes lead to suboptimal query performance, and what can be done to easily fix it.
In loose terms, an Index Spool is a narrower indexed copy of a table, that is internally created on the tempdb to help handling repetitive operations. The interesting pattern regarding Index Spools, which we will further analyze in this post, is that they are often a sign that we lack an index to properly support our query. To overcome this, an Index Spool might be internally used, and, despite the possibility of this not representing a performance issue on certain workloads, it is important to be aware of some of its characteristics to have a better understanding of why it can affect performance:
The operation requires a serial zone in the execution plan. Despite the reading operations from the Index Spool can be parallelized, its creation will be single-threaded.
Index Spools are (re)created in each execution. This might result in high resource consumption and tempdb pressure.
Index Spools will not trigger a missing index hint.
This doesn’t necessarily mean that Index Spools are intrinsically bad or that we will always need to create a new index when we see one. But it’s important to understand why it can sometimes be a bottleneck in the execution plan and what can be done to improve performance when we identify it as such.
As an example of such scenario, let’s consider the following query and the resultant execution plan below:
FROM Sales.SalesOrderDetail sod
SELECT MAX(OrderTrackingId) LastTrack
From Sales.OrderTracking t
WHERE sod.SalesOrderID = t.SalesOrderID
AND sod.ModifiedDate < t.EventDateTime
WHERE ModifiedDate > '2013-01-01 00:00:00.0000000'
On the execution plan it’s possible to identify that the Index Scan plus the Index Spool represent almost all the query execution time. Below the execution plan itself, we can see three grids with the query execution stats and relevant information specific to the Index Spool:
On the first table on the left, we have the query IO and time statistics, that will be relevant for further comparison:
Elapsed time (ms)
2 470 348
On the middle, we have information about the Rebinds and Rewinds, from the Index Spool properties. Rewinds represent the number of times where data was reused, as opposed to Rebinds where new data needed to be fetched. This ratio of 4 to 83 532 shows that the Spool was actually quite efficient in minimizing the need for subsequent calls, nevertheless, and as previously stated, it is noticeable that this was the most expensive operator in the Execution Plan.
The last grid on the right shows the data distribution across worker threads during the Index Spool creation, which confirms that it will be built using a single thread.
Fixing the Spool
As stated earlier, one interesting pattern regarding Index Spools is that they often cover a missing index to efficiently support the underlying query. If we look further at the Index Spool properties of the same execution plan shown above, we can easily identify the relevant columns for the index we need to create (note that some properties were omitted on the image below):
We can notice a Seek Predicate on the Keys SalesOrderID and EventDateTime, and also an Output List that includes OrderTrackingID. Mapping them to the index we need to create is very straightforward:
The Seek Keys represent index key columns.
The columns in the Output List are the non-key columns that we may consider to add using INCLUDE.
In this specific case, we don’t need to explicitly include the OrderTrackingID column since it is the Clustered Index of the Sales.OrderTracking table and it will be implicitly included.
With that we can create a new index, as it follows:
CREATE INDEX IX_SalesOrderID_EventDateTime on sales.OrderTracking (SalesOrderID,EventDateTime)
After executing the same query again, we will be able to confirm that the Index Spool is gone:
Below the execution plan, the grid on the right show us that this time the data was parallelized across the available threads.
And the IO and time statistics evidence a significantly better performance. Logical reads improved 89.11%, CPU and elapsed time improved 80.09% and 92.69% respectively: