First published on MSDN on Oct 19, 2018
Update (Nov 2022): An official t-shooting article with a bit more detail has now been created based on this blog post - Troubleshoot slow queries due to query optimizer timeout - SQL Server | Microsoft Learn
What Is Optimizer Timeout?
SQL Server uses a cost-based query optimizer. Therefore, it selects a query plan with the lowest cost after it has built and examined multiple query plans. One of the objectives of the SQL Server query optimizer (QO) is to spend a "reasonable time" in query optimization as compared to query execution. Therefore, QO has a built-in threshold of tasks to consider before it stops the optimization process. If this threshold is reached before QO has considered most, if not all, possible plans then it has reached the Optimizer TimeOut limit. An event is reported in the query plan as Time Out under "Reason For Early Termination of Statement Optimization." It's important to understand that this threshold isn't based on clock time but on number of possibilities considered. In current SQL QO versions, over a half million possibilities are considered before a time out is reached.
Optimizer timeout is designed in Microsoft SQL Server and in many cases encountering it is not a factor affecting query performance. However, in some cases the SQL query plan choice may be affected by optimizer timeout and thus performance could be impacted. When you encounter such issues, if you understand optimizer timeout mechanism and how complex queries can be affected in SQL Server, it can help you to better troubleshoot and improve your performance issue.
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.518" Build="13.0.5201.2" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="6" StatementEstRows="419.335" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="TimeOut" ......>
...
<Statements>
<Batch>
<BatchSequence>
The following is an example of a graphical representation of a plan, which displays a "TimeOut" value:
How does it work, then?
There's no simple formula to determine what conditions would cause the optimizer threshold to be reached or exceeded. However, the following are some factors that determine how many plans are explored by QO in the process of looking for a "best plan":
To illustrate, take an example of a join between 3 tables (T1, T2 and T3) and each table has a clustered index only. There are two joins involved here and because there are 3 physical join possibilities (NL, HM, MJ), then each of the two joins can be performed in 6 (2 * 3) ways. Also consider the join order:
Now multiply 6 ways * 3 join orders and we have a minimum of 18 possible plans to choose from. If you include the possibility of parallelism and other factors like Seek or Scan of the HoBT, then the possible plans increase even more. If you are a math wizard you can figure out that when a query involves for example 10 tables, the possible permutations there are in the millions. Therefore, you can see that a query with lots of joins is more likely to reach the optimizer timeout threshold than one with fewer joins.
Note The query predicates (filters in the WHERE clause) and existence of constraints will reduce the number of access methods considered and thus the possibilities considered.
The result of reaching the optimizer timeout threshold is that SQL Server has not considered the entire set of possibilities for optimization and it may have missed a set of plans that could produce shorter execution times. QO will stop at the threshold and consider the least-costly query plan at that point, although there may be better unexplored options. This may result in a query execution that's suboptimal.
But I see an Optimizer Timeout with a simpler query?
Nothing with QO is simple (black and white). There are so many possible scenarios, its complexity so high that it is hard to grasp all of the possibilities. The Query Optimizer may dynamically adjust/set timeout threshold based on the cost of the plan found at a certain stage. For example, if a plan that appears relatively "cheap" is found, then the task limit to search for a better plan may be reduced. Therefore, grossly underestimated cardinality estimation may be one example for hitting an optimizer timeout early. In this case, the focus of investigation is cardinality estimation. This is a rarer case than the scenario that's discussed previously about running a complex query, but it is possible.
So, what do I do?
You may have to do nothing. In many cases the plan you get is quite reasonable and the query you are running is performing well. But in the cases where you find the need to tune and optimize, consider the following options:
First, do this:
The following are different alternatives that you can explore to help improve the performance of the query. Again, be aware that the fact that an optimizer timeout is present in a query plan, does not necessarily mean that it's the reason for query slowness.
SELECT ...
FROM t1
JOIN t2
ON t1.id = t2.id
JOIN t3
ON t3.id = t2.id
JOIN t4
ON t4.id = t3.id
To optimize, try to break down into two queries:
SELECT...
INTO #temp1
FROM t1
JOIN t2
ON t1.id = t2.id
SELECT ...
FROM t3
JOIN #temp1
ON t3.id = #temp1.id
JOIN t4
ON t4.id = t3.id
Important: Using Multiple Common Table Expressions (CTEs) is not an appropriate solution to simplify a query. Multiple CTEs will only increase the complexity of the query. Therefore, it’s counterproductive. CTEs appear to break a query logically, but they are combined into a single query and optimized as a single large join of tables in the end.
Enjoy!
Joseph Pilov
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.