First published on MSDN on Sep 03, 2013
Something that is a relatively common performance eye opener is when you have a large ad hoc workload, and you’re getting a huge gap in the lack of plan reuse. You talk to the application team that is responsible for this possibly dreaded scenario and relay your concerns about the high CPU caused by the ad hoc workload compilation rate, and with the plan cache bloat you would really like to see a better display of client-side parameterization. We’ve all been there before, and we are all too familiar with the response that the application code will not be changing. Work your magic on the database.
A commonly accepted action to resolve this plan reuse problem is to set a database to force parameterization . What this ultimately does is tell SQL Server to parameterize all DML queries (with a list of limitations that can be found in the above link). Take for example, you have a complex ad hoc query that uses up to 100 distinct parameter values. With simple parameterization, that could lead to 100 different cached plans. But by forcing parameterization that could be narrowed down to a single prepared plan that is parameterized. You think, awesome, problem solved! Maybe. Maybe not.
By forcing parameterization and reusing complex queries’ plans you could potentially be hurting performance. In order to illustrate this pitfall, I have provided an example below. I have grown my AdventureWorks2012 database using a script provided by my good friend and SQL professional Adam Machanic (Thinking Big) . Take the following query, something that may be used for reporting:
USE AdventureWorks2012; GO SELECT p.ProductID ,p.Name AS Product ,th.ActualCost ,th.Quantity ,pm.Name AS ProductModel FROM dbo.bigTransactionHistory th INNER JOIN dbo.bigProduct p ON th.ProductID = p.ProductID LEFT JOIN Production.ProductModel pm ON p.ProductModelID = pm.ProductModelID WHERE p.ProductID < < ProductID VALUE HERE >;
As you can see in the above query, a user could filter by a range of ProductIDs. Take a scenario where plan reuse (or the lack thereof) is causing plan cache bloat and a high rate of compilations. So you set parameterization to be forced for the AdventureWorks2012 database:
ALTER DATABASE AdventureWorks2012 SET PARAMETERIZATION FORCED; GO
Now say the first query for the cold cache is the following, specifying the ProductID to be a value of 1002:
SELECT p.ProductID ,p.Name AS Product ,th.ActualCost ,th.Quantity ,pm.Name AS ProductModel FROM dbo.bigTransactionHistory th INNER JOIN dbo.bigProduct p ON th.ProductID = p.ProductID LEFT JOIN Production.ProductModel pm ON p.ProductModelID = pm.ProductModelID WHERE p.ProductID < 1002;
We get the following post execution plan:
All looks well. But the problem comes into play now when we execute with a different, larger value (50532):
SELECT p.ProductID ,p.Name AS Product ,th.ActualCost ,th.Quantity ,pm.Name AS ProductModel FROM dbo.bigTransactionHistory th INNER JOIN dbo.bigProduct p ON th.ProductID = p.ProductID LEFT JOIN Production.ProductModel pm ON p.ProductModelID = pm.ProductModelID WHERE p.ProductID < 50532;
We can see by looking at the plan cache that the desired plan reuse has been obtained, by using the below query:
SELECT cp.objtype ,cp.cacheobjtype ,cp.usecounts ,st.TEXT ,qp.query_plan FROM sys.dm_exec_cached_plans cp OUTER APPLY sys.dm_exec_sql_text(cp.plan_handle) st OUTER APPLY sys.dm_exec_query_plan(cp.plan_handle) qp WHERE st.TEXT LIKE '%bigTransactionHistory%' AND st.TEXT NOT LIKE '%dm_exec_cached_plans%';
You will notice, as a side note, that there are still two Adhoc plans for this query. These are simply shell plans that point to the prepared plan that has now been parameterized. You can tell these are shell plans by looking at the query plan’s XML, and you’ll only see a few elements there, and only the StmtSimple element containing data. The data that’s here includes a ParameterizePlanHandle and ParameterizedText attributes that point to the prepared plan including full query plan details. Think of these shell plans as pointers to the real plan.
Going back to the current post execution plan for parameter value 50532, you will be sorely disappointed:
As you can see above, because the initial plan was reused, the estimated number of rows was a little over 2000, but the actual number of rows was over 30 million! Taking a look at the XML of this post execution plan, we can see the following difference between the compiled value and the runtime value, explaining why this plan is poorly executing for this run: ( for further information regarding parameter sniffing, please see Lisa Gardner’s blog post on Back to Basics: SQL Parameter Sniffing due to Data Skews )
<ParameterList> <ColumnReference Column="@0" ParameterCompiledValue="(1002)" ParameterRuntimeValue="(50532)" /> </ParameterList>
There’s a chance this is not the optimal plan when users specify the upper bounds of ProductIDs. For argument’s sake, let’s switch back to simple parameterization and execute the two queries again (ProductID equal to 1002 and 50532):
Looking at these plans side by side, we see that two relatively different plans were chosen for the different ProductID values (1002 is the top plan, 50532 the bottom). And glancing at the plan cache we can further see that instead of preparing a single plan, the two full Adhoc plans were compiled and cached:
Looking at the query stats for the same query (passing parameter value 50532, utilizing forced parameterization for the first test and simple parameterization for the second test) during the different runs, we can see numerical proof that with simple parameterization (and in turn an ad hoc plan) gives us better execution statistics at the cost of no plan reuse:
When you are dealing with these types of scenarios where plan cache bloat and large amounts of compilations due to ad hoc workloads are becoming issues, before you flip a switch like forced parameterization you need to do extensive testing and analysis. In the end, fixing certain problems can lead to more disastrous ones (in this case, poor plan usage and degraded execution time). Like many other things we do in our profession, this is one big “it depends”. I hope that this blog post has given some insight to a potential pitfall of forcing parameterization, and with proper planning you will be able to determine the best balance of plan reuse and best performance, even with an ad hoc workload.
Thomas Stringer – SQL Server Premier Field Engineer