Blog Post

SQL Server Blog
8 MIN READ

Query plan regressions got you down? Here's how can Automatic Plan Correction turns it around.

theSQLSith's avatar
theSQLSith
Icon for Microsoft rankMicrosoft
Apr 28, 2026

Automatic Plan Correction (APC) is one of the features that I find myself talking about quite a bit with customers, support engineers, and the broader SQL community. It is part of the Automatic Tuning family and has been quietly doing its job since SQL Server 2017, detecting query plan regressions and automatically forcing a previously known good plan to restore performance. But one of the questions that comes up often is: how does APC actually decide whether a regression has occurred? And how confident is it in that decision?

With SQL Server 2022 CU4 and continuing into SQL Server 2025, we have made significant improvements to the statistical model that APC uses for regression detection. This blog post walks through what changed, why it matters, and how you can take advantage of it.

 

⚙️ Two Ways to Detect a Regression

APC supports two statistical models for determining whether a query plan has regressed. Both analyze execution statistics captured by Query Store (such as CPU time) to compare the performance of a current plan against a previously known good plan.

The original model: sigma-based comparison

The original regression detection model uses a sigma-based comparison (commonly associated with the "three-sigma rule"). It computes whether the difference in average CPU time between the current plan and the last known good plan exceeds a threshold of 3 standard deviations.

This approach works well when execution times follow a normal distribution with consistent variance. However, we found some limitations with this approach over time:

  • High variance can mask regressions. If CPU times vary widely for a given plan (which is common in real workloads), the standard deviation becomes large. A regression where the mean CPU time doubles might still fall within the 3-sigma band, causing APC to miss it.
  • It assumes equal variance. The model treats the variance of both the current and baseline plan populations as comparable. When they differ significantly (for example, the regressed plan has highly variable execution times while the good plan was stable), the comparison loses precision.
  • It requires more executions. The model needs at least 15 executions of both the current plan and the baseline plan before it makes a determination.

 

🆕 The newer model: Welch's t-test

The newer regression detection model uses the Welch's t-test, a well-established statistical hypothesis test designed specifically for comparing two populations that may have unequal variances and unequal sample sizes. If you are not familiar with the Welch's t-test, you can think of it as a more robust way to answer the question: "are these two sets of numbers actually different, or is the difference just noise?"

Key improvements over the original model:

  • Better accuracy with high-variance workloads. The Welch's t-test explicitly accounts for different variances between the current plan and the baseline plan. It won't be fooled by a plan that sometimes runs fast and sometimes runs very slow. It can still detect that the distribution has shifted unfavorably.
  • Faster reaction time. APC can begin evaluating a plan after as few as 3 executions of the new plan, compared to 15 with the original model. When the performance difference is statistically significant, APC can detect and correct a regression much sooner.
  • Adaptive sample sizing. If the initial sample of 3 executions produces an "uncertain" result, APC doesn't give up. It progressively increases the number of executions required before rechecking, until enough execution statistics have been gathered to make a confident determination.
  • Multiple checks before final determination. The Welch's t-test model performs more than one check over the lifecycle of a potential regression. It re-evaluates as more data arrives, reducing the chance of both false positives and false negatives.

With the Welch's t-test model, we have significantly improved both the accuracy of regression detection and the reaction time that APC takes with correcting a regression.

 

🧭 Enabling the Welch's t-test model

The Welch's t-test model is the default regression detection model in:

  • Azure SQL Database, SQL Database in Microsoft Fabric, Azure SQL Managed Instance (enabled by default, no action required)
  • SQL Server 2025 (enabled by default, no action required)

For SQL Server 2022, enable it with trace flag 12618 (available starting with CU4):

-- Enable globally (startup trace flag)
DBCC TRACEON (12618, -1);

-- Or add as a startup parameter: -T12618

The original sigma-based model remains as a fallback and is used when trace flag 12618 is not enabled on SQL Server 2022 CU4+. As always, we strongly recommend applying the latest Cumulative Updates for SQL Server.

 

🧪 Handling long-running and timed-out queries

Both regression detection models share a common bias that I want to call out: APC checks are triggered after a certain number of query executions complete. This means APC inherently evaluates the fastest-completing executions first and may miss regressions where the symptom is queries running for minutes or timing out entirely.

Let me paint a picture of what this looks like. Imagine a situation where the overall CPU utilization on a server goes from an average of 10% to 100% unexpectedly and you find out that there had been a plan change for a particular query. The extended events may show that the query had executed a few times quickly, say 15 executions that completed in milliseconds. However, during your analysis you notice that there were a lot of other query executions that were running longer, maybe for a few minutes. Some of those longer running queries completed successfully, but others were timing out. By the time those slow executions finish and their statistics reach Query Store, APC has already made its regression decision based on the fast executions and concluded there was no regression.

 

Trace flag 12656 addresses this by enabling a time-based delayed recheck. When enabled, APC schedules an additional check 5 minutes after a plan change is first detected. This delayed check re-evaluates the query using execution statistics that now include the slower and timed-out executions, providing a more complete picture.

-- Enable globally
DBCC TRACEON (12656, -1);

Like trace flag 12618, trace flag 12656 is also available starting with SQL Server 2022 CU4.

Per-query timed recheck with sp_configure_automatic_tuning

While trace flag 12656 applies this recheck globally, you can also enable it on a per-query basis using the sp_configure_automatic_tuning stored procedure with the FORCE_LAST_GOOD_PLAN_EXTENDED_CHECK option. This is especially useful in Azure SQL Database, SQL Database in Microsoft Fabric, and Azure SQL Managed Instance where the global trace flag is not available. It provides the same 5-minute delayed recheck that trace flag 12656 enables, but scoped to only the queries you choose:

-- Enable timed recheck for a specific query
EXEC sp_configure_automatic_tuning
    @option = 'FORCE_LAST_GOOD_PLAN_EXTENDED_CHECK',
    @option_value = 'ON',
    @type = 'QUERY',
    @type_value = <query_id>;

-- Disable timed recheck for a specific query
EXEC sp_configure_automatic_tuning
    @option = 'FORCE_LAST_GOOD_PLAN_EXTENDED_CHECK',
    @option_value = 'OFF',
    @type = 'QUERY',
    @type_value = <query_id>;

This allows you to target specific queries that are known to have long-running or timeout-prone execution patterns without imposing the overhead of timed rechecks on every query in the database. The following table summarizes the scope differences:

Method

Scope

Where available

Trace flag 12656

All queries on the instance (global)

SQL Server 2022 CU4 and later versions (including SQL Server 2025 and later versions)

sp_configure_automatic_tuning

FORCE_LAST_GOOD_PLAN_EXTENDED_CHECK

Individual queries (per-query)

SQL Server 2022 CU4 and later versions, SQL Server 2025, Azure SQL Database, SQL Database in Microsoft Fabric, and Azure SQL Managed Instance

 

🔍 Monitoring APC decisions

The illustration below can be used as a mental map of all of the phases that the automatic plan correction feature takes which can be monitored.

sys.dm_db_tuning_recommendations

The sys.dm_db_tuning_recommendations DMV can be used to see any actions that APC has taken if it is enabled and will also be populated if the feature is disabled, showing the action that it would have taken if it were enabled. The result set is in JSON format, so a query similar to the one below can be used to parse the output:

SELECT
    JSON_VALUE([state], '$.currentValue') AS [state],
    script = JSON_VALUE(details, '$.implementationDetails.script'),
    planForceDetails.*
FROM sys.dm_db_tuning_recommendations
CROSS APPLY OPENJSON(Details, '$.planForceDetails')
    WITH (
        [query_id] int '$.queryId',
        regressedPlanId int '$.regressedPlanId',
        recommendedPlanId int '$.recommendedPlanId',
        regressedPlanErrorCount int,
        recommendedPlanErrorCount int,
        regressedPlanExecutionCount int,
        regressedPlanCpuTimeAverage float,
        recommendedPlanExecutionCount int,
        recommendedPlanCpuTimeAverage float
    ) AS planForceDetails;

 

⚠️ Note: Data in sys.dm_db_tuning_recommendations is not persisted across Database Engine restarts.

 

📋 Extended events for deeper analysis

It may also be a good idea to capture some of the APC extended events for additional insight and analysis. For general APC observability, the key events are:

Extended event

When it fires

automatic_tuning_plan_regression_detection_check_completed

APC finishes evaluating a plan for regression. Shows whether a regression was detected and corrected.

automatic_tuning_plan_regression_verification_check_completed

APC finishes validating a previously forced plan.

automatic_tuning_check_abandoned

APC abandons a check (for example, only one plan exists).

automatic_tuning_recommendation_expired

APC backs off from a forced plan that isn't helping.

automatic_tuning_diagnostics

Periodic health summary with check counts, detection counts, and correction counts.

For analyzing the Welch's t-test model specifically, there is a dedicated event (not currently available for Azure SQL Database and SQL Database in Microsoft Fabric):

Extended event

When it fires

automatic_tuning_wtest_details

Fires during the Welch's t-test evaluation with detailed statistical information about the test.

 

⚠️ Note: The automatic_tuning_wtest_details event can be chattier than the other events. It is one way to validate the effects of having trace flag 12618 enabled if you are interested in comparing results in a workload between the old and new regression detection models, but consider the overhead before enabling it in production. Additionally, this event is not available in database-scoped sessions in Azure SQL Database for SQL Database in Microsoft Fabric.

Here is a sample script to create a session that captures the key events:

CREATE EVENT SESSION [APC_Welch_Monitor] ON SERVER
ADD EVENT qds.automatic_tuning_plan_regression_detection_check_completed (
    ACTION (sqlserver.database_id, sqlserver.database_name)
),
ADD EVENT qds.automatic_tuning_plan_regression_verification_check_completed (
    ACTION (sqlserver.database_id, sqlserver.database_name)
),
ADD EVENT qds.automatic_tuning_wtest_details (
    ACTION (sqlserver.database_id, sqlserver.database_name)
),
ADD EVENT qds.automatic_tuning_diagnostics (
    ACTION (sqlserver.database_id, sqlserver.database_name)
)
ADD TARGET package0.event_file (
    SET filename = N'APC_Welch_Monitor.xel',
        max_file_size = 50,
        max_rollover_files = 5
)
WITH (
    MAX_MEMORY = 4096 KB,
    EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS,
    MAX_DISPATCH_LATENCY = 5 SECONDS,
    STARTUP_STATE = OFF
);
GO

ALTER EVENT SESSION [APC_Welch_Monitor] ON SERVER STATE = START;

 

📊 Summary

The table below provides a quick comparison between the two regression detection models. If you are on SQL Server 2022, I would encourage you to try trace flag 12618 and see how it improves APC's responsiveness for your workloads. If you are on SQL Server 2025 or Azure SQL Database SQL Database in Microsoft Fabric, or Azure SQL Managed Instance, the Welch's t-test model is already active by default.

Regression detection models

Aspect

Original model (sigma-based)

Welch's t-test model

Statistical method

3-sigma threshold on mean CPU difference

Welch's t-test (unequal variance t-test)

Handles unequal variance

No (assumes similar variance)

Yes (explicitly models different variances)

Min executions to evaluate

15 (current plan)

As few as 3 (current plan), adapts exponentially

Adaptive sample sizing

No

Yes (progressively increases required executions on uncertain results)

Multiple checks

Single check

Multiple re-evaluations as data accumulates

Default in Azure SQL DB, SQL Database in Microsoft Fabric, Azure SQL Managed Instance

No (fallback)

Yes

Enable on SQL Server

Default on SQL Server 2022

Default on SQL Server 2025; trace flag 12618 required for SQL Server 2022 CU4+

 

As we continue to make Automatic Plan Correction more reliable and more robust, I hope you find this blog helpful. We are always looking to hear from the community, so please continue to share your feedback and experiences at https://aka.ms/sqlfeedback.

 

📚 Learn More

Updated Apr 28, 2026
Version 1.0
No CommentsBe the first to comment