Blog Post

Azure Database Support Blog
5 MIN READ

Troubleshooting Change Tracking cleanup growth and orphaned rows in Azure SQL Database

Mohamed_Baioumy_MSFT's avatar
Feb 27, 2026

Applies to: Azure SQL Database
Scenario: Change Tracking (CT) side tables grow unexpectedly, and “orphaned” rows appear after switching between auto-cleanup and custom/scheduled manual cleanup.

The problem (what we observed)

In the case tracked, the discussion focused on Change Tracking cleanup behavior—including unexpected growth in CT side tables and orphaned records. The customer also referenced earlier guidance to move away from auto-cleanup due to locking concerns during upgrades, and the team needed to propose safe next steps quickly. 

A parallel concern was that CT auto-cleanup could block DDL during upgrades (schema lock behavior), which triggered work to deploy a fix and validate it in a lab before broader rollout.

Why this is tricky

The engagement highlighted that manual cleanup and auto-cleanup can behave differently in real-world, high-scale environments (large number of CT-enabled tables, heavy activity, and operational constraints like access and auditing). Investigation efforts included:

  • validating where orphaned rows exist and how many CT side tables are affected,
  • checking whether auto-cleanup is enabled/disabled, and
  • using auditing / Extended Events to identify who/what is dropping related history objects.

Additionally, Snapshot Isolation can prevent cleanup from progressing in some cases. it was noted that long-running snapshot transactions can prevent a safe cleanup point from advancing, which can block removal of expired entries from internal commit tracking tables until those transactions complete.

Practical troubleshooting steps (what helped)

1) Confirm CT configuration (retention + auto-cleanup)

Use the Change Tracking configuration options to validate retention and whether auto-cleanup is enabled. Microsoft Learn documents enabling CT at the database level (including CHANGE_RETENTION and AUTO_CLEANUP). Enable and Disable Change Tracking - SQL Server | Microsoft Learn

2) Quick backlog signal: commit table “oldest commit_time”

During the investigation, the team used a lightweight query to sanity-check backlog in the commit table:

SELECT TOP (1) *
FROM sys.dm_tran_commit_table
ORDER BY commit_time ASC;

if the returned commit_time is close to the retention horizon, auto-cleanup is likely keeping up (this query doesn’t require DAC).

3) Detect orphaned rows in CT side tables (read-only script)

A key artifact from this case is the following T-SQL script, which calculates a cleanup point based on configured retention and then iterates over CT side tables (sys.internal_tables where internal_type = 209) to identify rows whose sys_change_xdes_id no longer has a matching entry in sys.syscommittab at/below the cleanup point. 

-- use <[DBName]> -- switch to the right database


SET NOCOUNT ON


-- find the invalid clean version based on configured retention
DECLARE time DATETIME, @csn BIGINT = 0, @minCleanupPoint BIGINT = 0
DECLARE @retention_period INT, @retention_period_units NVARCHAR(10)
SELECT @retention_period = retention_period,
@retention_period_units = retention_period_units
FROM sys.change_tracking_databases where database_id = DB_ID()
SELECT time = CASE WHEN @retention_period_units = 1 then DATEADD(minute, (-1 * @retention_period), GETUTCDATE())
WHEN @retention_period_units = 2 then DATEADD(hour, (-1 * @retention_period), GETUTCDATE())
ELSE DATEADD(day, (-1 * @retention_period), GETUTCDATE()) END

EXEC sp_changetracking_time_to_csn time = time, @csn = @csn OUTPUT


SELECT @minCleanupPoint = @csn
SELECT @minCleanupPoint as minCsn -- 688118


-- iterate over all the change tracking side tables
DECLARE @sideTable SYSNAME;
DECLARE ct_cursor CURSOR FAST_FORWARD FOR
SELECT name FROM sys.internal_tables WHERE internal_type = 209; -- internal_type = 209 is for change tracking side tables


OPEN ct_cursor;
FETCH NEXT FROM ct_cursor INTO @sideTable;


WHILE @@FETCH_STATUS = 0
BEGIN
-- find the minimum expired xdes id
declare @minXdesId BIGINT
SELECT @minXdesId = min(xdes_id) FROM sys.dm_tran_commit_table where commit_ts <= @minCleanupPoint
-- SELECT @minXdesId as minXdes

-- create temp table for storing orphaned xdes id
DROP TABLE IF EXISTS #OrphanedXdes;
    CREATE TABLE #OrphanedXdes
    (
        sys_change_xdes_id BIGINT NOT NULL
    );


DECLARE @sql NVARCHAR(MAX);
SET @sql = N'
    INSERT INTO #OrphanedXdes(sys_change_xdes_id)
    SELECT ct.sys_change_xdes_id
    FROM sys.' + QUOTENAME(@sideTable) + N' AS ct
    WHERE ct.sys_change_xdes_id < @minXdesId
      AND NOT EXISTS
      (
          SELECT 1
          FROM sys.syscommittab AS s
          WHERE s.xdes_id = ct.sys_change_xdes_id AND s.commit_ts <= @minCleanupPoint
      );';


    EXEC sys.sp_executesql
        @sql,
        N'@minXdesId BIGINT, @minCleanupPoint BIGINT',
        @minXdesId = @minXdesId,
@minCleanupPoint = @minCleanupPoint;


DECLARE @orphanedIdsCount BIGINT;
SET @sql = N'
SELECT @cnt = COUNT_BIG(sys_change_xdes_id)
FROM #OrphanedXdes;
';


EXEC sys.sp_executesql
@sql,
N'@cnt BIGINT OUTPUT',
@cnt = @orphanedIdsCount OUTPUT;


-- Raise error if any orphaned xdes exists
IF (@orphanedIdsCount > 0)
BEGIN
DECLARE @msg NVARCHAR(4000) =
@sideTable + N' : ' + CONVERT(NVARCHAR(30), @orphanedIdsCount);
RAISERROR (@msg, 16, 1) WITH NOWAIT;

DECLARE @newLine NVARCHAR(10) = CHAR(13) + CHAR(10)
PRINT (@newLine)
END


DROP TABLE IF EXISTS #OrphanedXdes;


    FETCH NEXT FROM ct_cursor INTO @sideTable;
END


CLOSE ct_cursor;
DEALLOCATE ct_cursor;


SET NOCOUNT OFF

Here’s the high-level logic (excerpted/annotated from the script):

  • Read retention settings from sys.change_tracking_databases
  • Convert “retention window” to a cleanup CSN using sp_changetracking_time_to_csn
  • For each CT side table (sys.internal_tables internal_type = 209):
    • compare side-table sys_change_xdes_id vs. sys.syscommittab and count “orphaned” xdes ids
    • emit a message when orphaned counts are present 

Tip: This is read-only diagnostic logic. In your environment, validate permissions and impact before running in production.

4) If auto-cleanup is unexpectedly disabled, re-enable and monitor

In the email thread, the team observed auto-cleanup was disabled in at least one environment and recommended re-enabling it, then monitoring the CT history table to confirm cleanup activity resumes. 

5) Use auditing / Extended Events to identify unexpected object drops

When investigating why a “history table” disappeared, the team reviewed extended event data and noted evidence of a specific application context associated with the drop (shared in the meeting discussion). 

This is a key lesson: without auditing, it can be difficult to determine who/what disabled auto-cleanup or dropped relevant objects; the email thread explicitly called this out.

Mitigation options discussed

(often safest): disable & re-enable Change Tracking on affected tables

As an alternative to running manual deletion scripts, the troubleshooting recommended disabling and re-enabling Change Tracking on the set of tables containing orphaned rows—described as a well-established and safer cleanup method that avoids needing elevated access to run cleanup scripts directly. 

Trade-off: disabling CT on a table removes existing change data from the corresponding side tables for that table. 

About auto-cleanup performance improvements (why “stay on auto” may be preferred)

The troubleshooting included discussion that auto-cleanup is the area that continues to receive improvements and that performance enhancements exist (for example, improved adaptive behavior in newer SQL Server versions). Microsoft Learn describes that SQL Server 2025 introduces an “adaptive shallow cleanup approach” for large side tables, enabled by default, and explains how cleanup behavior changes compared to prior versions. 

While Azure SQL Database implementation details differ from boxed SQL Server, the key operational takeaway from the discussion was: if possible, prefer auto-cleanup and avoid manual cleanup unless you have a strong reason, because manual cleanup may lack telemetry and can be harder to reason about at scale.

Key takeaways

  1. Start with read-only validation: use a backlog signal query and an orphan-detection script to quantify scope before making changes. 
  2. Auditability matters: without auditing/trace evidence, identifying who disabled auto-cleanup or dropped related objects is difficult. 
  3. Snapshot Isolation can block cleanup progress: long-running snapshot transactions may prevent safe cleanup from advancing. Keep snapshot transactions short where possible. 
  4. A safe mitigation exists disable/re-enable Change Tracking on affected tables (with awareness of change data loss) can be safer than running deletion scripts.

References

Published Feb 27, 2026
Version 1.0
No CommentsBe the first to comment