Blog Post

Azure Database Support Blog
3 MIN READ

Geo‑Replication Redo Lag in Azure SQL Database

Mohamed_Baioumy_MSFT's avatar
Jan 02, 2026

Monitoring and Troubleshooting Using Public DMVs

Azure SQL Database provides built‑in high availability and geo‑replication capabilities to ensure database resilience and business continuity. While replication is fully managed by the platform, customers may occasionally observe a delay between the primary and geo‑replicated secondary database, especially during periods of heavy write activity.

 

This article provides a public, supported approach to monitoring and understanding geo‑replication delay caused by redo lag, using official Dynamic Management Views (DMVs) and T‑SQL only, without relying on internal tools.

Scenario Overview

Customers may report that changes committed on the primary database are not immediately visible on the geo‑replicated secondary, sometimes for several minutes.

Typical symptoms include:

  • Reporting queries on the geo‑secondary showing stale data
  • Increased redo catch‑up time on the secondary
  • No performance degradation observed on the primary
  • Replication eventually catches up without manual intervention

This behavior is commonly associated with redo lag, where the secondary has already received the log records but requires additional time to replay them into data pages.

Understanding Geo‑Replication Redo Lag

In Azure SQL Database geo‑replication:

  • Transactions are sent from the primary to the secondary
  • Log records are hardened on the secondary
  • The secondary applies these records asynchronously to its data files (redo)

If the secondary experiences temporary pressure (for example, due to schema changes or intensive operations), redo may fall behind, causing the secondary to lag—even though data durability remains intact.

Step 1: Check Geo‑Replication Status and Lag

The primary DMV for monitoring geo‑replication is:

📘 sys.dm_geo_replication_link_status
Public documentation:
https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-geo-replication-link-status-azure-sql-database 

Sample T‑SQL Query

-- Run on primary DB
SELECT   
     link_guid  
   , partner_server  
   , last_replication  
   , replication_lag_sec   
FROM sys.dm_geo_replication_link_status;

Key Columns Explained

ColumnDescription
replication_state_descCurrent replication state
replication_lag_secEstimated lag (in seconds)
last_replicationLast successful replication timestamp (UTC)
partner_serverGeo‑replica logical server

Interpretation

  • Healthy replication:
    replication_lag_sec = 0 and state is healthy
  • Transient delay:
    Lag increases temporarily but later returns to zero
  • Sustained delay:
    Lag remains elevated for an extended period and should be investigated further

Step 2: Monitor Local Replica Redo Health

To understand whether lag is related to redo activity on the secondary, you can query:

📘 sys.dm_database_replica_states
Public documentation:
https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database 

Sample T‑SQL Query

-- Run on primary DB
SELECT
    database_id,
    synchronization_state_desc,
    synchronization_health_desc,
    redo_queue_size,
    redo_rate,
    last_commit_time
from sys.dm_database_replica_states

Key Insights

  • redo_queue_size
    Indicates how much log data is pending replay on the secondary
  • redo_rate
    Shows how quickly redo is being applied
  • last_commit_time
    Helps estimate data freshness on the secondary (UTC)

Interpretation

ScenarioObservation
Normal operationredo_queue_size = 0
Write burstredo_queue_size increases temporarily
Recovery in progressredo_rate remains positive
Healthy statesynchronization_health_desc = HEALTHY

Short‑lived spikes are expected platform behavior and usually resolve automatically.

Practical Monitoring Considerations

Reporting Workloads

If applications read from the geo‑secondary for reporting:

  • Expect near‑real‑time, not guaranteed real‑time visibility
  • Design reports to tolerate small delays
  • Route latency‑sensitive reads to the primary if required

Transaction Patterns

Redo lag is more likely during:

  • Large batch updates
  • Index maintenance operations
  • Schema modification commands
  • Bursty write workloads

Keeping transactions short and efficient reduces replication pressure.

Best Practices

  • Use UTC timestamps consistently when correlating events
  • Monitor replication_lag_sec and redo_queue_size together
  • Implement retry logic in applications for transient conditions
  • Avoid assuming read replicas are always perfectly synchronized
  • Do not take manual actions during short‑lived redo spikes unless the lag persists

Summary

Redo lag in Azure SQL Database geo‑replication is a normal and self‑healing behavior during certain workload patterns. By using supported public DMVs and T‑SQL, customers can:

  • Monitor replication health safely
  • Understand replica freshness
  • Make informed application routing decisions
  • Avoid unnecessary intervention

Azure SQL Database automatically stabilizes replication once redo pressure subsides, ensuring durability and availability without manual management.

References

 

Updated Jan 15, 2026
Version 2.0

3 Comments

  • m60freeman's avatar
    m60freeman
    Brass Contributor

    Both example queries contain invalid column names. Please clarify whether the queries can or should be run on the primary or secondary.

    • Mohamed_Baioumy_MSFT's avatar
      Mohamed_Baioumy_MSFT
      Icon for Microsoft rankMicrosoft

      Thank you for the careful review—you're absolutely right. I’ve updated the queries to reference the correct column, and confirmed that both should be executed on the primary database. Please take another look and let me know if there’s anything else you’d like me to adjust.

      • m60freeman's avatar
        m60freeman
        Brass Contributor

        For the first query, I would use:

        -- Run on primary DB
        SELECT 
            link_guid
          , partner_server
          , partner_database
          , replication_state_desc
          , replication_lag_sec
          , last_replication
        FROM sys.dm_geo_replication_link_status;

        I think the additional data is helpful. Thanks for updating the article so quickly!