Monitoring and Troubleshooting Using Public DMVs
Azure SQL Database provides built‑in high availability and geo‑replication capabilities to ensure database resilience and business continuity. While replication is fully managed by the platform, customers may occasionally observe a delay between the primary and geo‑replicated secondary database, especially during periods of heavy write activity.
This article provides a public, supported approach to monitoring and understanding geo‑replication delay caused by redo lag, using official Dynamic Management Views (DMVs) and T‑SQL only, without relying on internal tools.
Scenario Overview
Customers may report that changes committed on the primary database are not immediately visible on the geo‑replicated secondary, sometimes for several minutes.
Typical symptoms include:
- Reporting queries on the geo‑secondary showing stale data
- Increased redo catch‑up time on the secondary
- No performance degradation observed on the primary
- Replication eventually catches up without manual intervention
This behavior is commonly associated with redo lag, where the secondary has already received the log records but requires additional time to replay them into data pages.
Understanding Geo‑Replication Redo Lag
In Azure SQL Database geo‑replication:
- Transactions are sent from the primary to the secondary
- Log records are hardened on the secondary
- The secondary applies these records asynchronously to its data files (redo)
If the secondary experiences temporary pressure (for example, due to schema changes or intensive operations), redo may fall behind, causing the secondary to lag—even though data durability remains intact.
Step 1: Check Geo‑Replication Status and Lag
The primary DMV for monitoring geo‑replication is:
📘 sys.dm_geo_replication_link_status
Public documentation:
https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-geo-replication-link-status-azure-sql-database
Sample T‑SQL Query
SELECT
database_id,
partner_server,
partner_database,
replication_state_desc,
replication_lag_sec,
last_replication_time
FROM sys.dm_geo_replication_link_status;
Key Columns Explained
| Column | Description |
|---|---|
| replication_state_desc | Current replication state |
| replication_lag_sec | Estimated lag (in seconds) |
| last_replication_time | Last successful replication timestamp (UTC) |
| partner_server | Geo‑replica logical server |
Interpretation
- Healthy replication: replication_lag_sec is at or near 0 and replication_state_desc reports CATCH_UP (the steady state for an active geo‑replication link)
- Transient delay: lag increases temporarily but later returns to zero
- Sustained delay: lag remains elevated for an extended period and should be investigated further
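The three interpretation cases above can be expressed as a small client-side check. The following Python sketch is illustrative only (the `classify_lag` helper and its thresholds are assumptions, not part of any Azure SDK): it takes a time-ordered series of replication_lag_sec samples and decides whether lag looks healthy, transient, or sustained.

```python
def classify_lag(samples, threshold_sec=60, sustained_samples=5):
    """Classify a time-ordered list of replication_lag_sec readings.

    healthy   : latest sample is at or near zero, no earlier spike
    transient : lag exceeded the threshold but has already recovered
    sustained : the last `sustained_samples` readings all exceed the threshold
    """
    if not samples:
        return "unknown"
    latest = samples[-1]
    if latest <= 1:
        # Spikes earlier in the window that recovered count as transient.
        return "transient" if max(samples) > threshold_sec else "healthy"
    tail = samples[-sustained_samples:]
    if len(tail) == sustained_samples and all(s > threshold_sec for s in tail):
        return "sustained"
    return "transient"
```

Only a sustained classification should trigger further investigation; the other two states match expected platform behavior.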
Step 2: Monitor Local Replica Redo Health
To understand whether lag is related to redo activity on the secondary, you can query:
📘 sys.dm_database_replica_states
Public documentation:
https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database
Sample T‑SQL Query
SELECT
database_id,
role_desc,
synchronization_state_desc,
synchronization_health_desc,
redo_queue_size,
redo_rate,
last_commit_time
FROM sys.dm_database_replica_states;
Key Insights
- redo_queue_size: how much log data (in KB) is pending replay on the secondary
- redo_rate: how quickly (in KB/sec) redo is being applied
- last_commit_time: helps estimate data freshness on the secondary (UTC)
Interpretation
| Scenario | Observation |
|---|---|
| Normal operation | redo_queue_size = 0 |
| Write burst | redo_queue_size increases temporarily |
| Catch‑up in progress | redo_rate remains positive while redo_queue_size drains |
| Healthy state | synchronization_health_desc = HEALTHY |
Short‑lived spikes are expected platform behavior and usually resolve automatically.
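A practical use of these two columns together is a rough catch-up estimate: dividing the pending redo queue by the current redo rate approximates how long the secondary needs to drain its backlog. The sketch below assumes the units documented for the analogous SQL Server replica-state DMV (redo_queue_size in KB, redo_rate in KB/sec); the function name is illustrative, not an official API.

```python
def estimated_catchup_seconds(redo_queue_size_kb, redo_rate_kb_per_sec):
    """Rough time for the secondary to replay its pending log.

    Returns None when redo_rate is zero or negative, since there is
    no progress to extrapolate from.
    """
    if redo_rate_kb_per_sec <= 0:
        return None
    return redo_queue_size_kb / redo_rate_kb_per_sec
```

For example, a 10 MB redo queue draining at 512 KB/sec suggests roughly 20 seconds of catch-up. Treat this as an order-of-magnitude signal, not a guarantee, because both values fluctuate from sample to sample.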
Practical Monitoring Considerations
Reporting Workloads
If applications read from the geo‑secondary for reporting:
- Expect near‑real‑time, not guaranteed real‑time visibility
- Design reports to tolerate small delays
- Route latency‑sensitive reads to the primary if required
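The routing guidance above can be made concrete with a small decision helper. This is a hypothetical sketch (the function name and parameters are not from any Azure SDK): it compares the latest observed lag against the staleness a given query can tolerate.

```python
def choose_read_endpoint(replication_lag_sec, max_staleness_sec):
    """Route latency-sensitive reads to the primary, others to the secondary.

    replication_lag_sec : latest value sampled from
                          sys.dm_geo_replication_link_status (or None)
    max_staleness_sec   : how much staleness this query can tolerate
    """
    if replication_lag_sec is None:
        # No link status available; fail safe to the primary.
        return "primary"
    return "secondary" if replication_lag_sec <= max_staleness_sec else "primary"
```

A reporting query tolerating five minutes of staleness can stay on the secondary through most redo spikes, while a read-your-writes scenario would pass a small `max_staleness_sec` and land on the primary.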
Transaction Patterns
Redo lag is more likely during:
- Large batch updates
- Index maintenance operations
- Schema modification commands
- Bursty write workloads
Keeping transactions short and efficient reduces replication pressure.
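One common way to keep transactions short during large batch updates is to split the affected key set into fixed-size chunks and commit each chunk in its own transaction. The helper below is a generic Python sketch (the chunk size is an illustrative default; how each chunk is executed against SQL depends on your data-access layer):

```python
def chunked(items, chunk_size=1000):
    """Yield fixed-size slices so each slice can be one short transaction."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]
```

Committing after each chunk bounds how much log the secondary must redo at once, which keeps redo_queue_size spikes small and short-lived.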
Best Practices
- Use UTC timestamps consistently when correlating events
- Monitor replication_lag_sec and redo_queue_size together
- Implement retry logic in applications for transient conditions
- Avoid assuming read replicas are always perfectly synchronized
- Do not take manual actions during short‑lived redo spikes unless the lag persists
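The retry recommendation above can be sketched as a simple exponential-backoff loop. This is a generic illustration, not an Azure SDK API; `attempt_read` and the backoff parameters are placeholders for your own data-access call.

```python
import time

def read_with_retry(attempt_read, retries=3, base_delay_sec=0.5):
    """Retry a read that may transiently fail while a replica catches up.

    attempt_read : zero-argument callable; raises on transient failure
    """
    for attempt in range(retries):
        try:
            return attempt_read()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            # Exponential backoff: base, 2x base, 4x base, ...
            time.sleep(base_delay_sec * (2 ** attempt))
```

In production code you would catch only the transient error types your driver raises rather than bare `Exception`, and cap the total wait time.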
Summary
Redo lag in Azure SQL Database geo‑replication is a normal and self‑healing behavior during certain workload patterns. By using supported public DMVs and T‑SQL, customers can:
- Monitor replication health safely
- Understand replica freshness
- Make informed application routing decisions
- Avoid unnecessary intervention
Azure SQL Database automatically stabilizes replication once redo pressure subsides, ensuring durability and availability without manual management.
References
- Azure SQL Database High Availability
  https://learn.microsoft.com/azure/azure-sql/database/high-availability-overview
- sys.dm_geo_replication_link_status
  https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-geo-replication-link-status-azure-sql-database
- sys.dm_database_replica_states
  https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database