Alternatives After the Deprecation of the Azure SQL Migration Extension in Azure Data Studio
The Azure SQL Migration extension for Azure Data Studio is being deprecated and will be retired by February 28, 2026. As part of our unified and streamlined migration strategy for Azure SQL, we are consolidating all migration experiences into a consistent, scalable platform. If you are currently using the Azure SQL Migration extension, this blog will guide you through recommended replacement options for every phase of migration, whether you are moving to Azure SQL Managed Instance, SQL Server on Azure Virtual Machines, or Azure SQL Database.

What is happening to the Azure SQL Migration extension in ADS?
Azure Data Studio will officially retire on February 28, 2026, and the Azure SQL Migration extension will retire along with it on the same date. The extension will no longer be available in the Azure Data Studio marketplace.

What should you use instead?
Below is the updated guidance for migration tooling, categorized by migration phase and target.

1) Pre-Migration: Discovery & Assessments
Prior to migration, it is advisable to evaluate the SQL Server environment for readiness and to determine the right-sized Azure SQL SKU. Below are the recommended options:

A) SQL Server enabled by Azure Arc
Use the SQL Server migration experience in the Azure Arc portal for:
- Instance discovery at scale
- Migration assessments at scale, including:
  - Readiness assessment for all Azure SQL targets.
  - Performance-based, right-sized target recommendations.
  - Projected Azure costs with the recommended target configuration.
Reference: Steps to get started with the Azure Arc assessments - Deploy Azure Arc on your servers. SQL Server instances on Arc-enabled servers are automatically connected to Azure Arc. See options to optimize this.

B) Automated assessments at scale using Azure DMS PowerShell and Azure CLI
The Azure DataMigration modules in Azure PowerShell and Azure CLI can be used to automate assessments at scale. Learn more about how to do this. Here are the sample templates to automate the assessment workflow:
- Azure PowerShell DataMigration cmdlets
- DMS Azure CLI commands

C) Azure Migrate
For scenarios where assessments are required at the data center level, spanning different workload types such as applications, VM servers, and databases, use Azure Migrate to perform discovery and assessments at scale. Learn more about Azure Migrate.
References: Review inventory | Create SQL Assessment | Review SQL Assessment

2) Migrations
Based on the migration target, here are the recommended tools you can use to carry out the migration:

A. To Azure SQL Managed Instance
The following options are available for migrating data to Azure SQL Managed Instance:

1. SQL Migration experience in Azure Arc
For migrations to Azure SQL MI, leverage the streamlined SQL Migration experience in Azure Arc, which lets you complete the end-to-end migration journey in a single experience. This experience provides:
- Evergreen assessments and right-fit Azure SQL target recommendations.
- Inline Azure SQL target creation.
- Free Azure SQL MI (Next generation General Purpose) offer that lets you experience the power of Azure SQL MI for free for 12 months.
- Near-zero downtime migration using Managed Instance link, powered by Distributed Availability Group technology.
- Secure connectivity.
Reference blog: SQL Server migration in Azure Arc
2. Automated migration at scale using Azure DMS PowerShell and Azure CLI
To orchestrate migrations to Azure SQL MI at scale programmatically, use:
- DMS PowerShell cmdlets
- DMS Azure CLI commands
Learn more about how to do this.

B. To SQL Server on Azure Virtual Machines
To migrate to SQL Server on Azure Virtual Machines, use:

1. Azure Database Migration Service (DMS)
DMS supports migrating to SQL Server on Azure Virtual Machines using both online and offline methods. Your SQL Server backups can be in Azure Blob Storage or on a network SMB file share. For details on each option, see:
- Backups stored in Azure Blob Storage
- Backups maintained on network SMB file shares
Note: The migration experience from SQL Server on-premises to SQL Server on Azure VM will soon be available in SQL Server enabled by Azure Arc.

2. Automated migration at scale using Azure DMS PowerShell and Azure CLI
For programmatic migrations to SQL Server on Azure Virtual Machines:
- DMS PowerShell cmdlets
- DMS Azure CLI commands
Learn more about how to do this.

3. SSMS option: SQL Server Management Studio (SSMS) migration component
If you can connect to both the on-premises SQL Server and the SQL Server running on an Azure VM using SQL Server Management Studio, the migration component in SSMS can help you migrate to SQL Server on Azure VM. For details, see SSMS Migration component.

C. To Azure SQL Database
Migrating a SQL Server database to Azure SQL Database typically involves migrating schema and data separately. Here are the options to perform offline and online migration to Azure SQL Database:

1. Offline migration to Azure SQL Database
a. Azure Database Migration Service (DMS) portal experience
Use the Azure DMS portal to migrate both schema and data. Azure DMS uses Azure Data Factory and leverages the Self-hosted Integration Runtime (SHIR). Installation steps are here.
b. Automated migration at scale using Azure DMS PowerShell and Azure CLI
Use the Azure DMS PowerShell and Azure CLI command lines to orchestrate the schema and data migration to Azure SQL Database at scale:
- DMS PowerShell cmdlets
- DMS Azure CLI commands
Learn more about how to do this.

2. Online migration to Azure SQL Database using Striim
To enable online migration of your mission-critical databases to Azure SQL Database, leverage Striim. Microsoft and Striim have entered a strategic partnership to enable continuous data replication from off-Azure SQL Servers to Azure SQL Database with near-zero downtime. For more details, refer to:
- Zero downtime migration from SQL Server to Azure SQL Database | Microsoft Community Hub
- Removing barriers to migrating databases to Azure with Striim's Unlimited Database Migration program...
To leverage the Striim program for migrations, please reach out to your Microsoft contact or submit the feedback form below to get started.

Summary
The table below provides a summary of the available alternatives for each migration scenario.
Migration scenario | Guided experience | Automation experience
Pre-Migration (Discovery + Assessment) | SQL Migration experience in Azure Arc / Azure Migrate | DMS PowerShell / Azure CLI
To Azure SQL Managed Instance | SQL Migration experience in Azure Arc | DMS PowerShell / Azure CLI
To SQL Server on Azure Virtual Machine | DMS Azure portal / SSMS migration component | DMS PowerShell / Azure CLI
To Azure SQL Database | DMS Azure portal (offline & schema migration) / Striim (online migration) | DMS PowerShell / Azure CLI (offline & schema migration)

Final Thoughts
To simplify your SQL migration journey and improve migration velocity to all Azure SQL targets, leverage the connected migration experiences in SQL Server enabled by Azure Arc, DMS, and SSMS. For SSMS, as a first step we brought in the capabilities to perform assessment and migration to higher versions of SQL Server, including SQL Server on Azure Virtual Machines. As a next step, we are bringing cloud migration capabilities into SSMS as well.

Feedback
We love hearing from our customers. If you have feedback or suggestions for the product group, please use the following form: Feedback form
As you begin your migration to Azure, we welcome your feedback. If you do not see suitable alternatives for any migration phase, use the feedback form to let us know so we can update the options accordingly.

Multiple secondaries for Azure SQL Database failover groups (Public Preview)
Azure SQL Database failover groups are a business continuity capability that helps you manage geo-replication and coordinated failover of one or more databases from a primary Azure SQL logical server to a secondary logical server in another region. A key benefit is automatic endpoint redirection: your application can keep using the same listener endpoints, and connections are routed to the current primary after a geo-failover. Until recently, an Azure SQL Database failover group supported only one secondary server. Microsoft has now announced public preview support for up to four secondaries per failover group—unlocking more flexible DR designs, regional read scale-out patterns, and richer HA/BC architectures. Why this matters: the “one-secondary” constraint is gone Traditional failover group topologies were straightforward: one primary ↔ one secondary. That model is great for many DR plans, but it can be limiting when you want to: Distribute read-only workloads across multiple regions, Maintain multiple failover targets for broader regional resilience, Support complex compliance or geo-distribution requirements, or Migrate regions without sacrificing existing protection. With multiple secondaries, you can now design a failover group that better matches how globally distributed applications actually run—while still benefiting from the simplicity of the failover group abstraction and stable listener endpoints. What’s new in Public Preview With this enhancement, you can now: Create up to four secondary servers for a single failover group. Place these secondaries in the same or different Azure regions (with important guidance for read-only routing—covered below). Choose which secondary should receive traffic for the read-only listener endpoint (<fog-name>.secondary.database.windows.net). Fail over to any secondary (planned or forced), giving you more operational flexibility during incidents or DR drills. Supported service tiers for multiple secondaries (per the announcement): Standard, General Purpose, Premium, Business Critical, and Hyperscale. Core concept refresher: listener endpoints (and why they’re central) Failover groups expose two stable DNS-based listener endpoints: Read-write listener: <fog-name>.database.windows.net Read-only listener: <fog-name>.secondary.database.windows.net These endpoints are DNS CNAME records created when the failover group is created. After failover, DNS updates redirect the listener(s) to the new role holder. A few practical details that matter in real operations: The Learn documentation notes the DNS TTL is 30 seconds for primary and secondary listener records, which influences how quickly clients pick up endpoint changes (subject to client-side DNS caching behavior). For read-only workloads, it’s recommended to indicate read intent in the connection string using ApplicationIntent=ReadOnly. By default, failover of the read-only listener is disabled (to avoid impacting primary performance when the secondary is offline), which can be a key consideration for read-only availability expectations. Architecture: one failover group, up to four secondaries Here’s a simple way to visualize the topology: With multiple secondaries, you designate one secondary as the read-only listener endpoint target. All read-only listener traffic routes to that chosen secondary. If a failover group has only one secondary, the read-only endpoint defaults to it. How to get started (Azure portal) Creating the initial failover group is unchanged. 
To add additional secondaries, the announcement outlines these portal steps: Go to your Azure SQL logical server in the Azure portal. Open Failover groups under Data management. Select your existing failover group. Click Add server to add another secondary server. In the side panel, select: The secondary server to add, and The read-only listener endpoint target (a dropdown that lists existing and newly added secondaries). Click Select, then Save to apply the configuration—seeding begins for databases in the group to the new secondary. Important note from the announcement: if you intend to serve read workloads via the read-only listener, the designated read-only target should not be in the same region as the primary. PowerShell examples (create / modify / fail over) The announcement provides PowerShell examples to create and update a failover group with multiple secondaries: Create a failover group with multiple secondaries New-AzSqlDatabaseFailoverGroup ` -ResourceGroupName "myrg" ` -ServerName "primaryserver" ` -PartnerServerName "secondaryserver1" ` -FailoverGroupName "myfailovergroup" ` -FailoverPolicy "Manual" ` -PartnerServerList @("secondary_uri_1","secondary_uri_2","secondary_uri_3","secondary_uri_4") ` -ReadOnlyEndpointTargetServer "secondary_uri_1" The secondary_uri_n values are Azure resource IDs in the form: /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Sql/servers/<server> Add/replace the secondary server list on an existing failover group Set-AzSqlDatabaseFailoverGroup ` -ResourceGroupName "myrg" ` -ServerName "primaryserver" ` -FailoverGroupName "myfailovergroup" ` -FailoverPolicy "Manual" ` -PartnerServerList @("secondary_uri_1","secondary_uri_2","secondary_uri_3","secondary_uri_4") ` -ReadOnlyEndpointTargetServer "secondary_uri_1" Perform a failover Use Switch-AzSqlDatabaseFailoverGroup (example shown in the announcement): Switch-AzSqlDatabaseFailoverGroup ` -ResourceGroupName "myrg" ` -ServerName "secondaryserver1" ` -FailoverGroupName "myfailovergroup" Planned vs. forced failover (operational clarity) The preview announcement highlights portal actions for: Failover (planned / synchronized) Forced failover (unplanned / potential data loss) The Learn documentation further clarifies that Failover performs full synchronization (no data loss) and requires the primary to be accessible, while Forced failover can result in data loss because replication is asynchronous and doesn’t wait for final propagation. Limitations and notes (preview behavior) From the announcement: Up to four secondaries per failover group. Each secondary must be on a different logical server than the primary. Secondaries can be in the same or different regions. The read-only listener endpoint target should be in a different region from the primary if you want to use the read-only listener for read workloads. Failover group name must be globally unique within .database.windows.net. Chaining (geo-replica of a geo-replica) isn’t supported. Backup storage redundancy and zone redundancy inheritance behaviors vary by tier; the announcement calls out: For non-Hyperscale, zone redundancy isn’t enabled by default on secondaries (enable after creation). For Hyperscale, secondaries inherit HA settings from their respective primaries. Best practices (practical guidance) The announcement and documentation recommend several patterns worth emphasizing: Prefer paired regions when feasible for better performance than unpaired regions. 
Test regularly using planned failovers (DR drills) so you validate your end-to-end plan.
Monitor replication lag to ensure your RPO expectations are being met.
Keep failover group scope reasonable (the number of databases impacts failover duration; the documentation advises limiting databases per group and using multiple groups if needed).
Use the right endpoint for the right workload:
- Writes → <fog-name>.database.windows.net (read-write listener)
- Reads → <fog-name>.secondary.database.windows.net (read-only listener)

Monitoring example: sys.dm_geo_replication_link_status
The DMV sys.dm_geo_replication_link_status returns a row per replication link and includes fields like partner_server, partner_database, last_replication, and replication_lag_sec, which you can use to track replication health. A simple query (run in the user database participating in geo-replication) is:

SELECT partner_server, partner_database, last_replication, replication_lag_sec
FROM sys.dm_geo_replication_link_status;

(These columns and their meanings are documented in the DMV reference.)

Closing thoughts
Multiple secondaries for Azure SQL Database failover groups is a meaningful step forward for customers running globally distributed applications. It combines the operational simplicity of failover groups—stable listener endpoints and coordinated failover—with added architectural freedom: more DR targets, more read scale-out options, and smoother migration pathways. If you're already using failover groups today, this preview is a great opportunity to revisit your topology and ask: Where would additional geo-secondaries reduce risk or improve performance for my read-heavy workloads? If you're designing something new, this capability expands what "good" looks like for resilient, regionally distributed data tiers.

Learn more (official resources)
- Multiple secondaries for failover groups is now in public preview
- Failover groups overview & best practices (Azure SQL Database)
- Configure a failover group (Azure SQL Database)
- sys.dm_geo_replication_link_status DMV reference
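Finally, as a companion to the monitoring query above, here is a minimal T-SQL sketch you can run in a database that participates in the failover group to confirm its current geo-replication role and link health after a DR drill. It relies only on documented columns of sys.dm_geo_replication_link_status and the standard DATABASEPROPERTYEX function; adapt it to your environment.

```sql
-- Run in the user database on either side of the failover group.
-- role_desc shows whether the local database is currently PRIMARY or SECONDARY
-- for each geo-replication link; replication_lag_sec shows how far a secondary trails.
SELECT
    DB_NAME()                                       AS local_database,
    DATABASEPROPERTYEX(DB_NAME(), 'Updateability')  AS local_updateability,  -- READ_WRITE on primary, READ_ONLY on a geo-secondary
    role_desc,
    partner_server,
    partner_database,
    replication_state_desc,
    replication_lag_sec,
    last_replication
FROM sys.dm_geo_replication_link_status;
```

Running this on both sides after a planned or forced failover is a quick way to confirm that the role swap and listener redirection completed as expected.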
Enable auditing to identify Change Tracking history table drops using Extended Events (Azure SQL Db)

When Change Tracking is enabled, the system-maintained history table dbo.MSChange_tracking_history is a key place to look for cleanup progress and errors. For example, Microsoft's troubleshooting guidance for Change Tracking auto-cleanup explicitly references querying the history table to validate whether cleanup is running and whether there are cleanup errors. In real-world environments, teams sometimes observe that dbo.MSChange_tracking_history "disappears" unexpectedly (for example, during cleanup troubleshooting), which then blocks visibility into cleanup progress and error history. When that happens, the most practical way to answer "who/what dropped the history table?" is to audit the DDL using Extended Events (XEvents), a lightweight monitoring capability designed for troubleshooting. This post shows how to create a database-scoped XEvent session in Azure SQL Database that captures Change Tracking history table drops and related events, using the script below.

Why Extended Events for this scenario?
Extended Events are built to capture internal engine activities with relatively low overhead, and are supported across SQL Server and Azure SQL. In Azure SQL Database, XEvents are commonly used for investigation and troubleshooting when you need evidence of what happened and who initiated it. In this specific case, we want to capture:
- A history table drop event (Change Tracking-specific)
- A syscommittab truncate event (Change Tracking-specific)
- A generic object deletion event filtered to the history table name
And we want to enrich each event with context (app, host, user, SQL text, stack, timestamps, etc.).

The Extended Events session (copy/paste)
Below is the script to create a database-scoped session named CTXEvent that audits the Change Tracking history table drop and related events:

CREATE EVENT SESSION [CTXEvent] ON DATABASE
ADD EVENT sqlserver.change_tracking_historytable_drop(
    ACTION(mdmtargetpkg.mdmget_TimeStampUTC, sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.context_info, sqlserver.database_name, sqlserver.external_script_request_id, sqlserver.session_id, sqlserver.sql_text, sqlserver.transaction_id, sqlserver.tsql_stack, sqlserver.username)),
ADD EVENT sqlserver.change_tracking_syscommittab_truncate(
    ACTION(mdmtargetpkg.mdmget_TimeStampUTC, sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.context_info, sqlserver.database_name, sqlserver.external_script_request_id, sqlserver.session_id, sqlserver.sql_text, sqlserver.transaction_id, sqlserver.tsql_stack, sqlserver.username)),
ADD EVENT sqlserver.object_deleted(
    ACTION(mdmtargetpkg.mdmget_TimeStampUTC, sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.context_info, sqlserver.database_name, sqlserver.external_script_request_id, sqlserver.session_id, sqlserver.sql_text, sqlserver.transaction_id, sqlserver.tsql_stack, sqlserver.username)
    WHERE ([object_name] = N'MSchange_tracking_history'))
WITH (MAX_MEMORY = 16384 KB, EVENT_RETENTION_MODE = NO_EVENT_LOSS, MAX_DISPATCH_LATENCY = 10 SECONDS, MAX_EVENT_SIZE = 0 KB, MEMORY_PARTITION_MODE = NONE, TRACK_CAUSALITY = ON, STARTUP_STATE = OFF)
GO

Why these three events?
sqlserver.change_tracking_historytable_drop: directly captures the Change Tracking "history table dropped" operation (purpose-built for this investigation).
sqlserver.change_tracking_syscommittab_truncate: captures maintenance actions on syscommittab, which is central to Change Tracking's internal state.
sqlserver.object_deleted with a predicate on MSchange_tracking_history: provides a generic “safety net” if the drop manifests via generic DDL deletion paths. Why these actions? The actions you selected are the practical “who/what/where” context you usually need to attribute the operation: sqlserver.client_app_name / sqlserver.client_hostname: which app + which machine initiated the DDL sqlserver.username: identity executing the statement sqlserver.sql_text + sqlserver.tsql_stack: the statement and call stack (helpful if the drop occurs via a job, framework, or tooling) sqlserver.session_id / sqlserver.transaction_id: correlate activity across events TRACK_CAUSALITY=ON: enables correlation across related events within the session context. Starting and stopping the session Your session is created with STARTUP_STATE=OFF, meaning it does not start automatically. That’s typically a good practice for targeted investigations: enable it only during the window where you expect the issue. For Azure SQL Database, you can manage the session using SSMS or equivalent T-SQL workflows. Microsoft’s Extended Events QuickStart walks through creating sessions and working with session data, including the general workflow concepts. Tip (practical): Turn it on shortly before your expected “drop window,” reproduce/observe, then turn it off after you capture the event to minimize overhead. Where does the data go? Your script defines the events + actions + session options, but it does not define a target (for example, event_file). That’s OK for short, in-the-moment investigations, but for durable evidence and easy sharing, many teams prefer writing to an event file target. Option: Persist captured events to Azure Storage (recommended for multi-hour investigations) Microsoft provides a dedicated walkthrough for Azure SQL Database to create an event session with an event_file target stored in Azure Storage, including: creating a storage account + container granting access (RBAC or SAS) creating a database credential creating/starting the session viewing the event data If you choose this route, you can then read event files using sys.fn_xe_file_target_read_file (supported in Azure SQL Database, commonly used to consume XEL files from blob storage). How to interpret the results (what to look for) When the table drop happens, the captured event payload should allow you to confidently answer: Who initiated the operation? Look at sqlserver.username plus client_app_name and client_hostname. What executed the drop? Look at sql_text and tsql_stack. Was this tied to cleanup activity or another workflow? If you see change_tracking_syscommittab_truncate around the same time (and correlated via session/transaction/casuality identifiers), that can indicate the drop happened in proximity to Change Tracking maintenance behavior. Was the drop recurring or one-off? If you capture repeated events at a steady cadence, it points to an automated process (job, deployment pipeline, scheduled task, framework) rather than an ad-hoc human action. Why this matters: the history table is part of Change Tracking troubleshooting Microsoft’s Change Tracking cleanup troubleshooting guidance explicitly uses dbo.MSChange_tracking_history as the place to validate whether cleanup is running and whether errors are being logged. So if the history table is being dropped, you effectively lose an important “black box recorder” that you rely on to understand cleanup behavior—exactly why auditing the drop is so valuable. 
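Since the session is created with STARTUP_STATE=OFF, a minimal sketch of the start/stop workflow and the baseline checks discussed above might look like the following. The session name CTXEvent comes from the script in this post; the rest is standard T-SQL against documented catalog views and DMVs that you can adapt to your environment.

```sql
-- 1) Baseline: is auto-cleanup enabled, and does the history table currently exist?
SELECT is_auto_cleanup_on, retention_period, retention_period_units_desc
FROM sys.change_tracking_databases
WHERE database_id = DB_ID();

SELECT OBJECT_ID(N'dbo.MSchange_tracking_history') AS history_table_object_id;  -- NULL means the table is not present

-- 2) Start the session shortly before the expected drop window
ALTER EVENT SESSION [CTXEvent] ON DATABASE STATE = START;

-- 3) Confirm the session exists and is currently running
SELECT s.name,
       CASE WHEN rs.name IS NULL THEN 'stopped' ELSE 'running' END AS session_state
FROM sys.database_event_sessions AS s
LEFT JOIN sys.dm_xe_database_sessions AS rs ON rs.name = s.name
WHERE s.name = N'CTXEvent';

-- 4) Stop the session once the drop has been captured, to minimize overhead
ALTER EVENT SESSION [CTXEvent] ON DATABASE STATE = STOP;
```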
Operational tips (minimize noise, maximize signal) These are practical recommendations commonly used in incident investigations (treat them as operational guidance you can adapt to your environment): Run the session for a limited time window (enough to catch the drop). Keep actions focused (your list is already purpose-built for attribution). Prefer file target if you need durable evidence (especially when coordinating across teams). Document what “good” looks like before you enable the session (e.g., confirm whether the history table exists and whether it’s being written to), using the troubleshooting query from Microsoft docs. Common troubleshooting steps (when the audit session “doesn’t capture anything”) When you set up a targeted Extended Events session like CTXEvent, there are usually three places where things can go wrong: (1) Change Tracking state, (2) XEvent session scope/permissions, or (3) the target (storage/credential). The checklist below is written to help you isolate which layer is failing—quickly, and with minimal guesswork. 1) First, validate the Change Tracking baseline (so you’re not “tracing a ghost”) Does the history table exist—and is it expected to exist? Microsoft’s Change Tracking guidance uses dbo.MSChange_tracking_history as the first place to look for cleanup status and errors. If the table doesn’t exist, that can be a signal that auto-cleanup has never run (or hasn’t run since CT was re-enabled). Check whether auto-cleanup is enabled and whether it’s been running recently. The Change Tracking troubleshooting doc recommends checking sys.change_tracking_databases (to validate auto-cleanup is enabled) and querying the history table for recent entries (to validate cleanup is actually running). If dbo.MSChange_tracking_history is missing, confirm whether that’s due to CT lifecycle. A common “gotcha” is assuming the history table is created the moment CT is enabled. In practice, the table can be missing if CT was disabled and then re-enabled, or if only manual cleanup was used—because the history table is tied to auto-cleanup behavior. Why this matters: If the history table never existed (or is missing for expected reasons), your “history table drop” symptom might actually be a different behavior than a true DDL DROP—and you’ll want to confirm the CT lifecycle first before you rely on drop-auditing as the primary signal. 2) Confirm the Extended Events session is created in the right place (Azure SQL DB specifics) Make sure the session is database-scoped (Azure SQL Database requirement). In Azure SQL Database, Extended Events sessions are always database-scoped, and the event_file target writes to Azure Storage blobs (not local disk). If you accidentally create or manage the session as if it were server-scoped, you’ll hit confusing errors or see “no data.” Confirm you created the session in a user database (not master). Azure SQL Database doesn’t support creating database-scoped sessions from master. If you try, you can run into failures that look like target/session problems but are actually scope issues. Verify permissions: least privilege that still works. If the session creation/alter/start fails silently (or you can’t start it), confirm the principal has the right database permission. A practical minimum that works for managing DB-scoped sessions is ALTER ANY DATABASE EVENT SESSION (instead of granting broad CONTROL/db_owner). 
3) If you're using event_file: storage + credential is the most common failure point

If the session starts but no .xel ever shows up, focus here first—because event_file in Azure SQL DB depends on a correct Storage setup.

Validate the "happy path" prerequisites (in order):
- Storage account exists and a container exists.
- The Database Engine has access to that container either via RBAC (managed identity) or via a SAS token.
- A credential is created in the same database where the session is created.
Microsoft's event_file walkthrough is explicit about these steps, and missing any one of them can cause "target initialization" or "no output" symptoms.

Quick diagnostic: temporarily switch to an in-memory target to prove events are firing. The Azure SQL Extended Events guidance notes you can use a ring_buffer target as a quick way to validate that the session is collecting events (without involving storage). If ring_buffer shows events but event_file does not, the issue is almost certainly storage/credential access.

Common "it looks correct but still fails" causes (practical):
- You created the credential in the wrong database (or the session is in a different DB than the credential).
- The container permissions don't actually include write access for the engine identity/token.
- The session was created correctly but never started (especially if STARTUP_STATE = OFF).

4) If you do capture events, but can't attribute "who did it"

Once the session is logging events, attribution usually comes down to having the right actions attached. The docs emphasize that Extended Events are designed to collect "as much or as little as necessary," and in investigations you typically want the "who/what/where" context: app name, hostname, username, session_id, sql_text, etc.

Practical tip: Start with your curated set of actions (like client_app_name, client_hostname, username, session_id, and sql_text), and only add more fields if you still can't triangulate the origin. This keeps overhead and noise down while staying evidence driven.

5) If the table drop is periodic (hourly / scheduled), treat it like an automation clue

In real customer scenarios, the symptom can be "history table disappears repeatedly," which makes it especially important to correlate drops with client_app_name / hostname / username (and optionally transaction/session details) to determine whether this is a scheduled job, maintenance task, or application behavior. This is exactly the type of "pattern identification" where XEvents tends to outperform ad-hoc guessing.

Quick "decision tree" summary
- History table missing? Start with CT lifecycle + auto-cleanup checks.
- Session exists but captures nothing? Confirm DB scope + permissions.
- Captures to ring_buffer but not to event_file? Storage/credential access problem.
- Captures events but no "who"? Add/verify the right actions (client_app_name/hostname/username/sql_text).

References
- Extended Events QuickStart and overview (applies to Azure SQL Database).
- Change Tracking cleanup troubleshooting and history table usage (dbo.MSChange_tracking_history).
- Event File target for Azure SQL Database stored in Azure Storage + steps (RBAC/SAS + credential).
- Reading XEL output from blob using sys.fn_xe_file_target_read_file.

Wireshark walkthrough: observing an Azure SQL Database TDS connection on the wire
One of the most effective ways to truly understand how Azure SQL Database connectivity works is to observe the connection at the packet level. A short Wireshark capture clearly shows the interaction between TCP, TLS, TDS, and the Azure SQL gateway / redirect flow. This section walks through what you should expect to see in a healthy Azure SQL Database connection. 1. Capture setup (high level) Capture traffic on the client side, while reproducing the connection attempt (for example, using sqlcmd, SSMS, or your application). Typical display filters to start with: tcp.port == 1433 If Redirect mode is involved, you may later see additional TCP ports (for example, 11xxx–11999 or a specific dynamic port like 6188). 2. TCP three‑way handshake (transport layer) The first packets are standard TCP: SYN → client initiates connection SYN‑ACK → Azure SQL gateway responds ACK → connection established At this point: The destination IP is the Azure SQL Database gateway The destination port is 1433 This confirms basic network reachability to the gateway. 3. TDS PRELOGIN message (application layer) Immediately after the TCP handshake, Wireshark shows the TDS Pre‑Login packet. Key observations: Protocol: TDS Message type: PRELOGIN This packet contains no credentials It negotiates: TDS version Encryption capability MARS (Multiple Active Result Sets) Federated authentication flags (if applicable) In Wireshark, this often appears as: Tabular Data Stream Packet Type: Pre-Login This is the first SQL‑aware packet on the wire. 4. TLS handshake (security layer) What happens next depends on the TDS version and encryption mode: TDS 7.x (older behavior) PRELOGIN is visible in clear text TLS handshake occurs after PRELOGIN TDS 8.0 / Encrypt=Strict (modern Azure SQL behavior) TLS handshake occurs immediately after TCP PRELOGIN itself is encrypted Wireshark shows: Client Hello Server Hello Certificate exchange In this case: You will not see readable TDS fields unless TLS decryption is configured This is expected and by design 5. LOGIN and authentication phase After TLS is established: Client sends LOGIN7 message Authentication occurs (SQL auth, Azure AD token, managed identity, etc.) All sensitive data is encrypted In Wireshark: Packets are marked as TLS Application Data TDS payload is no longer human‑readable A successful login is followed by: ENVCHANGE tokens Session setup responses 6. Redirect behavior (very important in Azure SQL) If the server is using Redirect connection policy, Wireshark shows a distinctive pattern: Initial connection: TCP → 1433 (gateway) Server responds with routing information Client opens a second TCP connection: New destination IP New dynamic port (for example: 6188) In Wireshark you will see: A new TCP handshake Same SQL hostname, different port The original 1433 connection may close or go idle This is the most common source of confusion: Firewalls allow 1433 ✅ Dynamic redirect ports ❌ blocked Result: login timeout or pre‑login handshake failure 7. Query execution and result sets Once connected to the final database endpoint: Client sends SQL batches or RPC calls Server responds with result sets Traffic continues over the redirected connection In Wireshark: All packets appear as encrypted TLS application data Packet sizes correlate with query result size Long‑running queries show sustained data flow 8. What Wireshark helps you diagnose quickly Using this walkthrough, Wireshark can immediately answer: ✅ Is the client reaching the Azure SQL gateway? ✅ Is TLS handshake completing? 
✅ Is Redirect mode in effect?
✅ Is a second TCP connection attempted?
✅ Which port is failing?
✅ Is the failure network-level or authentication-level?

This makes it invaluable for troubleshooting:
- Login timeouts
- Pre-login handshake errors
- Redirect / firewall issues
- TLS / driver encryption mismatches

Key takeaway
When troubleshooting Azure SQL connectivity:
- Always expect more than one TCP connection
- 1433 is only the entry point
- Redirect mode + TDS + TLS explains 90% of "mysterious" connection failures
Once you see the flow in Wireshark, Azure SQL connectivity stops being a black box.
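A useful companion to the client-side capture is a quick server-side look at how the current session actually connected. The sketch below uses documented columns of sys.dm_exec_connections, which is available in Azure SQL Database; on a Redirect-mode connection, local_tcp_port should generally show the dynamic redirect port rather than 1433, which lines up with what the trace shows (treat that interpretation as an assumption to verify in your environment).

```sql
-- Run from the connection you want to inspect (application session or SSMS).
SELECT
    c.session_id,
    c.net_transport,       -- expected: TCP
    c.protocol_type,       -- expected: TSQL (TDS)
    c.encrypt_option,      -- TRUE when the TDS session is TLS-encrypted
    c.auth_scheme,         -- SQL / NTLM / KERBEROS
    c.local_tcp_port,      -- with Redirect, typically a dynamic port instead of 1433
    c.client_net_address
FROM sys.dm_exec_connections AS c
WHERE c.session_id = @@SPID;
```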
Using ClientConnectionId to Correlate .NET Connection Attempts in Azure SQL

Getting Better Diagnostics with ClientConnectionId in .NET

A few days ago, I was working on a customer case involving intermittent connectivity failures to Azure SQL Database from a .NET application. On the surface, nothing looked unusual. Retries were happening. In this post, I want to share a simple yet effective pattern for producing JDBC-style trace logs in .NET — specifically focusing on the ClientConnectionId property exposed by SqlConnection. This gives you a powerful correlation key that aligns with backend diagnostics and significantly speeds up root cause analysis for connection problems.

Why ClientConnectionId Matters
Azure SQL Database assigns a unique identifier to every connection attempt from the client. In .NET, this identifier is available through the ClientConnectionId property of SqlConnection. According to the official documentation:
"The ClientConnectionId property gets the connection ID of the most recent connection attempt, regardless of whether the attempt succeeded or failed."
Source: https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlconnection.clientconnectionid?view=netframework-4.8.1
This GUID is the single most useful piece of telemetry for correlating client connection attempts with server logs and support traces.

What .NET Logging Doesn't Give You by Default
Unlike the JDBC driver, the .NET SQL Client does not produce rich internal logs of every connection handshake or retry. There's no built-in switch to emit gateway and redirect details, attempt counts, or port information. What you do have is:
- Timestamps
- Connection attempt boundaries
- ClientConnectionId values
- Outcome (success or failure)
If you capture and format these consistently, you end up with logs that are as actionable as the JDBC trace output — and importantly, easy to correlate with backend diagnostics and Azure support tooling.

Below is a small console application in C# that produces structured logs in the same timestamped, [FINE] format you might see from a JDBC trace — but for .NET applications:

using System;
using Microsoft.Data.SqlClient;

class Program
{
    static int Main()
    {
        // SAMPLE connection string (SQL Authentication)
        // Replace this with your own connection string.
        // This is provided only for demonstration purposes.
        string connectionString =
            "Server=tcp:<servername>.database.windows.net,1433;" +
            "Database=<database_name>;" +
            "User ID=<sql_username>;" +
            "Password=<sql_password>;" +
            "Encrypt=True;" +
            "TrustServerCertificate=False;" +
            "Connection Timeout=30;";

        int connectionId = 1;

        // Log connection creation
        Log($"ConnectionID:{connectionId} created by (SqlConnection)");

        using SqlConnection connection = new SqlConnection(connectionString);

        try
        {
            // Log connection attempt
            Log($"ConnectionID:{connectionId} This attempt No: 0");

            // Open the connection
            connection.Open();

            // Log ClientConnectionId after the connection attempt
            Log($"ConnectionID:{connectionId} ClientConnectionId: {connection.ClientConnectionId}");

            // Execute a simple test query
            using (SqlCommand cmd = new SqlCommand("SELECT 1", connection))
            {
                Log($"SqlCommand:1 created by (ConnectionID:{connectionId})");
                Log("SqlCommand:1 Executing (not server cursor) SELECT 1");
                cmd.ExecuteScalar();
                Log("SqlDataReader:1 created by (SqlCommand:1)");
            }
        }
        catch (SqlException ex)
        {
            // ClientConnectionId is available even on failure
            Log($"ConnectionID:{connectionId} ClientConnectionId: {connection.ClientConnectionId} (failure)");
            Log($"SqlException Number: {ex.Number}");
            Log($"Message: {ex.Message}");
            return 1;
        }

        return 0;
    }

    // Simple logger to match JDBC-style output format
    static void Log(string message)
    {
        Console.WriteLine(
            $"[{DateTime.Now:yyyy-MM-dd HH:mm:ss}] [FINE] {message}"
        );
    }
}

Run the above application and you'll get output in this style:

[2025-12-31 03:38:10] [FINE] ConnectionID:1 This attempt server name: aabeaXXX.trXXXX.northeurope1-a.worker.database.windows.net port: 11002 InstanceName: null useParallel: false
[2025-12-31 03:38:10] [FINE] ConnectionID:1 This attempt endtime: 1767152309272
[2025-12-31 03:38:10] [FINE] ConnectionID:1 This attempt No: 1
[2025-12-31 03:38:10] [FINE] ConnectionID:1 Connecting with server: aabeaXXX.trXXXX.northeurope1-a.worker.database.windows.net port: 11002 Timeout Full: 20
[2025-12-31 03:38:10] [FINE] ConnectionID:1 ClientConnectionID: 6387718b-150d-482a-9731-02d06383d38f Server returned major version: 12
[2025-12-31 03:38:10] [FINE] SqlCommand:1 created by (ConnectionID:1 ClientConnectionID: 6387718b-150d-482a-9731-02d06383d38f)
[2025-12-31 03:38:10] [FINE] SqlCommand:1 Executing (not server cursor) select 1
[2025-12-31 03:38:10] [FINE] SqlDataReader:1 created by (SqlCommand:1)
[2025-12-31 03:38:10] [FINE] ConnectionID:2 created by (SqlConnection)
[2025-12-31 03:38:11] [FINE] ConnectionID:2 ClientConnectionID: 5fdd311e-a219-45bc-a4f6-7ee1cc2f96bf Server returned major version: 12
[2025-12-31 03:38:11] [FINE] sp_executesql SQL: SELECT 1 AS ID, calling sp_executesql
[2025-12-31 03:38:12] [FINE] SqlDataReader:3 created by (sp_executesql SQL: SELECT 1 AS ID)

Notice how each line is tagged with:
- A consistent local timestamp (yyyy-MM-dd HH:mm:ss)
- A [FINE] log level
- A structured identifier that mirrors what you'd see in JDBC logging
If a connection fails, you'll still get the ClientConnectionId logged, which is exactly what Azure SQL support teams will ask for when troubleshooting connectivity issues.

Understanding USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION') in Azure SQL DB
Query performance tuning is one of the most critical aspects of maintaining stable and predictable workloads in Azure SQL Database. While Microsoft continuously improves the SQL Server Cardinality Estimator (CE) to provide more accurate row estimates and better plan choices, there are scenarios where workloads may benefit from reverting to the Legacy CE—especially when upgrading databases or encountering regressions after plan compilation. One practical and targeted way to influence the optimizer’s behavior is the query‑level hint: OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION')); In this blog, we explore what this hint does, when to use it, and how it impacts execution plans in Azure SQL Database. 1. What Is the Cardinality Estimator? The Cardinality Estimator predicts the number of rows (cardinality) processed at each step of an execution plan. These estimates heavily influence the optimizer’s choices around: Join algorithms (Nested Loop, Hash, Merge) Index selection Memory grants Parallelism decisions Operator placement and costing The New CE (introduced in SQL Server 2014+) typically handles modern workloads better through richer correlation assumptions and updated statistical heuristics. However, the Legacy CE can outperform the New CE for certain workloads—particularly those with: Highly skewed data distributions Complex predicates Uncorrelated filter predicates Legacy application patterns relying on outdated parameter sniffing assumptions 2. What Does FORCE_LEGACY_CARDINALITY_ESTIMATION Do? The hint forces the optimizer to compile the query using the Legacy CE, regardless of: Database compatibility level Global database settings (ALTER DATABASE SCOPED CONFIGURATION) Query Store settings (unless a forced plan exists) It provides a granular, non-intrusive, and risk‑controlled way to compare the two estimators side-by-side. Where the hint helps Queries suffering regressions after upgrading compatibility levels Workloads where the New CE underestimates/overestimates row counts Parameter‑sensitive queries sensitive to distribution skew Scenarios where Query Store shows the Legacy CE plan as consistently faster 3. How It Impacts Execution Plans Applying USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION') affects the optimizer in several ways: A. Row Estimation Logic Legacy CE assumes more independence between predicates. New CE introduces more correlation assumptions. Effect: Legacy CE may estimate higher cardinality for multi‑predicate filters Higher estimates → optimizer may choose Hash Join or Merge Join over Nested Loops Lower estimates → may reduce memory grants or favor Nested Loops B. Join Strategy Changes Example difference: Legacy CE: Hash Join due to higher row estimates New CE: Nested Loop due to underestimated rows This can dramatically alter performance depending on input sizes. C. Memory Grants Legacy CE often yields larger memory grants due to conservative estimates. This can be positive or negative: Positive: Avoids spills to tempdb Negative: Excessive memory grants can lead to concurrency pressure D. Parallelism Decisions Legacy CE may push optimizer toward parallel plans when it predicts larger input sizes. E. Index and Predicate Choices Legacy CE may select different indexes because estimated filter selectivity differs. This can impact: Index seek vs. index scan decisions Predicate evaluation order Operator costing 4. 
Real-World Example

Here's a simplified example comparing the same query compiled via:
- Default CE
- Legacy CE via hint

SELECT c.CustomerId, o.OrderId, o.TotalAmount
FROM dbo.Customers c
JOIN dbo.Orders o ON o.CustomerId = c.CustomerId
WHERE c.Country = 'US'
  AND o.OrderDate > DATEADD(day, -30, GETUTCDATE())
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

Observations typically seen:
- Default CE → underestimates rows on Country = 'US' due to skew → Nested Loop
- Legacy CE → higher estimate → Hash Join, performing better for large inputs

5. When Should You Use This Hint?

Recommended scenarios
- You observe a regression after increasing database compatibility level
- Query Store indicates the Legacy CE plans consistently outperform
- Estimation errors lead to tempdb spills or poor join choices
- You want to validate behavior before applying database-wide changes

Avoid using it when
- You haven't validated the effect using Query Store or actual runtime statistics
- Only small regressions occur and can be solved by updating statistics
- The root cause is a missing index, outdated statistics, or a data distribution issue

6. Alternatives to the Query-Level Hint

Before locking the query to Legacy CE, consider the broader options:

A. Query Store Plan Forcing
If a Legacy CE plan performs better, Query Store allows you to force it without hints.

B. Database Scoped Configuration
You can change the database CE behavior globally:

ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

But be careful—this affects all queries.

C. Compatibility Level Change
Some workloads stabilize when pinned to a previous compatibility level, but this is not preferred long term.

D. Fixing Statistics and Indexing
Always validate:
- Up-to-date statistics
- Correct indexing strategy
- No parameter-sniffing issues

7. Best Practices for Using FORCE_LEGACY_CARDINALITY_ESTIMATION
- Test first in lower environments
- Confirm impact using Query Store (sys.query_store_plan, sys.query_store_runtime_stats)
- Compare CPU, reads, duration, spills, memory grants
- Monitor for regressions after index changes or statistics updates
- Document all queries using hints for long-term maintainability

8. Conclusion

USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION') is a powerful and precise tool for controlling query behavior in Azure SQL Database. Although the New CE is generally superior, certain workloads still perform better under the Legacy CE due to its more conservative and less correlated estimation strategies. This hint allows you to:
- Mitigate regressions
- Improve plan stability
- Avoid global configuration changes
- Make evidence-backed decisions using Query Store
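To make the "confirm impact using Query Store" step concrete, here is a minimal sketch that lists the plans and aggregated runtime statistics Query Store has recorded for queries referencing the example table, so you can compare executions with and without the hint side by side. It uses standard Query Store catalog views; the LIKE filter is only illustrative and should be replaced with whatever identifies your query.

```sql
SELECT
    q.query_id,
    p.plan_id,
    p.compatibility_level,
    rs.count_executions,
    rs.avg_duration,            -- microseconds
    rs.avg_cpu_time,            -- microseconds
    rs.avg_logical_io_reads
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE N'%dbo.Orders%'    -- illustrative filter
ORDER BY q.query_id, p.plan_id, rs.runtime_stats_interval_id;
```

If the plan produced under the Legacy CE is consistently better, forcing that plan through Query Store (option A above) is often a cleaner long-term choice than keeping the hint in application code.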
Geo-Replication Redo Lag in Azure SQL Database

Monitoring and Troubleshooting Using Public DMVs

Azure SQL Database provides built-in high availability and geo-replication capabilities to ensure database resilience and business continuity. While replication is fully managed by the platform, customers may occasionally observe a delay between the primary and geo-replicated secondary database, especially during periods of heavy write activity. This article provides a public, supported approach to monitoring and understanding geo-replication delay caused by redo lag, using official Dynamic Management Views (DMVs) and T-SQL only, without relying on internal tools.

Scenario Overview
Customers may report that changes committed on the primary database are not immediately visible on the geo-replicated secondary, sometimes for several minutes. Typical symptoms include:
- Reporting queries on the geo-secondary showing stale data
- Increased redo catch-up time on the secondary
- No performance degradation observed on the primary
- Replication eventually catching up without manual intervention
This behavior is commonly associated with redo lag, where the secondary has already received the log records but requires additional time to replay them into data pages.

Understanding Geo-Replication Redo Lag
In Azure SQL Database geo-replication:
- Transactions are sent from the primary to the secondary
- Log records are hardened on the secondary
- The secondary applies these records asynchronously to its data files (redo)
If the secondary experiences temporary pressure (for example, due to schema changes or intensive operations), redo may fall behind, causing the secondary to lag—even though data durability remains intact.

Step 1: Check Geo-Replication Status and Lag
The primary DMV for monitoring geo-replication is:
📘 sys.dm_geo_replication_link_status
Public documentation: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-geo-replication-link-status-azure-sql-database

Sample T-SQL query:

-- Run on the primary database
SELECT link_guid,
       partner_server,
       replication_state_desc,
       replication_lag_sec,
       last_replication
FROM sys.dm_geo_replication_link_status;

Key Columns Explained
Column | Description
replication_state_desc | Current replication state
replication_lag_sec | Estimated lag (in seconds)
last_replication | Last successful replication timestamp (UTC)
partner_server | Geo-replica logical server

Interpretation
- Healthy replication: replication_lag_sec = 0 and state is healthy
- Transient delay: lag increases temporarily but later returns to zero
- Sustained delay: lag remains elevated for an extended period and should be investigated further

Step 2: Monitor Local Replica Redo Health
To understand whether lag is related to redo activity on the secondary, you can query:
📘 sys.dm_database_replica_states
Public documentation: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database

Sample T-SQL query:

-- Run on the primary database
SELECT database_id,
       synchronization_state_desc,
       synchronization_health_desc,
       redo_queue_size,
       redo_rate,
       last_commit_time
FROM sys.dm_database_replica_states;

Key Insights
redo_queue_size | Indicates how much log data is pending replay on the secondary
redo_rate | Shows how quickly redo is being applied
last_commit_time | Helps estimate data freshness on the secondary (UTC)

Interpretation
Scenario | Observation
Normal operation | redo_queue_size = 0
Write burst | redo_queue_size increases temporarily
Recovery in progress | redo_rate remains positive
Healthy state | synchronization_health_desc = HEALTHY
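Building on the columns above, you can also derive a rough catch-up estimate directly from the same DMV. This is only a back-of-the-envelope indicator and assumes the documented units (redo_queue_size in KB, redo_rate in KB per second):

```sql
SELECT
    database_id,
    redo_queue_size                          AS redo_queue_kb,
    redo_rate                                AS redo_rate_kb_per_sec,
    CAST(redo_queue_size AS decimal(18, 2))
        / NULLIF(redo_rate, 0)               AS estimated_catchup_seconds,
    last_commit_time
FROM sys.dm_database_replica_states;
```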
Short-lived spikes are expected platform behavior and usually resolve automatically.

Practical Monitoring Considerations

Reporting Workloads
If applications read from the geo-secondary for reporting:
- Expect near-real-time, not guaranteed real-time visibility
- Design reports to tolerate small delays
- Route latency-sensitive reads to the primary if required

Transaction Patterns
Redo lag is more likely during:
- Large batch updates
- Index maintenance operations
- Schema modification commands
- Bursty write workloads
Keeping transactions short and efficient reduces replication pressure.

Best Practices
- Use UTC timestamps consistently when correlating events
- Monitor replication_lag_sec and redo_queue_size together
- Implement retry logic in applications for transient conditions
- Avoid assuming read replicas are always perfectly synchronized
- Do not take manual actions during short-lived redo spikes unless the lag persists

Summary
Redo lag in Azure SQL Database geo-replication is a normal and self-healing behavior during certain workload patterns. By using supported public DMVs and T-SQL, customers can:
- Monitor replication health safely
- Understand replica freshness
- Make informed application routing decisions
- Avoid unnecessary intervention
Azure SQL Database automatically stabilizes replication once redo pressure subsides, ensuring durability and availability without manual management.

References
- Azure SQL Database High Availability: https://learn.microsoft.com/azure/azure-sql/database/high-availability-overview
- sys.dm_geo_replication_link_status: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-geo-replication-link-status-azure-sql-database
- sys.dm_database_replica_states: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database

Why PITR Restore for Azure SQL Hyperscale Can Take Longer Than Expected
Azure SQL Database Hyperscale is designed to deliver fast, storage‑optimized backup and restore operations. According to Microsoft documentation, most Point‑in‑Time Restore (PITR) operations for Hyperscale should complete within 10 minutes, regardless of database size, because the service uses metadata-based restore techniques rather than copying full data files. However, some real‑world scenarios can lead to unexpectedly long restore times, even when the source database is Hyperscale and even when no obvious configuration changes are made. This article explains one such scenario, outlines what happened, and provides guidance to avoid similar delays in the future. Expected Behavior for Hyperscale PITR Hyperscale databases use a unique architecture that separates compute and storage. Backups are taken from storage snapshots and do not require data copying during a typical PITR restore. From Microsoft Learn: “Most restores complete within minutes, even for large databases.” Ref: https://learn.microsoft.com/azure/azure-sql/database/hyperscale-automated-backups-overview?view=azuresql#backup-and-restore-performance This performance expectation applies as long as the Backup Storage Redundancy remains the same between the source DB and the target restore. Customer Scenario Overview A customer initiated PITR restore operations for Hyperscale DB: Source DB: Hyperscale (SLO: HS_PRMS_64) Target DB: Hyperscale (SLO: HS_PRMS_128) Same logical server Source DB Backup Storage Redundancy: Standard_RAGRS Customer enabled Zone Redundancy for the target database during restore The customer therefore expected the restore to finish within the normal Hyperscale window (~10 minutes). Instead, the restore took significantly longer. Why the Restore Took Longer Although the source database used Standard_RAGRS, enabling Zone Redundancy at restore time introduced a configuration change that affected the underlying Backup Storage Redundancy (BSR) for the newly restored database. 🔍 Key Point: Changing BSR Creates a Full "Size-of-Data" Restore When the target DB uses a different BSR type than the source DB, Azure SQL Database cannot perform a fast metadata-based restore. Instead, it must perform a full data copy, and the restore becomes proportional to the database size: More data → longer restore Effectively behaves like a physical data movement operation This overrides Hyperscale’s normally fast PITR workflow This behavior is documented here: https://learn.microsoft.com/azure/azure-sql/database/hyperscale-automated-backups-overview?view=azuresql#backup-and-restore-performance In the customer’s case: Customer-enabled Zone Redundancy changed the restore workflow. As a result, the system selected a backup storage redundancy configuration different than the source: Restore workflow chose: Standard_RAGZRS Source database actually used: Standard_RAGRS (non‑zone‑redundant) This mismatch triggered a size-of-data restore, leading to the observed delay. Summary of Root Cause ✔ Hyperscale PITR is fast only when BSR is unchanged ✔ Customer enabled Zone Redundant configuration during restore ✔ This resulted in a different Backup Storage Redundancy from the source ✔ Target restore had to perform a full data copy, not metadata-based restore ✔ Restore time scaled with database size → leading to long restore duration Key Takeaways 1. Do not change Backup Storage Redundancy during PITR unless necessary Any change (e.g., RAGRS → RAGZRS) converts the restore into a size‑of‑data operation. 2. 
Restores that involve cross-region or cross-redundancy conversions always take longer
This applies equally to:
- PITR restore
- Restore to another server
- Restore with SLO changes
- Restore involving ZRS/RA-GZRS transitions

3. Hyperscale PITR is extremely fast—when configuration is unchanged
If the source and target BSR match, Hyperscale restores usually complete in minutes.

4. Enabling Zone Redundancy is valid, but do it after the restore
If the customer wants ZRS for the restored DB:
- Perform PITR first (fast restore)
- Then update redundancy post-restore (online operation)

Conclusion
While Hyperscale PITR restores are typically very fast, configuration changes during the restore—especially related to Backup Storage Redundancy—can trigger a full data copy and significantly increase restore duration. To get the best performance:
- Keep the same BSR when performing PITR
- Apply redundancy changes after the restore completes
- Use metadata-based restores whenever possible
Understanding these nuances helps ensure predictable recovery times and aligns operational processes with Hyperscale's architectural design.
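As a practical follow-up, you can watch the progress of a long-running restore from the logical server's master database using the documented sys.dm_operation_status DMV. The sketch below is a minimal example; the database-name filter is an illustrative placeholder to replace with your own value.

```sql
-- Connect to the master database of the logical server that hosts the restore target.
SELECT
    major_resource_id    AS database_name,
    operation,
    state_desc,
    percent_complete,
    start_time,
    last_modify_time,
    error_desc
FROM sys.dm_operation_status
WHERE major_resource_id = N'<restored_database_name>'   -- illustrative placeholder
ORDER BY start_time DESC;
```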
When and How to Update Statistics in Azure SQL Database

Accurate statistics are a cornerstone of good query performance in Azure SQL Database. While the platform automatically manages statistics in most scenarios, there are real-world cases where manual intervention is not only recommended but essential. This article explains when, why, and how to update statistics in Azure SQL Database, with practical samples and real customer scenarios drawn from production support cases.
Microsoft Learn reference (overview): https://learn.microsoft.com/sql/relational-databases/statistics/statistics

Why Statistics Matter
SQL Server's query optimizer relies on statistics to estimate row counts, choose join strategies, allocate memory grants, and decide whether to run operations in parallel. When statistics are stale or inaccurate, even well-indexed queries can suddenly degrade.
In Azure SQL Database:
- AUTO_CREATE_STATISTICS is enabled and managed by the platform
- AUTO_UPDATE_STATISTICS runs asynchronously
Because updates are async, queries may continue running with outdated cardinality estimates until the statistics refresh completes.
https://learn.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide

When Manual Statistics Updates Are Required

1. After Large Data Changes (ETL / Batch Operations)
Customer scenario: A nightly ETL job bulk-inserts millions of rows into a fact table. The following morning, reports time out and logical reads spike.
Why it happens: Auto-update thresholds are based on row-count changes and may not trigger immediately—especially for append-only or skewed data.
Recommended action:

UPDATE STATISTICS dbo.FactSales;

Target only the critical statistic if known:

UPDATE STATISTICS dbo.FactSales (IX_FactSales_CreatedDate);

2. Query Plan Regression Without Schema Changes
Customer scenario: A stable query suddenly switches from a Nested Loops join to a Hash Join, increasing CPU usage and BUFFERIO waits.
Root cause: Statistics no longer reflect the current data distribution.
Recommended action:

UPDATE STATISTICS dbo.Customer WITH FULLSCAN;

Learn more: https://learn.microsoft.com/sql/relational-databases/statistics/update-statistics

3. After Restore Operations (PITR / Geo-Restore / Database Copy)
Customer scenario: After a Point-in-Time Restore (PITR) on a Hyperscale database, queries run slower despite healthy platform telemetry.
Why it happens: Statistics are restored as-is, but workload patterns often change after the restore point. Auto-update statistics may lag behind.
Recommended action:

EXEC sp_updatestats;

Prioritize heavily accessed tables first on large databases.
Learn more: https://learn.microsoft.com/azure/azure-sql/database/recovery-using-backups

Query Store Comparison: Before vs After Updating Statistics
One of the most effective ways to validate the impact of statistics updates is Query Store.
Before update (typical signs):
- Sudden plan change for the same query text
- Increased logical reads and CPU time
- Change in join strategy or memory grant
After statistics update:
- Optimizer selects a more efficient plan
- Logical reads reduced
- CPU and duration stabilize

Example workflow:

-- Capture runtime stats for the query's plans
SELECT rs.*
FROM sys.query_store_runtime_stats AS rs
JOIN sys.query_store_plan AS p ON p.plan_id = rs.plan_id
WHERE p.query_id = <QueryID>;

-- Update statistics
UPDATE STATISTICS dbo.Orders;

-- Force recompilation
EXEC sp_recompile 'dbo.Orders';

Decision Flow: When Should I Update Statistics?
Decision Flow: When Should I Update Statistics?

1. Performance regression observed?
2. If yes: Has the query plan changed without a schema change?
3. If yes: Was there a recent data change, restore, or ETL run?
4. If yes: Update targeted statistics.

If the answer is NO at any step, rely on automatic statistics and continue monitoring.

What NOT to Do

❌ Do not run blanket WITH FULLSCAN on all tables
FULLSCAN is CPU and IO expensive, especially on large or Hyperscale databases.

❌ Do not schedule frequent database-wide sp_updatestats jobs
This can introduce unnecessary workload and plan churn.

❌ Do not update statistics blindly without investigation
Always validate plan regression or stale estimates using Query Store or execution plans.

Checking Statistics Freshness

SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
ORDER BY sp.last_updated;

DMV reference: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-stats-properties-transact-sql

Best Practices Summary

✅ Prefer targeted statistics updates
✅ Update stats after bulk data changes or restores
✅ Validate results using Query Store
✅ Avoid unnecessary FULLSCAN operations
✅ Use stats updates as a diagnostic and remediation step, not routine maintenance

Conclusion

Although Azure SQL Database manages statistics automatically, asynchronous updates and changing workload patterns can result in sub-optimal query plans. Manually updating statistics after significant data changes, restore operations, or observed plan regressions is a safe and effective way to restore optimal query performance.
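As a complement to the freshness query above, the sketch below surfaces statistics with a large number of modifications since their last update, which helps keep updates targeted rather than database-wide. It is a minimal example; the modification threshold shown is an assumption to tune for your workload.

-- Minimal sketch: statistics with many modifications since the last update,
-- ordered so the most likely stale candidates appear first.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE sp.modification_counter > 100000   -- illustrative threshold; tune per workload
  AND OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.modification_counter DESC;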
Azure SQL Database High Availability: Architecture, Design, and Built-in Resilience

High availability (HA) is a core pillar of Azure SQL Database. Unlike traditional SQL Server deployments, where availability architectures must be designed, implemented, monitored, and maintained manually, Azure SQL Database delivers built-in high availability by design. By abstracting infrastructure complexity while still offering enterprise-grade resilience, Azure SQL Database enables customers to achieve strict availability SLAs with minimal operational overhead.

In this article, we'll cover:

Azure SQL Database high-availability design principles
How HA is implemented across service tiers: General Purpose, Business Critical, and Hyperscale
Failover behavior and recovery mechanisms
Architecture illustrations explaining how availability is achieved
Supporting Microsoft Learn and documentation references

What High Availability Means in Azure SQL Database

High availability in Azure SQL Database ensures that:

Databases remain accessible during infrastructure failures
Hardware, software, and network faults are handled automatically
Failover occurs without customer intervention
Data durability is maintained using replication, quorum, and consensus models

This is possible through the separation of:

Compute
Storage
Control plane orchestration

Azure SQL Database continuously monitors health signals across these layers and automatically initiates recovery or failover when required.

Azure SQL Database High Availability – Shared Concepts

Regardless of service tier, Azure SQL Database relies on common high-availability principles:

Redundant replicas
Synchronous and asynchronous replication
Automatic failover orchestration
Built-in quorum and consensus logic
Transparent reconnect via the Azure SQL Gateway

Applications connect through the Azure SQL Gateway, which automatically routes traffic to the current primary replica, shielding clients from underlying failover events.

High Availability Architecture – General Purpose Tier

The General Purpose tier uses a compute–storage separation model, relying on Azure Premium Storage for data durability.

Key Characteristics

Single compute replica
Storage replicated three times by Azure Storage
Read-Access Geo-Redundant Storage (RA-GRS) optional
Stateless compute that can be restarted or moved
Fast recovery using storage reattachment

Architecture Diagram – General Purpose Tier

Description: Clients connect via the Azure SQL Gateway, which routes traffic to the primary compute node. The compute layer is stateless, while Azure Premium Storage provides triple-replicated durable storage.

Failover Behavior

Compute failure triggers creation of a new compute node
Database files are reattached from storage
Typical recovery time: seconds to minutes

📚 Reference: https://learn.microsoft.com/azure/azure-sql/database/service-tier-general-purpose
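Because the HA model depends on the service tier, it is worth confirming which tier a database actually runs in before reasoning about its failover behavior. A minimal T-SQL sketch using the Azure SQL Database-specific view sys.database_service_objectives follows; when queried in the master database of the logical server it typically returns all databases, while in a user database it returns only that database.

-- Minimal sketch: list each database with its service tier (edition) and
-- service objective, to confirm which HA model described here applies.
SELECT d.name AS database_name,
       dso.edition,
       dso.service_objective,
       dso.elastic_pool_name
FROM sys.databases AS d
JOIN sys.database_service_objectives AS dso
    ON dso.database_id = d.database_id;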
High Availability Architecture – Business Critical Tier

The Business Critical tier is designed for mission-critical workloads requiring low latency and fast failover.

Key Characteristics

Multiple replicas (1 primary + up to 3 secondaries)
Always On availability group–like architecture
Local SSD storage on each replica
Synchronous replication
Automatic failover within seconds

Architecture Diagram – Business Critical Tier

Description: The primary replica synchronously replicates data to secondary replicas. Read-only replicas can offload read workloads. The Azure SQL Gateway transparently routes traffic to the active primary replica.

Failover Behavior

If the primary replica fails, a secondary is promoted automatically
No storage reattachment is required
Client connections are redirected automatically
Typical failover time: seconds

📚 Reference: https://learn.microsoft.com/azure/azure-sql/database/service-tier-business-critical

High Availability Architecture – Hyperscale Tier

The Hyperscale tier introduces a distributed storage and compute architecture, optimized for very large databases and rapid scaling scenarios.

Key Characteristics

Decoupled compute and page servers
Multiple read replicas
Fast scale-out and fast recovery
Durable log service ensures transaction integrity

Architecture Diagram – Hyperscale Tier

Description: The compute layer processes queries, while the durable log service and distributed page servers manage data storage independently, enabling rapid failover and scaling.

Failover Behavior

Compute failure results in rapid creation of a new compute replica
Page servers remain intact
The log service ensures zero data loss

📚 Reference: https://learn.microsoft.com/azure/azure-sql/database/service-tier-hyperscale

How Azure SQL Database Handles Failures

Azure SQL Database continuously monitors critical health signals, including:

Heartbeats
IO latency
Replica health
Storage availability

Automatic Recovery Actions

Restarting failed processes
Promoting secondary replicas
Recreating compute nodes
Redirecting client connections

Applications should implement retry logic and transient-fault handling to fully benefit from these mechanisms.

📚 Reference: https://learn.microsoft.com/azure/architecture/best-practices/transient-faults

Zone Redundancy and High Availability

Azure SQL Database can be configured with zone redundancy, distributing replicas across Availability Zones in the same region.

Benefits

Protection against datacenter-level failures
Increased SLA
Transparent resilience without application changes

📚 Reference: https://learn.microsoft.com/azure/azure-sql/database/high-availability-sla

Summary

Azure SQL Database delivers high availability by default, removing the traditional operational burden associated with SQL Server HA designs.

Service Tier      | HA Model                       | Typical Failover
General Purpose   | Storage-based durability       | Minutes
Business Critical | Multi-replica, synchronous     | Seconds
Hyperscale        | Distributed compute & storage  | Seconds

By selecting the appropriate service tier and enabling zone redundancy where required, customers can meet even the most demanding availability and resilience requirements with minimal complexity.

Additional References

Azure SQL Database HA overview: https://learn.microsoft.com/azure/azure-sql/database/high-availability-overview
Azure SQL Database SLAs: https://azure.microsoft.com/support/legal/sla/azure-sql-database
Application resiliency guidance: https://learn.microsoft.com/azure/architecture/framework/resiliency
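Finally, when validating read scale-out or post-failover behavior from a session, it can be useful to check whether the current connection landed on the writable primary or on a read-only replica. The sketch below is a minimal example: it returns READ_WRITE on the primary and READ_ONLY when connected with ApplicationIntent=ReadOnly to a readable secondary.

-- Minimal sketch: identify whether this session is on the primary (READ_WRITE)
-- or on a read-only replica (READ_ONLY), e.g. after connecting with
-- ApplicationIntent=ReadOnly or after a failover.
SELECT DB_NAME()                                      AS database_name,
       DATABASEPROPERTYEX(DB_NAME(), 'Updateability') AS updateability;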