Alternatives After the Deprecation of the Azure SQL Migration Extension in Azure Data Studio
The Azure SQL Migration extension for Azure Data Studio is being deprecated and will be retired by February 28, 2026. As part of our unified and streamlined migration strategy for Azure SQL, we are consolidating all migration experiences into a consistent, scalable platform. If you are currently using the Azure SQL Migration extension, this blog will guide you through recommended replacement options for every phase of migration, whether you are moving to Azure SQL Managed Instance, SQL Server on Azure Virtual Machines, or Azure SQL Database.

What is happening to the Azure SQL Migration extension in ADS?
Azure Data Studio will officially retire on February 28, 2026, and the Azure SQL Migration extension will retire along with it on the same date. The extension will no longer be available in the Azure Data Studio marketplace.

What should you use instead?
Below is the updated guidance for migration tooling, categorized by migration phase and target.

1) Pre-Migration: Discovery & Assessments
Prior to migration, it is advisable to evaluate the SQL Server environment for readiness and to determine the right-sized Azure SQL SKU. Below are the recommended options:

A) SQL Server enabled by Azure Arc
Use the SQL Server migration experience in the Azure Arc portal for:
- Instance discovery at scale
- Migration assessments at scale, including:
  - Readiness assessment for all Azure SQL targets.
  - Performance-based, right-sized target recommendations.
  - Projected Azure costs with the recommended target configuration.
Reference: Steps to get started with the Azure Arc assessments - Deploy Azure Arc on your servers. SQL Server instances on Arc-enabled servers are automatically connected to Azure Arc. See options to optimize this.

B) Automated assessments at scale using Azure DMS PowerShell and Azure CLI
The Azure DataMigration modules in Azure PowerShell and Azure CLI can be used to automate assessments at scale. Learn more about how to do this. Here are the sample templates to automate the assessment workflow:
- Azure PowerShell DataMigration cmdlets
- DMS Azure CLI commands

C) Azure Migrate
For scenarios where assessments are required at the data center level, spanning different workload types such as applications, VM servers, and databases, use Azure Migrate to perform discovery and assessments at scale. Learn more about Azure Migrate.
References: Review inventory | Create SQL Assessment | Review SQL Assessment

2) Migrations
Based on the migration target, here are the recommended tools you can use to carry out the migration:

A. To Azure SQL Managed Instance
The following options are available for migrating data to Azure SQL Managed Instance:

1. SQL Migration experience in Azure Arc
For migrations to Azure SQL MI, leverage the streamlined SQL Migration experience in Azure Arc, which lets you complete the end-to-end migration journey in a single experience. This experience provides:
- Evergreen assessments and right-fit Azure SQL target recommendations.
- Inline Azure SQL target creation.
- Free Azure SQL MI (Next generation General Purpose) offer that lets you experience the power of Azure SQL MI for free for 12 months.
- Near-zero downtime migration using Managed Instance link, powered by Distributed Availability Group technology.
- Secure connectivity.
Reference blog: SQL Server migration in Azure Arc
2. Automated migration at scale using Azure DMS PowerShell and Azure CLI
To orchestrate migrations to Azure SQL MI at scale programmatically, use:
- DMS PowerShell cmdlets
- DMS Azure CLI commands
Learn more about how to do this.

B. To SQL Server on Azure Virtual Machines
To migrate to SQL Server on Azure Virtual Machines, use:

1. Azure Database Migration Service (DMS)
DMS supports migrating to SQL Server on Azure Virtual Machines using both online and offline methods. Your SQL Server backups can be in Azure Blob Storage or on a network SMB file share. For details on each option, see:
- Backups stored in Azure Blob Storage
- Backups maintained on network SMB file shares
Note: The migration experience from SQL Server on-premises to SQL Server on Azure VM will soon be available in SQL Server enabled by Azure Arc.

2. Automated migration at scale using Azure DMS PowerShell and Azure CLI
For programmatic migrations to SQL Server on Azure Virtual Machines:
- DMS PowerShell cmdlets
- DMS Azure CLI commands
Learn more about how to do this.

3. SSMS option: SQL Server Management Studio (SSMS) migration component
If you can connect to both the on-premises SQL Server and the SQL Server running on an Azure VM using SQL Server Management Studio, the migration component in SSMS can help you migrate to SQL Server on Azure VM. For details, see SSMS Migration component.

C. To Azure SQL Database
Migrating a SQL Server database to Azure SQL Database typically involves migrating schema and data separately. Here are the options to perform offline and online migration to Azure SQL Database:

1. Offline migration to Azure SQL Database
a. Azure Database Migration Service (DMS) portal experience
Use the Azure DMS portal to migrate both schema and data. Azure DMS uses Azure Data Factory and leverages the Self-hosted Integration Runtime (SHIR). Installation steps are here.
b. Automated migration at scale using Azure DMS PowerShell and Azure CLI
Use the Azure DMS PowerShell and Azure CLI command lines to orchestrate the schema and data migration to Azure SQL Database at scale:
- DMS PowerShell cmdlets
- DMS Azure CLI commands
Learn more about how to do this.

2. Online migration to Azure SQL Database using Striim
To enable online migration of your mission-critical databases to Azure SQL Database, leverage Striim. Microsoft and Striim have entered a strategic partnership to enable continuous data replication from off-Azure SQL Servers to Azure SQL Database with near-zero downtime. For more details, refer to:
- Zero downtime migration from SQL Server to Azure SQL Database | Microsoft Community Hub
- Removing barriers to migrating databases to Azure with Striim's Unlimited Database Migration program...
To leverage the Striim program for migrations, please reach out to your Microsoft contact or submit the feedback form below to get started.

Summary
The table below provides a summary of the available alternatives for each migration scenario.
Migration scenario | Guided experience | Automation experience
Pre-Migration (Discovery + Assessment) | SQL Migration experience in Azure Arc / Azure Migrate | DMS PowerShell / Azure CLI
To Azure SQL Managed Instance | SQL Migration experience in Azure Arc | DMS PowerShell / Azure CLI
To SQL Server on Azure Virtual Machine | DMS Azure portal / SSMS migration component | DMS PowerShell / Azure CLI
To Azure SQL Database | DMS Azure portal (offline & schema migration) / Striim (online migration) | DMS PowerShell / Azure CLI (offline & schema migration)

Final Thoughts
To simplify your SQL migration journey and improve migration velocity to all Azure SQL targets, leverage the connected migration experiences in SQL Server enabled by Azure Arc, DMS, and SSMS. For SSMS, as a first step we brought in the capabilities to perform assessment and migration to higher versions of SQL Server, including SQL Server on Azure Virtual Machines. As a next step, we are bringing cloud migration capabilities into SSMS as well.

Feedback
We love hearing from our customers. If you have feedback or suggestions for the product group, please use the following form: Feedback form
As you begin your migration to Azure, we welcome your feedback. If you do not see suitable alternatives for any migration phase, use the feedback form to let us know so we can update the options accordingly.

Multiple secondaries for Azure SQL Database failover groups (Public Preview)
Azure SQL Database failover groups are a business continuity capability that helps you manage geo-replication and coordinated failover of one or more databases from a primary Azure SQL logical server to a secondary logical server in another region. A key benefit is automatic endpoint redirection: your application can keep using the same listener endpoints, and connections are routed to the current primary after a geo-failover. Until recently, an Azure SQL Database failover group supported only one secondary server. Microsoft has now announced public preview support for up to four secondaries per failover group—unlocking more flexible DR designs, regional read scale-out patterns, and richer HA/BC architectures. Why this matters: the “one-secondary” constraint is gone Traditional failover group topologies were straightforward: one primary ↔ one secondary. That model is great for many DR plans, but it can be limiting when you want to: Distribute read-only workloads across multiple regions, Maintain multiple failover targets for broader regional resilience, Support complex compliance or geo-distribution requirements, or Migrate regions without sacrificing existing protection. With multiple secondaries, you can now design a failover group that better matches how globally distributed applications actually run—while still benefiting from the simplicity of the failover group abstraction and stable listener endpoints. What’s new in Public Preview With this enhancement, you can now: Create up to four secondary servers for a single failover group. Place these secondaries in the same or different Azure regions (with important guidance for read-only routing—covered below). Choose which secondary should receive traffic for the read-only listener endpoint (<fog-name>.secondary.database.windows.net). Fail over to any secondary (planned or forced), giving you more operational flexibility during incidents or DR drills. Supported service tiers for multiple secondaries (per the announcement): Standard, General Purpose, Premium, Business Critical, and Hyperscale. Core concept refresher: listener endpoints (and why they’re central) Failover groups expose two stable DNS-based listener endpoints: Read-write listener: <fog-name>.database.windows.net Read-only listener: <fog-name>.secondary.database.windows.net These endpoints are DNS CNAME records created when the failover group is created. After failover, DNS updates redirect the listener(s) to the new role holder. A few practical details that matter in real operations: The Learn documentation notes the DNS TTL is 30 seconds for primary and secondary listener records, which influences how quickly clients pick up endpoint changes (subject to client-side DNS caching behavior). For read-only workloads, it’s recommended to indicate read intent in the connection string using ApplicationIntent=ReadOnly. By default, failover of the read-only listener is disabled (to avoid impacting primary performance when the secondary is offline), which can be a key consideration for read-only availability expectations. Architecture: one failover group, up to four secondaries Here’s a simple way to visualize the topology: With multiple secondaries, you designate one secondary as the read-only listener endpoint target. All read-only listener traffic routes to that chosen secondary. If a failover group has only one secondary, the read-only endpoint defaults to it. How to get started (Azure portal) Creating the initial failover group is unchanged. 
To add additional secondaries, the announcement outlines these portal steps: Go to your Azure SQL logical server in the Azure portal. Open Failover groups under Data management. Select your existing failover group. Click Add server to add another secondary server. In the side panel, select: The secondary server to add, and The read-only listener endpoint target (a dropdown that lists existing and newly added secondaries). Click Select, then Save to apply the configuration—seeding begins for databases in the group to the new secondary. Important note from the announcement: if you intend to serve read workloads via the read-only listener, the designated read-only target should not be in the same region as the primary. PowerShell examples (create / modify / fail over) The announcement provides PowerShell examples to create and update a failover group with multiple secondaries: Create a failover group with multiple secondaries New-AzSqlDatabaseFailoverGroup ` -ResourceGroupName "myrg" ` -ServerName "primaryserver" ` -PartnerServerName "secondaryserver1" ` -FailoverGroupName "myfailovergroup" ` -FailoverPolicy "Manual" ` -PartnerServerList @("secondary_uri_1","secondary_uri_2","secondary_uri_3","secondary_uri_4") ` -ReadOnlyEndpointTargetServer "secondary_uri_1" The secondary_uri_n values are Azure resource IDs in the form: /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Sql/servers/<server> Add/replace the secondary server list on an existing failover group Set-AzSqlDatabaseFailoverGroup ` -ResourceGroupName "myrg" ` -ServerName "primaryserver" ` -FailoverGroupName "myfailovergroup" ` -FailoverPolicy "Manual" ` -PartnerServerList @("secondary_uri_1","secondary_uri_2","secondary_uri_3","secondary_uri_4") ` -ReadOnlyEndpointTargetServer "secondary_uri_1" Perform a failover Use Switch-AzSqlDatabaseFailoverGroup (example shown in the announcement): Switch-AzSqlDatabaseFailoverGroup ` -ResourceGroupName "myrg" ` -ServerName "secondaryserver1" ` -FailoverGroupName "myfailovergroup" Planned vs. forced failover (operational clarity) The preview announcement highlights portal actions for: Failover (planned / synchronized) Forced failover (unplanned / potential data loss) The Learn documentation further clarifies that Failover performs full synchronization (no data loss) and requires the primary to be accessible, while Forced failover can result in data loss because replication is asynchronous and doesn’t wait for final propagation. Limitations and notes (preview behavior) From the announcement: Up to four secondaries per failover group. Each secondary must be on a different logical server than the primary. Secondaries can be in the same or different regions. The read-only listener endpoint target should be in a different region from the primary if you want to use the read-only listener for read workloads. Failover group name must be globally unique within .database.windows.net. Chaining (geo-replica of a geo-replica) isn’t supported. Backup storage redundancy and zone redundancy inheritance behaviors vary by tier; the announcement calls out: For non-Hyperscale, zone redundancy isn’t enabled by default on secondaries (enable after creation). For Hyperscale, secondaries inherit HA settings from their respective primaries. Best practices (practical guidance) The announcement and documentation recommend several patterns worth emphasizing: Prefer paired regions when feasible for better performance than unpaired regions. 
Test regularly using planned failovers (DR drills) so you validate your end-to-end plan.
Monitor replication lag to ensure your RPO expectations are being met.
Keep failover group scope reasonable (the number of databases impacts failover duration; the documentation advises limiting databases per group and using multiple groups if needed).
Use the right endpoint for the right workload:
- Writes → <fog-name>.database.windows.net (read-write listener)
- Reads → <fog-name>.secondary.database.windows.net (read-only listener)

Monitoring example: sys.dm_geo_replication_link_status
The DMV sys.dm_geo_replication_link_status returns a row per replication link and includes fields like partner_server, partner_database, last_replication, and replication_lag_sec, which you can use to track replication health. A simple query (run in the user database participating in geo-replication) is:

SELECT partner_server, partner_database, last_replication, replication_lag_sec
FROM sys.dm_geo_replication_link_status;

(These columns and their meanings are documented in the DMV reference.)

Closing thoughts
Multiple secondaries for Azure SQL Database failover groups is a meaningful step forward for customers running globally distributed applications. It combines the operational simplicity of failover groups—stable listener endpoints and coordinated failover—with added architectural freedom: more DR targets, more read scale-out options, and smoother migration pathways. If you're already using failover groups today, this preview is a great opportunity to revisit your topology and ask: Where would additional geo-secondaries reduce risk or improve performance for my read-heavy workloads? If you're designing something new, this capability expands what "good" looks like for resilient, regionally distributed data tiers.

Learn more (official resources)
- Multiple secondaries for failover groups is now in public preview
- Failover groups overview & best practices (Azure SQL Database)
- Configure a failover group (Azure SQL Database)
- sys.dm_geo_replication_link_status DMV reference
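Finally, as a companion to the monitoring query above, here is a minimal T-SQL sketch you can run in a database that participates in the failover group to confirm its current geo-replication role and link health after a DR drill. It relies only on documented columns of sys.dm_geo_replication_link_status and the standard DATABASEPROPERTYEX function; adapt it to your environment.

```sql
-- Run in the user database on either side of the failover group.
-- role_desc shows whether the local database is currently PRIMARY or SECONDARY
-- for each geo-replication link; replication_lag_sec shows how far a secondary trails.
SELECT
    DB_NAME()                                       AS local_database,
    DATABASEPROPERTYEX(DB_NAME(), 'Updateability')  AS local_updateability,  -- READ_WRITE on primary, READ_ONLY on a geo-secondary
    role_desc,
    partner_server,
    partner_database,
    replication_state_desc,
    replication_lag_sec,
    last_replication
FROM sys.dm_geo_replication_link_status;
```

Running this on both sides after a planned or forced failover is a quick way to confirm that the role swap and listener redirection completed as expected.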
Enable auditing to identify Change Tracking history table drops using Extended Events (Azure SQL Db)

When Change Tracking is enabled, the system-maintained history table dbo.MSChange_tracking_history is a key place to look for cleanup progress and errors. For example, Microsoft's troubleshooting guidance for Change Tracking auto-cleanup explicitly references querying the history table to validate whether cleanup is running and whether there are cleanup errors. In real-world environments, teams sometimes observe that dbo.MSChange_tracking_history "disappears" unexpectedly (for example, during cleanup troubleshooting), which then blocks visibility into cleanup progress and error history. When that happens, the most practical way to answer "who/what dropped the history table?" is to audit the DDL using Extended Events (XEvents), a lightweight monitoring capability designed for troubleshooting. This post shows how to create a database-scoped XEvent session in Azure SQL Database that captures Change Tracking history table drops and related events, using the script below.

Why Extended Events for this scenario?
Extended Events are built to capture internal engine activities with relatively low overhead, and are supported across SQL Server and Azure SQL. In Azure SQL Database, XEvents are commonly used for investigation and troubleshooting when you need evidence of what happened and who initiated it. In this specific case, we want to capture:
- A history table drop event (Change Tracking-specific)
- A syscommittab truncate event (Change Tracking-specific)
- A generic object deletion event filtered to the history table name
And we want to enrich each event with context (app, host, user, SQL text, stack, timestamps, etc.).

The Extended Events session (copy/paste)
Below is the script to create a database-scoped session named CTXEvent that audits the Change Tracking history table drop and related events:

CREATE EVENT SESSION [CTXEvent] ON DATABASE
ADD EVENT sqlserver.change_tracking_historytable_drop(
    ACTION(mdmtargetpkg.mdmget_TimeStampUTC, sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.context_info, sqlserver.database_name, sqlserver.external_script_request_id, sqlserver.session_id, sqlserver.sql_text, sqlserver.transaction_id, sqlserver.tsql_stack, sqlserver.username)),
ADD EVENT sqlserver.change_tracking_syscommittab_truncate(
    ACTION(mdmtargetpkg.mdmget_TimeStampUTC, sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.context_info, sqlserver.database_name, sqlserver.external_script_request_id, sqlserver.session_id, sqlserver.sql_text, sqlserver.transaction_id, sqlserver.tsql_stack, sqlserver.username)),
ADD EVENT sqlserver.object_deleted(
    ACTION(mdmtargetpkg.mdmget_TimeStampUTC, sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.context_info, sqlserver.database_name, sqlserver.external_script_request_id, sqlserver.session_id, sqlserver.sql_text, sqlserver.transaction_id, sqlserver.tsql_stack, sqlserver.username)
    WHERE ([object_name] = N'MSchange_tracking_history'))
WITH (MAX_MEMORY = 16384 KB, EVENT_RETENTION_MODE = NO_EVENT_LOSS, MAX_DISPATCH_LATENCY = 10 SECONDS, MAX_EVENT_SIZE = 0 KB, MEMORY_PARTITION_MODE = NONE, TRACK_CAUSALITY = ON, STARTUP_STATE = OFF)
GO

Why these three events?
sqlserver.change_tracking_historytable_drop: directly captures the Change Tracking "history table dropped" operation (purpose-built for this investigation).
sqlserver.change_tracking_syscommittab_truncate: captures maintenance actions on syscommittab, which is central to Change Tracking's internal state.
sqlserver.object_deleted with a predicate on MSchange_tracking_history: provides a generic “safety net” if the drop manifests via generic DDL deletion paths. Why these actions? The actions you selected are the practical “who/what/where” context you usually need to attribute the operation: sqlserver.client_app_name / sqlserver.client_hostname: which app + which machine initiated the DDL sqlserver.username: identity executing the statement sqlserver.sql_text + sqlserver.tsql_stack: the statement and call stack (helpful if the drop occurs via a job, framework, or tooling) sqlserver.session_id / sqlserver.transaction_id: correlate activity across events TRACK_CAUSALITY=ON: enables correlation across related events within the session context. Starting and stopping the session Your session is created with STARTUP_STATE=OFF, meaning it does not start automatically. That’s typically a good practice for targeted investigations: enable it only during the window where you expect the issue. For Azure SQL Database, you can manage the session using SSMS or equivalent T-SQL workflows. Microsoft’s Extended Events QuickStart walks through creating sessions and working with session data, including the general workflow concepts. Tip (practical): Turn it on shortly before your expected “drop window,” reproduce/observe, then turn it off after you capture the event to minimize overhead. Where does the data go? Your script defines the events + actions + session options, but it does not define a target (for example, event_file). That’s OK for short, in-the-moment investigations, but for durable evidence and easy sharing, many teams prefer writing to an event file target. Option: Persist captured events to Azure Storage (recommended for multi-hour investigations) Microsoft provides a dedicated walkthrough for Azure SQL Database to create an event session with an event_file target stored in Azure Storage, including: creating a storage account + container granting access (RBAC or SAS) creating a database credential creating/starting the session viewing the event data If you choose this route, you can then read event files using sys.fn_xe_file_target_read_file (supported in Azure SQL Database, commonly used to consume XEL files from blob storage). How to interpret the results (what to look for) When the table drop happens, the captured event payload should allow you to confidently answer: Who initiated the operation? Look at sqlserver.username plus client_app_name and client_hostname. What executed the drop? Look at sql_text and tsql_stack. Was this tied to cleanup activity or another workflow? If you see change_tracking_syscommittab_truncate around the same time (and correlated via session/transaction/casuality identifiers), that can indicate the drop happened in proximity to Change Tracking maintenance behavior. Was the drop recurring or one-off? If you capture repeated events at a steady cadence, it points to an automated process (job, deployment pipeline, scheduled task, framework) rather than an ad-hoc human action. Why this matters: the history table is part of Change Tracking troubleshooting Microsoft’s Change Tracking cleanup troubleshooting guidance explicitly uses dbo.MSChange_tracking_history as the place to validate whether cleanup is running and whether errors are being logged. So if the history table is being dropped, you effectively lose an important “black box recorder” that you rely on to understand cleanup behavior—exactly why auditing the drop is so valuable. 
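Since the session is created with STARTUP_STATE=OFF, a minimal sketch of the start/stop workflow and the baseline checks discussed above might look like the following. The session name CTXEvent comes from the script in this post; the rest is standard T-SQL against documented catalog views and DMVs that you can adapt to your environment.

```sql
-- 1) Baseline: is auto-cleanup enabled, and does the history table currently exist?
SELECT is_auto_cleanup_on, retention_period, retention_period_units_desc
FROM sys.change_tracking_databases
WHERE database_id = DB_ID();

SELECT OBJECT_ID(N'dbo.MSchange_tracking_history') AS history_table_object_id;  -- NULL means the table is not present

-- 2) Start the session shortly before the expected drop window
ALTER EVENT SESSION [CTXEvent] ON DATABASE STATE = START;

-- 3) Confirm the session exists and is currently running
SELECT s.name,
       CASE WHEN rs.name IS NULL THEN 'stopped' ELSE 'running' END AS session_state
FROM sys.database_event_sessions AS s
LEFT JOIN sys.dm_xe_database_sessions AS rs ON rs.name = s.name
WHERE s.name = N'CTXEvent';

-- 4) Stop the session once the drop has been captured, to minimize overhead
ALTER EVENT SESSION [CTXEvent] ON DATABASE STATE = STOP;
```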
Operational tips (minimize noise, maximize signal) These are practical recommendations commonly used in incident investigations (treat them as operational guidance you can adapt to your environment): Run the session for a limited time window (enough to catch the drop). Keep actions focused (your list is already purpose-built for attribution). Prefer file target if you need durable evidence (especially when coordinating across teams). Document what “good” looks like before you enable the session (e.g., confirm whether the history table exists and whether it’s being written to), using the troubleshooting query from Microsoft docs. Common troubleshooting steps (when the audit session “doesn’t capture anything”) When you set up a targeted Extended Events session like CTXEvent, there are usually three places where things can go wrong: (1) Change Tracking state, (2) XEvent session scope/permissions, or (3) the target (storage/credential). The checklist below is written to help you isolate which layer is failing—quickly, and with minimal guesswork. 1) First, validate the Change Tracking baseline (so you’re not “tracing a ghost”) Does the history table exist—and is it expected to exist? Microsoft’s Change Tracking guidance uses dbo.MSChange_tracking_history as the first place to look for cleanup status and errors. If the table doesn’t exist, that can be a signal that auto-cleanup has never run (or hasn’t run since CT was re-enabled). Check whether auto-cleanup is enabled and whether it’s been running recently. The Change Tracking troubleshooting doc recommends checking sys.change_tracking_databases (to validate auto-cleanup is enabled) and querying the history table for recent entries (to validate cleanup is actually running). If dbo.MSChange_tracking_history is missing, confirm whether that’s due to CT lifecycle. A common “gotcha” is assuming the history table is created the moment CT is enabled. In practice, the table can be missing if CT was disabled and then re-enabled, or if only manual cleanup was used—because the history table is tied to auto-cleanup behavior. Why this matters: If the history table never existed (or is missing for expected reasons), your “history table drop” symptom might actually be a different behavior than a true DDL DROP—and you’ll want to confirm the CT lifecycle first before you rely on drop-auditing as the primary signal. 2) Confirm the Extended Events session is created in the right place (Azure SQL DB specifics) Make sure the session is database-scoped (Azure SQL Database requirement). In Azure SQL Database, Extended Events sessions are always database-scoped, and the event_file target writes to Azure Storage blobs (not local disk). If you accidentally create or manage the session as if it were server-scoped, you’ll hit confusing errors or see “no data.” Confirm you created the session in a user database (not master). Azure SQL Database doesn’t support creating database-scoped sessions from master. If you try, you can run into failures that look like target/session problems but are actually scope issues. Verify permissions: least privilege that still works. If the session creation/alter/start fails silently (or you can’t start it), confirm the principal has the right database permission. A practical minimum that works for managing DB-scoped sessions is ALTER ANY DATABASE EVENT SESSION (instead of granting broad CONTROL/db_owner). 
3) If you're using event_file: storage + credential is the most common failure point

If the session starts but no .xel ever shows up, focus here first—because event_file in Azure SQL DB depends on a correct Storage setup.

Validate the "happy path" prerequisites (in order):
- Storage account exists and a container exists.
- The Database Engine has access to that container either via RBAC (managed identity) or via a SAS token.
- A credential is created in the same database where the session is created.
Microsoft's event_file walkthrough is explicit about these steps, and missing any one of them can cause "target initialization" or "no output" symptoms.

Quick diagnostic: temporarily switch to an in-memory target to prove events are firing. The Azure SQL Extended Events guidance notes you can use a ring_buffer target as a quick way to validate that the session is collecting events (without involving storage). If ring_buffer shows events but event_file does not, the issue is almost certainly storage/credential access.

Common "it looks correct but still fails" causes (practical):
- You created the credential in the wrong database (or the session is in a different DB than the credential).
- The container permissions don't actually include write access for the engine identity/token.
- The session was created correctly but never started (especially if STARTUP_STATE = OFF).

4) If you do capture events, but can't attribute "who did it"

Once the session is logging events, attribution usually comes down to having the right actions attached. The docs emphasize that Extended Events are designed to collect "as much or as little as necessary," and in investigations you typically want the "who/what/where" context: app name, hostname, username, session_id, sql_text, etc.

Practical tip: Start with your curated set of actions (like client_app_name, client_hostname, username, session_id, and sql_text), and only add more fields if you still can't triangulate the origin. This keeps overhead and noise down while staying evidence driven.

5) If the table drop is periodic (hourly / scheduled), treat it like an automation clue

In real customer scenarios, the symptom can be "history table disappears repeatedly," which makes it especially important to correlate drops with client_app_name / hostname / username (and optionally transaction/session details) to determine whether this is a scheduled job, maintenance task, or application behavior. This is exactly the type of "pattern identification" where XEvents tends to outperform ad-hoc guessing.

Quick "decision tree" summary
- History table missing? Start with CT lifecycle + auto-cleanup checks.
- Session exists but captures nothing? Confirm DB scope + permissions.
- Captures to ring_buffer but not to event_file? Storage/credential access problem.
- Captures events but no "who"? Add/verify the right actions (client_app_name/hostname/username/sql_text).

References
- Extended Events QuickStart and overview (applies to Azure SQL Database).
- Change Tracking cleanup troubleshooting and history table usage (dbo.MSChange_tracking_history).
- Event File target for Azure SQL Database stored in Azure Storage + steps (RBAC/SAS + credential).
- Reading XEL output from blob using sys.fn_xe_file_target_read_file.

Wireshark walkthrough: observing an Azure SQL Database TDS connection on the wire
One of the most effective ways to truly understand how Azure SQL Database connectivity works is to observe the connection at the packet level. A short Wireshark capture clearly shows the interaction between TCP, TLS, TDS, and the Azure SQL gateway / redirect flow. This section walks through what you should expect to see in a healthy Azure SQL Database connection. 1. Capture setup (high level) Capture traffic on the client side, while reproducing the connection attempt (for example, using sqlcmd, SSMS, or your application). Typical display filters to start with: tcp.port == 1433 If Redirect mode is involved, you may later see additional TCP ports (for example, 11xxx–11999 or a specific dynamic port like 6188). 2. TCP three‑way handshake (transport layer) The first packets are standard TCP: SYN → client initiates connection SYN‑ACK → Azure SQL gateway responds ACK → connection established At this point: The destination IP is the Azure SQL Database gateway The destination port is 1433 This confirms basic network reachability to the gateway. 3. TDS PRELOGIN message (application layer) Immediately after the TCP handshake, Wireshark shows the TDS Pre‑Login packet. Key observations: Protocol: TDS Message type: PRELOGIN This packet contains no credentials It negotiates: TDS version Encryption capability MARS (Multiple Active Result Sets) Federated authentication flags (if applicable) In Wireshark, this often appears as: Tabular Data Stream Packet Type: Pre-Login This is the first SQL‑aware packet on the wire. 4. TLS handshake (security layer) What happens next depends on the TDS version and encryption mode: TDS 7.x (older behavior) PRELOGIN is visible in clear text TLS handshake occurs after PRELOGIN TDS 8.0 / Encrypt=Strict (modern Azure SQL behavior) TLS handshake occurs immediately after TCP PRELOGIN itself is encrypted Wireshark shows: Client Hello Server Hello Certificate exchange In this case: You will not see readable TDS fields unless TLS decryption is configured This is expected and by design 5. LOGIN and authentication phase After TLS is established: Client sends LOGIN7 message Authentication occurs (SQL auth, Azure AD token, managed identity, etc.) All sensitive data is encrypted In Wireshark: Packets are marked as TLS Application Data TDS payload is no longer human‑readable A successful login is followed by: ENVCHANGE tokens Session setup responses 6. Redirect behavior (very important in Azure SQL) If the server is using Redirect connection policy, Wireshark shows a distinctive pattern: Initial connection: TCP → 1433 (gateway) Server responds with routing information Client opens a second TCP connection: New destination IP New dynamic port (for example: 6188) In Wireshark you will see: A new TCP handshake Same SQL hostname, different port The original 1433 connection may close or go idle This is the most common source of confusion: Firewalls allow 1433 ✅ Dynamic redirect ports ❌ blocked Result: login timeout or pre‑login handshake failure 7. Query execution and result sets Once connected to the final database endpoint: Client sends SQL batches or RPC calls Server responds with result sets Traffic continues over the redirected connection In Wireshark: All packets appear as encrypted TLS application data Packet sizes correlate with query result size Long‑running queries show sustained data flow 8. What Wireshark helps you diagnose quickly Using this walkthrough, Wireshark can immediately answer: ✅ Is the client reaching the Azure SQL gateway? ✅ Is TLS handshake completing? 
✅ Is Redirect mode in effect?
✅ Is a second TCP connection attempted?
✅ Which port is failing?
✅ Is the failure network-level or authentication-level?

This makes it invaluable for troubleshooting:
- Login timeouts
- Pre-login handshake errors
- Redirect / firewall issues
- TLS / driver encryption mismatches

Key takeaway
When troubleshooting Azure SQL connectivity:
- Always expect more than one TCP connection
- 1433 is only the entry point
- Redirect mode + TDS + TLS explains 90% of "mysterious" connection failures
Once you see the flow in Wireshark, Azure SQL connectivity stops being a black box.
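A useful companion to the client-side capture is a quick server-side look at how the current session actually connected. The sketch below uses documented columns of sys.dm_exec_connections, which is available in Azure SQL Database; on a Redirect-mode connection, local_tcp_port should generally show the dynamic redirect port rather than 1433, which lines up with what the trace shows (treat that interpretation as an assumption to verify in your environment).

```sql
-- Run from the connection you want to inspect (application session or SSMS).
SELECT
    c.session_id,
    c.net_transport,       -- expected: TCP
    c.protocol_type,       -- expected: TSQL (TDS)
    c.encrypt_option,      -- TRUE when the TDS session is TLS-encrypted
    c.auth_scheme,         -- SQL / NTLM / KERBEROS
    c.local_tcp_port,      -- with Redirect, typically a dynamic port instead of 1433
    c.client_net_address
FROM sys.dm_exec_connections AS c
WHERE c.session_id = @@SPID;
```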
Using ClientConnectionId to Correlate .NET Connection Attempts in Azure SQL

Getting Better Diagnostics with ClientConnectionId in .NET

A few days ago, I was working on a customer case involving intermittent connectivity failures to Azure SQL Database from a .NET application. On the surface, nothing looked unusual. Retries were happening. In this post, I want to share a simple yet effective pattern for producing JDBC-style trace logs in .NET — specifically focusing on the ClientConnectionId property exposed by SqlConnection. This gives you a powerful correlation key that aligns with backend diagnostics and significantly speeds up root cause analysis for connection problems.

Why ClientConnectionId Matters
Azure SQL Database assigns a unique identifier to every connection attempt from the client. In .NET, this identifier is available through the ClientConnectionId property of SqlConnection. According to the official documentation:
"The ClientConnectionId property gets the connection ID of the most recent connection attempt, regardless of whether the attempt succeeded or failed."
Source: https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlconnection.clientconnectionid?view=netframework-4.8.1
This GUID is the single most useful piece of telemetry for correlating client connection attempts with server logs and support traces.

What .NET Logging Doesn't Give You by Default
Unlike the JDBC driver, the .NET SQL Client does not produce rich internal logs of every connection handshake or retry. There's no built-in switch to emit gateway and redirect details, attempt counts, or port information. What you do have is:
- Timestamps
- Connection attempt boundaries
- ClientConnectionId values
- Outcome (success or failure)
If you capture and format these consistently, you end up with logs that are as actionable as the JDBC trace output — and importantly, easy to correlate with backend diagnostics and Azure support tooling.

Below is a small console application in C# that produces structured logs in the same timestamped, [FINE] format you might see from a JDBC trace — but for .NET applications:

using System;
using Microsoft.Data.SqlClient;

class Program
{
    static int Main()
    {
        // SAMPLE connection string (SQL Authentication)
        // Replace this with your own connection string.
        // This is provided only for demonstration purposes.
        string connectionString =
            "Server=tcp:<servername>.database.windows.net,1433;" +
            "Database=<database_name>;" +
            "User ID=<sql_username>;" +
            "Password=<sql_password>;" +
            "Encrypt=True;" +
            "TrustServerCertificate=False;" +
            "Connection Timeout=30;";

        int connectionId = 1;

        // Log connection creation
        Log($"ConnectionID:{connectionId} created by (SqlConnection)");

        using SqlConnection connection = new SqlConnection(connectionString);

        try
        {
            // Log connection attempt
            Log($"ConnectionID:{connectionId} This attempt No: 0");

            // Open the connection
            connection.Open();

            // Log ClientConnectionId after the connection attempt
            Log($"ConnectionID:{connectionId} ClientConnectionId: {connection.ClientConnectionId}");

            // Execute a simple test query
            using (SqlCommand cmd = new SqlCommand("SELECT 1", connection))
            {
                Log($"SqlCommand:1 created by (ConnectionID:{connectionId})");
                Log("SqlCommand:1 Executing (not server cursor) SELECT 1");
                cmd.ExecuteScalar();
                Log("SqlDataReader:1 created by (SqlCommand:1)");
            }
        }
        catch (SqlException ex)
        {
            // ClientConnectionId is available even on failure
            Log($"ConnectionID:{connectionId} ClientConnectionId: {connection.ClientConnectionId} (failure)");
            Log($"SqlException Number: {ex.Number}");
            Log($"Message: {ex.Message}");
            return 1;
        }

        return 0;
    }

    // Simple logger to match JDBC-style output format
    static void Log(string message)
    {
        Console.WriteLine(
            $"[{DateTime.Now:yyyy-MM-dd HH:mm:ss}] [FINE] {message}"
        );
    }
}

Run the above application and you'll get output in this style:

[2025-12-31 03:38:10] [FINE] ConnectionID:1 This attempt server name: aabeaXXX.trXXXX.northeurope1-a.worker.database.windows.net port: 11002 InstanceName: null useParallel: false
[2025-12-31 03:38:10] [FINE] ConnectionID:1 This attempt endtime: 1767152309272
[2025-12-31 03:38:10] [FINE] ConnectionID:1 This attempt No: 1
[2025-12-31 03:38:10] [FINE] ConnectionID:1 Connecting with server: aabeaXXX.trXXXX.northeurope1-a.worker.database.windows.net port: 11002 Timeout Full: 20
[2025-12-31 03:38:10] [FINE] ConnectionID:1 ClientConnectionID: 6387718b-150d-482a-9731-02d06383d38f Server returned major version: 12
[2025-12-31 03:38:10] [FINE] SqlCommand:1 created by (ConnectionID:1 ClientConnectionID: 6387718b-150d-482a-9731-02d06383d38f)
[2025-12-31 03:38:10] [FINE] SqlCommand:1 Executing (not server cursor) select 1
[2025-12-31 03:38:10] [FINE] SqlDataReader:1 created by (SqlCommand:1)
[2025-12-31 03:38:10] [FINE] ConnectionID:2 created by (SqlConnection)
[2025-12-31 03:38:11] [FINE] ConnectionID:2 ClientConnectionID: 5fdd311e-a219-45bc-a4f6-7ee1cc2f96bf Server returned major version: 12
[2025-12-31 03:38:11] [FINE] sp_executesql SQL: SELECT 1 AS ID, calling sp_executesql
[2025-12-31 03:38:12] [FINE] SqlDataReader:3 created by (sp_executesql SQL: SELECT 1 AS ID)

Notice how each line is tagged with:
- A consistent local timestamp (yyyy-MM-dd HH:mm:ss)
- A [FINE] log level
- A structured identifier that mirrors what you'd see in JDBC logging
If a connection fails, you'll still get the ClientConnectionId logged, which is exactly what Azure SQL support teams will ask for when troubleshooting connectivity issues.

Understanding USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION') in Azure SQL DB
Query performance tuning is one of the most critical aspects of maintaining stable and predictable workloads in Azure SQL Database. While Microsoft continuously improves the SQL Server Cardinality Estimator (CE) to provide more accurate row estimates and better plan choices, there are scenarios where workloads may benefit from reverting to the Legacy CE—especially when upgrading databases or encountering regressions after plan compilation. One practical and targeted way to influence the optimizer’s behavior is the query‑level hint: OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION')); In this blog, we explore what this hint does, when to use it, and how it impacts execution plans in Azure SQL Database. 1. What Is the Cardinality Estimator? The Cardinality Estimator predicts the number of rows (cardinality) processed at each step of an execution plan. These estimates heavily influence the optimizer’s choices around: Join algorithms (Nested Loop, Hash, Merge) Index selection Memory grants Parallelism decisions Operator placement and costing The New CE (introduced in SQL Server 2014+) typically handles modern workloads better through richer correlation assumptions and updated statistical heuristics. However, the Legacy CE can outperform the New CE for certain workloads—particularly those with: Highly skewed data distributions Complex predicates Uncorrelated filter predicates Legacy application patterns relying on outdated parameter sniffing assumptions 2. What Does FORCE_LEGACY_CARDINALITY_ESTIMATION Do? The hint forces the optimizer to compile the query using the Legacy CE, regardless of: Database compatibility level Global database settings (ALTER DATABASE SCOPED CONFIGURATION) Query Store settings (unless a forced plan exists) It provides a granular, non-intrusive, and risk‑controlled way to compare the two estimators side-by-side. Where the hint helps Queries suffering regressions after upgrading compatibility levels Workloads where the New CE underestimates/overestimates row counts Parameter‑sensitive queries sensitive to distribution skew Scenarios where Query Store shows the Legacy CE plan as consistently faster 3. How It Impacts Execution Plans Applying USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION') affects the optimizer in several ways: A. Row Estimation Logic Legacy CE assumes more independence between predicates. New CE introduces more correlation assumptions. Effect: Legacy CE may estimate higher cardinality for multi‑predicate filters Higher estimates → optimizer may choose Hash Join or Merge Join over Nested Loops Lower estimates → may reduce memory grants or favor Nested Loops B. Join Strategy Changes Example difference: Legacy CE: Hash Join due to higher row estimates New CE: Nested Loop due to underestimated rows This can dramatically alter performance depending on input sizes. C. Memory Grants Legacy CE often yields larger memory grants due to conservative estimates. This can be positive or negative: Positive: Avoids spills to tempdb Negative: Excessive memory grants can lead to concurrency pressure D. Parallelism Decisions Legacy CE may push optimizer toward parallel plans when it predicts larger input sizes. E. Index and Predicate Choices Legacy CE may select different indexes because estimated filter selectivity differs. This can impact: Index seek vs. index scan decisions Predicate evaluation order Operator costing 4. 
Real-World Example

Here's a simplified example comparing the same query compiled via:
- Default CE
- Legacy CE via hint

SELECT c.CustomerId, o.OrderId, o.TotalAmount
FROM dbo.Customers c
JOIN dbo.Orders o ON o.CustomerId = c.CustomerId
WHERE c.Country = 'US'
  AND o.OrderDate > DATEADD(day, -30, GETUTCDATE())
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'));

Observations typically seen:
- Default CE → underestimates rows on Country = 'US' due to skew → Nested Loop
- Legacy CE → higher estimate → Hash Join, performing better for large inputs

5. When Should You Use This Hint?

Recommended scenarios
- You observe a regression after increasing database compatibility level
- Query Store indicates the Legacy CE plans consistently outperform
- Estimation errors lead to tempdb spills or poor join choices
- You want to validate behavior before applying database-wide changes

Avoid using it when
- You haven't validated the effect using Query Store or actual runtime statistics
- Only small regressions occur and can be solved by updating statistics
- The root cause is a missing index, outdated statistics, or a data distribution issue

6. Alternatives to the Query-Level Hint

Before locking the query to Legacy CE, consider the broader options:

A. Query Store Plan Forcing
If a Legacy CE plan performs better, Query Store allows you to force it without hints.

B. Database Scoped Configuration
You can change the database CE behavior globally:

ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON;

But be careful—this affects all queries.

C. Compatibility Level Change
Some workloads stabilize when pinned to a previous compatibility level, but this is not preferred long term.

D. Fixing Statistics and Indexing
Always validate:
- Up-to-date statistics
- Correct indexing strategy
- No parameter-sniffing issues

7. Best Practices for Using FORCE_LEGACY_CARDINALITY_ESTIMATION
- Test first in lower environments
- Confirm impact using Query Store (sys.query_store_plan, sys.query_store_runtime_stats)
- Compare CPU, reads, duration, spills, memory grants
- Monitor for regressions after index changes or statistics updates
- Document all queries using hints for long-term maintainability

8. Conclusion

USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION') is a powerful and precise tool for controlling query behavior in Azure SQL Database. Although the New CE is generally superior, certain workloads still perform better under the Legacy CE due to its more conservative and less correlated estimation strategies. This hint allows you to:
- Mitigate regressions
- Improve plan stability
- Avoid global configuration changes
- Make evidence-backed decisions using Query Store
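To make the "confirm impact using Query Store" step concrete, here is a minimal sketch that lists the plans and aggregated runtime statistics Query Store has recorded for queries referencing the example table, so you can compare executions with and without the hint side by side. It uses standard Query Store catalog views; the LIKE filter is only illustrative and should be replaced with whatever identifies your query.

```sql
SELECT
    q.query_id,
    p.plan_id,
    p.compatibility_level,
    rs.count_executions,
    rs.avg_duration,            -- microseconds
    rs.avg_cpu_time,            -- microseconds
    rs.avg_logical_io_reads
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q
    ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
WHERE qt.query_sql_text LIKE N'%dbo.Orders%'    -- illustrative filter
ORDER BY q.query_id, p.plan_id, rs.runtime_stats_interval_id;
```

If the plan produced under the Legacy CE is consistently better, forcing that plan through Query Store (option A above) is often a cleaner long-term choice than keeping the hint in application code.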
Geo-Replication Redo Lag in Azure SQL Database

Monitoring and Troubleshooting Using Public DMVs

Azure SQL Database provides built-in high availability and geo-replication capabilities to ensure database resilience and business continuity. While replication is fully managed by the platform, customers may occasionally observe a delay between the primary and geo-replicated secondary database, especially during periods of heavy write activity. This article provides a public, supported approach to monitoring and understanding geo-replication delay caused by redo lag, using official Dynamic Management Views (DMVs) and T-SQL only, without relying on internal tools.

Scenario Overview
Customers may report that changes committed on the primary database are not immediately visible on the geo-replicated secondary, sometimes for several minutes. Typical symptoms include:
- Reporting queries on the geo-secondary showing stale data
- Increased redo catch-up time on the secondary
- No performance degradation observed on the primary
- Replication eventually catching up without manual intervention
This behavior is commonly associated with redo lag, where the secondary has already received the log records but requires additional time to replay them into data pages.

Understanding Geo-Replication Redo Lag
In Azure SQL Database geo-replication:
- Transactions are sent from the primary to the secondary
- Log records are hardened on the secondary
- The secondary applies these records asynchronously to its data files (redo)
If the secondary experiences temporary pressure (for example, due to schema changes or intensive operations), redo may fall behind, causing the secondary to lag—even though data durability remains intact.

Step 1: Check Geo-Replication Status and Lag
The primary DMV for monitoring geo-replication is:
📘 sys.dm_geo_replication_link_status
Public documentation: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-geo-replication-link-status-azure-sql-database

Sample T-SQL query:

-- Run on the primary database
SELECT link_guid,
       partner_server,
       replication_state_desc,
       replication_lag_sec,
       last_replication
FROM sys.dm_geo_replication_link_status;

Key Columns Explained
Column | Description
replication_state_desc | Current replication state
replication_lag_sec | Estimated lag (in seconds)
last_replication | Last successful replication timestamp (UTC)
partner_server | Geo-replica logical server

Interpretation
- Healthy replication: replication_lag_sec = 0 and state is healthy
- Transient delay: lag increases temporarily but later returns to zero
- Sustained delay: lag remains elevated for an extended period and should be investigated further

Step 2: Monitor Local Replica Redo Health
To understand whether lag is related to redo activity on the secondary, you can query:
📘 sys.dm_database_replica_states
Public documentation: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database

Sample T-SQL query:

-- Run on the primary database
SELECT database_id,
       synchronization_state_desc,
       synchronization_health_desc,
       redo_queue_size,
       redo_rate,
       last_commit_time
FROM sys.dm_database_replica_states;

Key Insights
redo_queue_size | Indicates how much log data is pending replay on the secondary
redo_rate | Shows how quickly redo is being applied
last_commit_time | Helps estimate data freshness on the secondary (UTC)

Interpretation
Scenario | Observation
Normal operation | redo_queue_size = 0
Write burst | redo_queue_size increases temporarily
Recovery in progress | redo_rate remains positive
Healthy state | synchronization_health_desc = HEALTHY
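Building on the columns above, you can also derive a rough catch-up estimate directly from the same DMV. This is only a back-of-the-envelope indicator and assumes the documented units (redo_queue_size in KB, redo_rate in KB per second):

```sql
SELECT
    database_id,
    redo_queue_size                          AS redo_queue_kb,
    redo_rate                                AS redo_rate_kb_per_sec,
    CAST(redo_queue_size AS decimal(18, 2))
        / NULLIF(redo_rate, 0)               AS estimated_catchup_seconds,
    last_commit_time
FROM sys.dm_database_replica_states;
```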
Short-lived spikes are expected platform behavior and usually resolve automatically.

Practical Monitoring Considerations

Reporting Workloads
If applications read from the geo-secondary for reporting:
- Expect near-real-time, not guaranteed real-time visibility
- Design reports to tolerate small delays
- Route latency-sensitive reads to the primary if required

Transaction Patterns
Redo lag is more likely during:
- Large batch updates
- Index maintenance operations
- Schema modification commands
- Bursty write workloads
Keeping transactions short and efficient reduces replication pressure.

Best Practices
- Use UTC timestamps consistently when correlating events
- Monitor replication_lag_sec and redo_queue_size together
- Implement retry logic in applications for transient conditions
- Avoid assuming read replicas are always perfectly synchronized
- Do not take manual actions during short-lived redo spikes unless the lag persists

Summary
Redo lag in Azure SQL Database geo-replication is a normal and self-healing behavior during certain workload patterns. By using supported public DMVs and T-SQL, customers can:
- Monitor replication health safely
- Understand replica freshness
- Make informed application routing decisions
- Avoid unnecessary intervention
Azure SQL Database automatically stabilizes replication once redo pressure subsides, ensuring durability and availability without manual management.

References
- Azure SQL Database High Availability: https://learn.microsoft.com/azure/azure-sql/database/high-availability-overview
- sys.dm_geo_replication_link_status: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-geo-replication-link-status-azure-sql-database
- sys.dm_database_replica_states: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-database-replica-states-azure-sql-database

Why PITR Restore for Azure SQL Hyperscale Can Take Longer Than Expected
Azure SQL Database Hyperscale is designed to deliver fast, storage‑optimized backup and restore operations. According to Microsoft documentation, most Point‑in‑Time Restore (PITR) operations for Hyperscale should complete within 10 minutes, regardless of database size, because the service uses metadata-based restore techniques rather than copying full data files. However, some real‑world scenarios can lead to unexpectedly long restore times, even when the source database is Hyperscale and even when no obvious configuration changes are made. This article explains one such scenario, outlines what happened, and provides guidance to avoid similar delays in the future. Expected Behavior for Hyperscale PITR Hyperscale databases use a unique architecture that separates compute and storage. Backups are taken from storage snapshots and do not require data copying during a typical PITR restore. From Microsoft Learn: “Most restores complete within minutes, even for large databases.” Ref: https://learn.microsoft.com/azure/azure-sql/database/hyperscale-automated-backups-overview?view=azuresql#backup-and-restore-performance This performance expectation applies as long as the Backup Storage Redundancy remains the same between the source DB and the target restore. Customer Scenario Overview A customer initiated PITR restore operations for Hyperscale DB: Source DB: Hyperscale (SLO: HS_PRMS_64) Target DB: Hyperscale (SLO: HS_PRMS_128) Same logical server Source DB Backup Storage Redundancy: Standard_RAGRS Customer enabled Zone Redundancy for the target database during restore The customer therefore expected the restore to finish within the normal Hyperscale window (~10 minutes). Instead, the restore took significantly longer. Why the Restore Took Longer Although the source database used Standard_RAGRS, enabling Zone Redundancy at restore time introduced a configuration change that affected the underlying Backup Storage Redundancy (BSR) for the newly restored database. 🔍 Key Point: Changing BSR Creates a Full "Size-of-Data" Restore When the target DB uses a different BSR type than the source DB, Azure SQL Database cannot perform a fast metadata-based restore. Instead, it must perform a full data copy, and the restore becomes proportional to the database size: More data → longer restore Effectively behaves like a physical data movement operation This overrides Hyperscale’s normally fast PITR workflow This behavior is documented here: https://learn.microsoft.com/azure/azure-sql/database/hyperscale-automated-backups-overview?view=azuresql#backup-and-restore-performance In the customer’s case: Customer-enabled Zone Redundancy changed the restore workflow. As a result, the system selected a backup storage redundancy configuration different than the source: Restore workflow chose: Standard_RAGZRS Source database actually used: Standard_RAGRS (non‑zone‑redundant) This mismatch triggered a size-of-data restore, leading to the observed delay. Summary of Root Cause ✔ Hyperscale PITR is fast only when BSR is unchanged ✔ Customer enabled Zone Redundant configuration during restore ✔ This resulted in a different Backup Storage Redundancy from the source ✔ Target restore had to perform a full data copy, not metadata-based restore ✔ Restore time scaled with database size → leading to long restore duration Key Takeaways 1. Do not change Backup Storage Redundancy during PITR unless necessary Any change (e.g., RAGRS → RAGZRS) converts the restore into a size‑of‑data operation. 2. 
Restores that involve cross-region or cross-redundancy conversions always take longer
This applies equally to:
- PITR restore
- Restore to another server
- Restore with SLO changes
- Restore involving ZRS/RA-GZRS transitions

3. Hyperscale PITR is extremely fast—when configuration is unchanged
If the source and target BSR match, Hyperscale restores usually complete in minutes.

4. Enabling Zone Redundancy is valid, but do it after the restore
If the customer wants ZRS for the restored DB:
- Perform PITR first (fast restore)
- Then update redundancy post-restore (online operation)

Conclusion
While Hyperscale PITR restores are typically very fast, configuration changes during the restore—especially related to Backup Storage Redundancy—can trigger a full data copy and significantly increase restore duration. To get the best performance:
- Keep the same BSR when performing PITR
- Apply redundancy changes after the restore completes
- Use metadata-based restores whenever possible
Understanding these nuances helps ensure predictable recovery times and aligns operational processes with Hyperscale's architectural design.
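As a practical follow-up, you can watch the progress of a long-running restore from the logical server's master database using the documented sys.dm_operation_status DMV. The sketch below is a minimal example; the database-name filter is an illustrative placeholder to replace with your own value.

```sql
-- Connect to the master database of the logical server that hosts the restore target.
SELECT
    major_resource_id    AS database_name,
    operation,
    state_desc,
    percent_complete,
    start_time,
    last_modify_time,
    error_desc
FROM sys.dm_operation_status
WHERE major_resource_id = N'<restored_database_name>'   -- illustrative placeholder
ORDER BY start_time DESC;
```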
When and How to Update Statistics in Azure SQL Database

Accurate statistics are a cornerstone of good query performance in Azure SQL Database. While the platform automatically manages statistics in most scenarios, there are real-world cases where manual intervention is not only recommended but essential. This article explains when, why, and how to update statistics in Azure SQL Database, with practical samples and real customer scenarios drawn from production support cases.
Microsoft Learn reference (overview): https://learn.microsoft.com/sql/relational-databases/statistics/statistics

Why Statistics Matter
SQL Server's query optimizer relies on statistics to estimate row counts, choose join strategies, allocate memory grants, and decide whether to run operations in parallel. When statistics are stale or inaccurate, even well-indexed queries can suddenly degrade.
In Azure SQL Database:
- AUTO_CREATE_STATISTICS is enabled and managed by the platform
- AUTO_UPDATE_STATISTICS runs asynchronously
Because updates are async, queries may continue running with outdated cardinality estimates until the statistics refresh completes.
https://learn.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide

When Manual Statistics Updates Are Required

1. After Large Data Changes (ETL / Batch Operations)
Customer scenario: A nightly ETL job bulk-inserts millions of rows into a fact table. The following morning, reports time out and logical reads spike.
Why it happens: Auto-update thresholds are based on row-count changes and may not trigger immediately—especially for append-only or skewed data.
Recommended action:

UPDATE STATISTICS dbo.FactSales;

Target only the critical statistic if known:

UPDATE STATISTICS dbo.FactSales (IX_FactSales_CreatedDate);

2. Query Plan Regression Without Schema Changes
Customer scenario: A stable query suddenly switches from a Nested Loops join to a Hash Join, increasing CPU usage and BUFFERIO waits.
Root cause: Statistics no longer reflect the current data distribution.
Recommended action:

UPDATE STATISTICS dbo.Customer WITH FULLSCAN;

Learn more: https://learn.microsoft.com/sql/relational-databases/statistics/update-statistics

3. After Restore Operations (PITR / Geo-Restore / Database Copy)
Customer scenario: After a Point-in-Time Restore (PITR) on a Hyperscale database, queries run slower despite healthy platform telemetry.
Why it happens: Statistics are restored as-is, but workload patterns often change after the restore point. Auto-update statistics may lag behind.
Recommended action:

EXEC sp_updatestats;

Prioritize heavily accessed tables first on large databases.
Learn more: https://learn.microsoft.com/azure/azure-sql/database/recovery-using-backups

Query Store Comparison: Before vs After Updating Statistics
One of the most effective ways to validate the impact of statistics updates is Query Store.
Before update (typical signs):
- Sudden plan change for the same query text
- Increased logical reads and CPU time
- Change in join strategy or memory grant
After statistics update:
- Optimizer selects a more efficient plan
- Logical reads reduced
- CPU and duration stabilize

Example workflow:

-- Capture runtime stats for the query's plans
SELECT rs.*
FROM sys.query_store_runtime_stats AS rs
JOIN sys.query_store_plan AS p ON p.plan_id = rs.plan_id
WHERE p.query_id = <QueryID>;

-- Update statistics
UPDATE STATISTICS dbo.Orders;

-- Force recompilation
EXEC sp_recompile 'dbo.Orders';

Decision Flow: When Should I Update Statistics?
Decision Flow: When Should I Update Statistics?

1. Performance regression observed?
2. If yes: Has the query plan changed without a schema change?
3. If yes: Was there a recent data change, restore, or ETL run?
4. If yes: Update targeted statistics.

If the answer is NO at any step, rely on automatic statistics and continue monitoring.

What NOT to Do

❌ Do not run blanket WITH FULLSCAN on all tables
FULLSCAN is CPU and IO expensive, especially on large or Hyperscale databases.

❌ Do not schedule frequent database-wide sp_updatestats jobs
This can introduce unnecessary workload and plan churn.

❌ Do not update statistics blindly without investigation
Always validate plan regression or stale estimates using Query Store or execution plans.

Checking Statistics Freshness

SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
ORDER BY sp.last_updated;

DMV reference: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-stats-properties-transact-sql

Best Practices Summary

✅ Prefer targeted statistics updates
✅ Update stats after bulk data changes or restores
✅ Validate results using Query Store
✅ Avoid unnecessary FULLSCAN operations
✅ Use stats updates as a diagnostic and remediation step, not routine maintenance

Conclusion

Although Azure SQL Database manages statistics automatically, asynchronous updates and changing workload patterns can result in sub-optimal query plans. Manually updating statistics after significant data changes, restore operations, or observed plan regressions is a safe and effective way to restore optimal query performance.
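As a complement to the freshness query above, the sketch below surfaces statistics with a large number of modifications since their last update, which helps keep updates targeted rather than database-wide. It is a minimal example; the modification threshold shown is an assumption to tune for your workload.

-- Minimal sketch: statistics with many modifications since the last update,
-- ordered so the most likely stale candidates appear first.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name                   AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE sp.modification_counter > 100000   -- illustrative threshold; tune per workload
  AND OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.modification_counter DESC;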
Azure SQL Database High Availability: Architecture, Design, and Built-in Resilience

High availability (HA) is a core pillar of Azure SQL Database. Unlike traditional SQL Server deployments, where availability architectures must be designed, implemented, monitored, and maintained manually, Azure SQL Database delivers built-in high availability by design. By abstracting infrastructure complexity while still offering enterprise-grade resilience, Azure SQL Database enables customers to achieve strict availability SLAs with minimal operational overhead.

In this article, we'll cover:

Azure SQL Database high-availability design principles
How HA is implemented across service tiers: General Purpose, Business Critical, and Hyperscale
Failover behavior and recovery mechanisms
Architecture illustrations explaining how availability is achieved
Supporting Microsoft Learn and documentation references

What High Availability Means in Azure SQL Database

High availability in Azure SQL Database ensures that:

Databases remain accessible during infrastructure failures
Hardware, software, and network faults are handled automatically
Failover occurs without customer intervention
Data durability is maintained using replication, quorum, and consensus models

This is possible through the separation of:

Compute
Storage
Control plane orchestration

Azure SQL Database continuously monitors health signals across these layers and automatically initiates recovery or failover when required.

Azure SQL Database High Availability – Shared Concepts

Regardless of service tier, Azure SQL Database relies on common high-availability principles:

Redundant replicas
Synchronous and asynchronous replication
Automatic failover orchestration
Built-in quorum and consensus logic
Transparent reconnect via the Azure SQL Gateway

Applications connect through the Azure SQL Gateway, which automatically routes traffic to the current primary replica, shielding clients from underlying failover events.

High Availability Architecture – General Purpose Tier

The General Purpose tier uses a compute–storage separation model, relying on Azure Premium Storage for data durability.

Key Characteristics

Single compute replica
Storage replicated three times by Azure Storage
Read-Access Geo-Redundant Storage (RA-GRS) optional
Stateless compute that can be restarted or moved
Fast recovery using storage reattachment

Architecture Diagram – General Purpose Tier

Description: Clients connect via the Azure SQL Gateway, which routes traffic to the primary compute node. The compute layer is stateless, while Azure Premium Storage provides triple-replicated durable storage.

Failover Behavior

Compute failure triggers creation of a new compute node
Database files are reattached from storage
Typical recovery time: seconds to minutes

📚 Reference: https://learn.microsoft.com/azure/azure-sql/database/service-tier-general-purpose
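Because the HA model depends on the service tier, it is worth confirming which tier a database actually runs in before reasoning about its failover behavior. A minimal T-SQL sketch using the Azure SQL Database-specific view sys.database_service_objectives follows; when queried in the master database of the logical server it typically returns all databases, while in a user database it returns only that database.

-- Minimal sketch: list each database with its service tier (edition) and
-- service objective, to confirm which HA model described here applies.
SELECT d.name AS database_name,
       dso.edition,
       dso.service_objective,
       dso.elastic_pool_name
FROM sys.databases AS d
JOIN sys.database_service_objectives AS dso
    ON dso.database_id = d.database_id;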
High Availability Architecture – Business Critical Tier

The Business Critical tier is designed for mission-critical workloads requiring low latency and fast failover.

Key Characteristics

Multiple replicas (1 primary + up to 3 secondaries)
Always On availability group–like architecture
Local SSD storage on each replica
Synchronous replication
Automatic failover within seconds

Architecture Diagram – Business Critical Tier

Description: The primary replica synchronously replicates data to secondary replicas. Read-only replicas can offload read workloads. The Azure SQL Gateway transparently routes traffic to the active primary replica.

Failover Behavior

If the primary replica fails, a secondary is promoted automatically
No storage reattachment is required
Client connections are redirected automatically
Typical failover time: seconds

📚 Reference: https://learn.microsoft.com/azure/azure-sql/database/service-tier-business-critical

High Availability Architecture – Hyperscale Tier

The Hyperscale tier introduces a distributed storage and compute architecture, optimized for very large databases and rapid scaling scenarios.

Key Characteristics

Decoupled compute and page servers
Multiple read replicas
Fast scale-out and fast recovery
Durable log service ensures transaction integrity

Architecture Diagram – Hyperscale Tier

Description: The compute layer processes queries, while the durable log service and distributed page servers manage data storage independently, enabling rapid failover and scaling.

Failover Behavior

Compute failure results in rapid creation of a new compute replica
Page servers remain intact
The log service ensures zero data loss

📚 Reference: https://learn.microsoft.com/azure/azure-sql/database/service-tier-hyperscale

How Azure SQL Database Handles Failures

Azure SQL Database continuously monitors critical health signals, including:

Heartbeats
IO latency
Replica health
Storage availability

Automatic Recovery Actions

Restarting failed processes
Promoting secondary replicas
Recreating compute nodes
Redirecting client connections

Applications should implement retry logic and transient-fault handling to fully benefit from these mechanisms.

📚 Reference: https://learn.microsoft.com/azure/architecture/best-practices/transient-faults

Zone Redundancy and High Availability

Azure SQL Database can be configured with zone redundancy, distributing replicas across Availability Zones in the same region.

Benefits

Protection against datacenter-level failures
Increased SLA
Transparent resilience without application changes

📚 Reference: https://learn.microsoft.com/azure/azure-sql/database/high-availability-sla

Summary

Azure SQL Database delivers high availability by default, removing the traditional operational burden associated with SQL Server HA designs.

Service Tier      | HA Model                       | Typical Failover
General Purpose   | Storage-based durability       | Minutes
Business Critical | Multi-replica, synchronous     | Seconds
Hyperscale        | Distributed compute & storage  | Seconds

By selecting the appropriate service tier and enabling zone redundancy where required, customers can meet even the most demanding availability and resilience requirements with minimal complexity.

Additional References

Azure SQL Database HA overview: https://learn.microsoft.com/azure/azure-sql/database/high-availability-overview
Azure SQL Database SLAs: https://azure.microsoft.com/support/legal/sla/azure-sql-database
Application resiliency guidance: https://learn.microsoft.com/azure/architecture/framework/resiliency
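Finally, when validating read scale-out or post-failover behavior from a session, it can be useful to check whether the current connection landed on the writable primary or on a read-only replica. The sketch below is a minimal example: it returns READ_WRITE on the primary and READ_ONLY when connected with ApplicationIntent=ReadOnly to a readable secondary.

-- Minimal sketch: identify whether this session is on the primary (READ_WRITE)
-- or on a read-only replica (READ_ONLY), e.g. after connecting with
-- ApplicationIntent=ReadOnly or after a failover.
SELECT DB_NAME()                                      AS database_name,
       DATABASEPROPERTYEX(DB_NAME(), 'Updateability') AS updateability;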