Why do I see many VDI_CLIENT_WORKER sessions in Azure SQL Database, and do they impact performance?
Sometimes you’ll notice many sessions showing the command VDI_CLIENT_WORKER in Azure SQL Database. This usually happens around scaling, replica/copy workflows, or internal seeding operations. These sessions can look alarming, especially during a performance investigation, but they are typically internal background workers. This post explains how to recognize them, what’s safe to do (and what isn’t), and how to focus on the real bottlenecks when you’re troubleshooting slowness. Those bottlenecks are usually blocking/deadlocks or log rate throttling.

Why you might see VDI_CLIENT_WORKER sessions in Azure SQL Database

The symptom

You run a session query (for example, against sys.dm_exec_requests or from a monitoring tool) and observe:
- Many sessions with the command VDI_CLIENT_WORKER
- They may appear to be “stuck”, persist longer than expected, and can’t be killed
- Teams may worry that these sessions are “the cause” of slowness

Why it shows up in Azure SQL

In Azure SQL, VDI_CLIENT_* wait types and VDI_CLIENT_WORKER sessions are commonly associated with platform operations that involve copying or seeding, for example:
- Scaling operations (service objective changes)
- Geo-replication / copy workflows
- Replica seeding-like behaviors

Important: The presence of these sessions does not automatically mean they are the bottleneck.

How to validate whether VDI_CLIENT_WORKER sessions are benign

1) Correlate them with recent platform operations. Ask whether you (or the platform) recently performed one of these:
- Scale up/down
- Creation of replicas / geo-secondary operations
- Any database copy-like workflow

If yes, it’s a strong indicator that you’re seeing background workers tied to that lifecycle event.

2) Check whether they consume resources. A practical approach:
- Look for CPU/IO/log pressure at the database level.
- Compare the timing of slowness reports with spikes in waits, locks, and log write percentage.

If these sessions show minimal resource consumption and are just “present”, treat them as background noise while you investigate real contention.
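Step 2 can be made concrete with a small calculation: what share of CPU time and logical reads does each command account for in a snapshot of requests? The minimal Python sketch below does that. The rows are hypothetical sample data; in practice you might copy the command, cpu_time (milliseconds) and logical_reads columns from sys.dm_exec_requests. The function name and row shape are illustrative assumptions, not part of any Azure tooling.

```python
from collections import defaultdict

def resource_share_by_command(rows):
    """Return {command: (cpu_share, reads_share)} for a snapshot of requests.

    Each row is (session_id, command, cpu_time_ms, logical_reads).
    """
    cpu = defaultdict(int)
    reads = defaultdict(int)
    for _session_id, command, cpu_ms, logical_reads in rows:
        cpu[command] += cpu_ms
        reads[command] += logical_reads
    # Guard against division by zero for an empty or idle snapshot.
    total_cpu = sum(cpu.values()) or 1
    total_reads = sum(reads.values()) or 1
    return {cmd: (cpu[cmd] / total_cpu, reads[cmd] / total_reads) for cmd in cpu}

# Hypothetical snapshot: two background workers and two user requests.
snapshot = [
    (71, "VDI_CLIENT_WORKER", 5, 0),
    (72, "VDI_CLIENT_WORKER", 3, 10),
    (88, "UPDATE", 9200, 480000),
    (91, "SELECT", 792, 9990),
]
shares = resource_share_by_command(snapshot)
vdi_cpu, vdi_reads = shares["VDI_CLIENT_WORKER"]
print(f"VDI_CLIENT_WORKER: {vdi_cpu:.2%} of CPU, {vdi_reads:.2%} of reads")
```

If the workers’ share stays near zero while another command dominates, as the UPDATE does in this sample, investigate that command first.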
3) Don’t try to kill them. These sessions are typically system/internal workers. Attempts to kill them may fail or have no effect, and killing them generally isn’t recommended.

4) If you need them to disappear. In many cases, these internal workers age out on their own. If they remain visible and you need a cleanup path, operational actions such as a failover or restart may clear stale workers. Follow change control and use maintenance windows as appropriate for your environment. This is a practical operational observation; always weigh the downtime and impact.

When performance is actually slow: focus on what usually hurts

In many real-world incidents, the main causes of slowness are:
- Blocking chains / deadlocks
- Transaction log rate throttling (LOG_RATE_GOVERNOR) during heavy DML
- Hot queries running concurrently and contending on the same objects

Key takeaways
- Seeing many VDI_CLIENT_WORKER sessions is often expected around platform copy/seeding workflows and doesn’t automatically indicate a bottleneck.
- Don’t attempt to kill system/internal workers. Instead, validate their resource impact and focus on actual bottlenecks.
- For real slowness, prioritize diagnosing blocking/deadlocks and LOG_RATE_GOVERNOR-driven DML throttling.

Azure SQL BACPAC Export Failure with CDC & db_cdcreader (SQL71501)
Overview

Exporting an Azure SQL Database to a BACPAC using SqlPackage or SSMS may fail when Change Data Capture (CDC) is enabled and database users (or Entra groups) are members of CDC-related roles such as db_cdcreader. A common error is:

Error SQL71501: Error validating element: Role Membership: has an unresolved reference to Role [db_cdcreader].

This issue can be confusing because:
- The database is healthy
- CDC is functioning correctly
- The error occurs only during export

Scenario

From a real customer case:
- Azure SQL Database with CDC enabled
- An Entra (AAD) group added to db_cdcreader
- Export attempted via SqlPackage (v170+)
- Export fails during the schema validation phase

Root cause explained

1. SqlPackage performs strict schema modeling

During export, SqlPackage (via DacFx) builds a logical schema model of the database:
- Every object must be fully resolvable
- Roles and role memberships are validated
- Any missing or unsupported object causes the export to fail

This is why the error appears as SQL71501 (unresolved reference).

2. CDC introduces system-managed objects

When CDC is enabled, SQL automatically creates:
- The cdc schema
- System tables
- Special roles: db_cdcreader, cdc_admin

These objects are not treated as regular user-defined objects:
- They are system-managed
- Some are implicitly created
- Some are not fully modeled/exported by DacFx

3. Role membership is the breaking point

The failure does not happen because the role exists. It happens because:
- The role membership exists (e.g., Entra group → db_cdcreader)
- But the role itself is not included in, or resolved by, the export model

Result: membership → cannot resolve target role → validation failure (SQL71501).

This behavior aligns with documented patterns where CDC roles are excluded or not recognized during BACPAC export.
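The failure mode can be pictured with a toy model: a role membership is valid only if its role exists in the model being exported. The Python sketch below is not DacFx code, and the role names, member names and helper functions are hypothetical. It lists the memberships whose role is missing from the modeled set. It then builds the bracket-quoted ALTER ROLE ... DROP MEMBER / ADD MEMBER statements used in the first workaround below, doubling any ] inside a name the way T-SQL's QUOTENAME does.

```python
def quote_name(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any ']' (like T-SQL QUOTENAME)."""
    return "[" + name.replace("]", "]]") + "]"

def unresolved_memberships(modeled_roles, memberships):
    """Return (role, member) pairs whose role is not present in the modeled roles."""
    return [(role, member) for role, member in memberships if role not in modeled_roles]

def workaround_scripts(pairs):
    """Build DROP MEMBER statements (run before export) and ADD MEMBER statements (run after)."""
    drop = [f"ALTER ROLE {quote_name(r)} DROP MEMBER {quote_name(m)};" for r, m in pairs]
    add = [f"ALTER ROLE {quote_name(r)} ADD MEMBER {quote_name(m)};" for r, m in pairs]
    return drop, add

# Hypothetical example: the export model knows the user-defined role but not the CDC role.
modeled = {"app_readers"}
members = [("app_readers", "svc-app"), ("db_cdcreader", "sg-data-team")]
broken = unresolved_memberships(modeled, members)
drop_sql, add_sql = workaround_scripts(broken)
print(drop_sql[0])  # ALTER ROLE [db_cdcreader] DROP MEMBER [sg-data-team];
```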
Reproducing the issue

You are likely impacted if:
- CDC is enabled
- Users or Entra groups are members of db_cdcreader or cdc_admin
- Export is attempted via SqlPackage or the SSMS “Export Data-tier Application” wizard

Workarounds

Option 1: Temporarily remove the role membership

Remove the CDC role membership before export:

```sql
ALTER ROLE db_cdcreader DROP MEMBER [your_user_or_group];
```

Run the export, then re-add the member:

```sql
ALTER ROLE db_cdcreader ADD MEMBER [your_user_or_group];
```

This is the simplest and most reliable workaround, and it is confirmed in Microsoft Q&A guidance for CDC roles.

Option 2: Export from a cleaned database copy

If you cannot modify production (e.g., due to tooling restrictions):
- Create a database copy
- Remove the CDC-related role memberships on the copy
- Export from the copy

Recommended when:
- Using automation tools (e.g., Commvault)
- Production changes are restricted

Option 3: Clean up unsupported references

As a general best practice, remove unsupported or system-bound references before export, especially:
- CDC role memberships
- Legacy system objects

Option 4: Use SqlPackage with ExtractAllTableData=True

Another practical workaround is to use the SqlPackage Extract action with the ExtractAllTableData=True property, which extracts the data from all user tables.
When set to True:
- Data is extracted from all user tables
- You cannot specify individual tables

When set to False (the default):
- You can selectively extract data from specific tables only

This reduces exposure to unsupported or problematic objects during validation. Note that the Extract action writes a .dacpac file (with this option, including table data), not a .bacpac.

Example:

```
SqlPackage /Action:Extract /SourceServerName:<server> /SourceDatabaseName:<database> /TargetFile:<output.dacpac> /p:ExtractAllTableData=True
```

When to use this option:
- When the export fails due to CDC roles or related schema validation issues (SQL71501)
- As a targeted workaround when a full export is blocked

Important considerations
- This is not a runtime database issue; it is a schema validation limitation in DacFx/SqlPackage.
- CDC itself is supported, but certain security objects are not fully exportable.

Key takeaways
- SQL71501 during export is often a model validation issue, not a data issue.
- CDC roles (db_cdcreader, cdc_admin) can break export due to partial modeling.
- The failure is triggered by role membership, not by CDC itself.
- Workarounds involve removing memberships or exporting from a cleaned copy.

Monitoring Azure SQL Data Sync Errors Using PowerShell
Azure SQL Data Sync is a powerful service that synchronizes data between multiple databases across Azure SQL Database and on-premises SQL Server environments. It supports hybrid architectures and distributed applications by letting selected data synchronize bi-directionally between hub and member databases in a hub-and-spoke topology.

However, support engineers and customers using Azure SQL Data Sync commonly face this operational challenge:

❗ Lack of proactive monitoring for sync failures or errors

By default, Azure SQL Data Sync does not provide a native alerting mechanism that notifies administrators when synchronization operations fail or encounter issues. This can result in silent data drift or synchronization delays that go unnoticed in production environments.

In this blog, we’ll walk through how to monitor Azure SQL Data Sync activity and detect synchronization errors using Azure PowerShell.

Why monitoring Azure SQL Data Sync matters

An Azure SQL Data Sync topology consists of:
- The hub database (must be an Azure SQL Database)
- Member databases (Azure SQL Database or SQL Server)
- The sync metadata database (stores the sync configuration and logs)

All synchronization activity, including errors, failures, and successes, is logged internally in the sync metadata database and exposed through the Azure SQL Sync Group logs.
Monitoring these logs enables:
- Detection of sync failures
- Identification of schema mismatches
- Validation of sync completion
- Troubleshooting of sync group issues
- Verification of the last successful sync activity

Prerequisites

Before monitoring Azure SQL Data Sync activity, ensure that:
- The Azure PowerShell module (Az.Sql) is installed
- You have access to the Azure SQL Data Sync resources
- Authentication and the subscription context are configured

Install and import the required module if it isn’t already available:

```powershell
# Install the Azure PowerShell module if not already installed
Install-Module -Name Az -Repository PSGallery -Force

# Import the SQL module
Import-Module Az.Sql
```

Authenticate to Azure:

```powershell
# Sign in to Azure
Connect-AzAccount -TenantId "<tenant-id>"

# Set the subscription context
Set-AzContext -SubscriptionId "<subscription-id>"
```

These commands enable access to the Azure SQL Sync Group monitoring cmdlets.

Monitoring sync group status

To retrieve sync group details, define the required variables:

```powershell
# Define variables
$resourceGroup = "rg-datasync-demo"
$serverName = "<hub-server-name>"
$databaseName = "HubDatabase"
$syncGroupName = "SampleSyncGroup"

# Get sync group details
Get-AzSqlSyncGroup -ResourceGroupName $resourceGroup `
    -ServerName $serverName `
    -DatabaseName $databaseName `
    -SyncGroupName $syncGroupName | Format-List
```

Note: The LastSyncTime property returned by Get-AzSqlSyncGroup may display a value such as 1/1/0001, even when synchronization operations complete successfully. To obtain accurate synchronization timestamps, use the sync group logs instead.
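If you post-process the Get-AzSqlSyncGroup output in a script, treat that placeholder as “no value” rather than as a real sync time. Below is a minimal Python sketch. It assumes the 1/1/0001 display corresponds to .NET’s DateTime.MinValue, so any year-1 timestamp is treated as a placeholder. The function name is hypothetical.

```python
from datetime import datetime
from typing import Optional

def usable_last_sync_time(value: Optional[datetime]) -> Optional[datetime]:
    """Return the timestamp, or None if it is missing or a year-1 placeholder.

    Assumption: a displayed 1/1/0001 is .NET's DateTime.MinValue, which means
    "never set", not a real synchronization time.
    """
    if value is None or value.year == 1:
        return None
    return value

print(usable_last_sync_time(datetime(1, 1, 1)))            # None
print(usable_last_sync_time(datetime(2024, 5, 1, 10, 30)))  # 2024-05-01 10:30:00
```

When the result is None, fall back to the timestamps in the sync group logs.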
Monitoring sync activity using logs (recommended)

To monitor synchronization activity and retrieve detailed sync status, use the following. The Z suffix in the time format denotes UTC, so the times are computed in UTC.

```powershell
# Get sync logs for the last 24 hours
$startTime = (Get-Date).ToUniversalTime().AddHours(-24).ToString("yyyy-MM-ddTHH:mm:ssZ")
$endTime = (Get-Date).ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ssZ")

Get-AzSqlSyncGroupLog -ResourceGroupName $resourceGroup `
    -ServerName $serverName `
    -DatabaseName $databaseName `
    -SyncGroupName $syncGroupName `
    -StartTime $startTime `
    -EndTime $endTime
```

This command retrieves:
- Sync operation timestamps
- Sync status
- Error messages
- Activity details

Sync group logs provide more reliable monitoring information than the sync group status output alone.

Retrieving the last successful sync time

To determine the most recent successful synchronization:

```powershell
# Get the most recent successful sync in the last 7 days.
# Sort newest first so that -First 1 returns the latest matching entry.
$startTime = (Get-Date).ToUniversalTime().AddDays(-7).ToString("yyyy-MM-ddTHH:mm:ssZ")
$endTime = (Get-Date).ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:ssZ")

Get-AzSqlSyncGroupLog -ResourceGroupName $resourceGroup `
    -ServerName $serverName `
    -DatabaseName $databaseName `
    -SyncGroupName $syncGroupName `
    -StartTime $startTime `
    -EndTime $endTime |
    Where-Object { $_.Details -like "*completed*" -or $_.Type -eq "Success" } |
    Sort-Object Timestamp -Descending |
    Select-Object -First 1 Timestamp, Type, Details
```

This helps administrators validate whether synchronization is occurring as expected across the sync topology.
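The “latest success” logic should not rely on the order in which log entries are returned. Filter on the entry type, then take the maximum timestamp. The Python sketch below does that and adds a hypothetical staleness check (no successful sync within a chosen window), which is the kind of condition you might alert on. The dictionaries only mimic the Timestamp/Type/Details properties used in the PowerShell example. They are sample data, not real cmdlet output.

```python
from datetime import datetime, timedelta, timezone

def latest_success(logs):
    """Return the newest entry whose Type is 'Success', or None if there is none."""
    successes = [entry for entry in logs if entry["Type"] == "Success"]
    return max(successes, key=lambda entry: entry["Timestamp"], default=None)

def is_stale(logs, max_age, now):
    """True when no successful sync was logged within max_age of now."""
    last = latest_success(logs)
    return last is None or now - last["Timestamp"] > max_age

# Hypothetical log entries, deliberately not in chronological order.
now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
logs = [
    {"Timestamp": now - timedelta(hours=5), "Type": "Success", "Details": "Sync completed"},
    {"Timestamp": now - timedelta(hours=1), "Type": "Error", "Details": "Sync failed"},
    {"Timestamp": now - timedelta(hours=9), "Type": "Success", "Details": "Sync completed"},
]
print(latest_success(logs)["Timestamp"])        # 2024-05-01 07:00:00+00:00
print(is_stale(logs, timedelta(hours=4), now))  # True
```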
Filtering for synchronization errors

To identify failed or problematic sync operations:

```powershell
# Get only error entries (log entries expose their level through the Type property)
Get-AzSqlSyncGroupLog -ResourceGroupName $resourceGroup `
    -ServerName $serverName `
    -DatabaseName $databaseName `
    -SyncGroupName $syncGroupName `
    -StartTime $startTime `
    -EndTime $endTime |
    Where-Object { $_.Type -eq "Error" }
```

Filtering logs by error type allows for:
- Rapid identification of failed sync attempts
- Analysis of failure causes
- Early detection of data consistency risks

Key takeaways
- Azure SQL Data Sync does not provide native alerting for sync failures.
- Sync group logs offer detailed monitoring of sync operations.
- Get-AzSqlSyncGroupLog provides accurate timestamps and status.
- Monitoring the logs enables detection of silent sync failures.
- PowerShell can be used to proactively monitor synchronization health.

References
- Azure SQL Data Sync Error Monitoring GitHub Repository
- What is SQL Data Sync for Azure?

Connect to Azure SQL Database using a custom domain name with Microsoft Entra ID authentication
Many of us prefer to connect to an Azure SQL logical server using a custom domain name (like devsqlserver.mycompany.com) rather than the default fully qualified domain name (devsqlserver.database.windows.net). The reason is often application-specific or compliance-related. This article details how to do this when signing in with Microsoft Entra ID (for example, user@mycompany.com) to Azure SQL Database. Users frequently hit errors during this process; the common ones are described below.

Before you start: If you use SQL authentication (SQL username/password), the steps are different. Refer to the following article for that scenario: How to use different domain name to connect to Azure SQL DB Server | Microsoft Community Hub

With SQL authentication, you can include the server name in the login (for example, username@servername). With Microsoft Entra ID authentication you don’t do that, so your custom DNS name must follow one important rule.

Key requirement for Microsoft Entra ID authentication

In Azure SQL Database (PaaS), the platform relies on the server name portion of the fully qualified domain name (FQDN) to route incoming connection requests to the appropriate logical server. When you use a custom DNS name, the name must start with the exact Azure SQL server name (the part before .database.windows.net).

Why this is required: Azure SQL Database is a multi-tenant PaaS service in which many logical servers are hosted behind shared infrastructure. During the connection process (especially with Microsoft Entra ID authentication), Azure SQL uses the server name extracted from the FQDN to:
- Identify the correct logical server
- Route the connection internally within the platform
- Validate the authentication context

This behavior aligns with how Azure SQL endpoints are designed and resolved within Microsoft’s managed infrastructure.
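You can check the naming rule before creating DNS records. The Python sketch below assumes that “starts with the server name” means the first DNS label of the custom name equals the Azure SQL server name, as in the examples in this article. The helper names are hypothetical, and the check does not contact Azure.

```python
def server_name_from_fqdn(fqdn: str) -> str:
    """Return the first DNS label, lowercased (DNS names are case-insensitive)."""
    return fqdn.strip().rstrip(".").split(".")[0].lower()

def custom_name_routes_to(custom_fqdn: str, server_fqdn: str) -> bool:
    """Check that the custom name's first label equals the Azure SQL server name."""
    return server_name_from_fqdn(custom_fqdn) == server_name_from_fqdn(server_fqdn)

server = "devsqlserver.database.windows.net"
print(custom_name_routes_to("devsqlserver.mycompany.com", server))  # True
print(custom_name_routes_to("othername.mycompany.com", server))     # False
```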
If your custom DNS name doesn’t start with the Azure SQL server name, Azure can’t route the connection to the correct server. Sign-in may fail, and you might see error 40532.

To fix this, change the custom DNS name so that it starts with your Azure SQL server name. For example, if your server is devsqlserver.database.windows.net, your custom name must start with 'devsqlserver':
- devsqlserver.mycompany.com
- devsqlserver.contoso.com
- devsqlserver.mydomain.com

Step-by-step: set up and connect
1. Pick the custom name. It must start with your server name. Example: use devsqlserver.mycompany.com (not othername.mycompany.com).
2. Create DNS records for the custom name. Point the custom name at your Azure SQL server endpoint (public) or at the private endpoint IP (private) with a CNAME or DNS alias, as described in the article mentioned above.
3. Check DNS from your computer. Make sure devsqlserver.mycompany.com resolves to the right address before you try to connect.
4. Connect with Microsoft Entra ID. In SSMS/Azure Data Studio, set Server to your custom server name and select a Microsoft Entra ID authentication option (for example, Universal with MFA).
5. Sign in and connect using your Entra ID (for example, user@mycompany.com).

When you connect to Azure SQL Database using a custom domain name, you might also see the following error:

“The target principal name is incorrect”

This happens because Azure SQL’s SSL/TLS certificate is issued for the default server name (for example, servername.database.windows.net), not for your custom DNS name. During the secure connection handshake, the client checks that the server name you are connecting to matches the name in the certificate. The custom domain does not match the certificate, so this validation fails and the error is raised. This is expected behavior. It is part of the standard security checks that prevent connecting to an untrusted or impersonated server.
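The Python sketch below shows why the custom name fails certificate validation. It is a simplified model of client-side hostname matching: either an exact match, or a * standing for the entire left-most label. It is not the actual logic of any particular driver. The wildcard case is included only to show that even a hypothetical wildcard name under database.windows.net would not cover a custom domain.

```python
def hostname_matches(cert_name: str, host: str) -> bool:
    """Simplified check: exact match, or '*' as the whole left-most label of cert_name."""
    cert_labels = cert_name.lower().rstrip(".").split(".")
    host_labels = host.lower().rstrip(".").split(".")
    if len(cert_labels) != len(host_labels):
        return False
    if cert_labels[0] == "*":
        # The wildcard covers exactly one non-empty label; the rest must match exactly.
        return host_labels[0] != "" and cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels

cert = "devsqlserver.database.windows.net"
print(hostname_matches(cert, "devsqlserver.database.windows.net"))  # True
print(hostname_matches(cert, "devsqlserver.mycompany.com"))         # False
```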
To proceed with the connection, you can configure the client to trust the server certificate by either:
- Setting Trust Server Certificate = True in the client settings, or
- Adding TrustServerCertificate=True to the connection string

This skips server certificate validation, including the name check, and allows the connection to succeed.

Note: Use the latest client drivers (ODBC, JDBC, .NET, etc.). In some older driver versions, the TrustServerCertificate setting may not work properly, and you may still see the same “target principal name is incorrect” error. Keeping drivers updated helps ensure smooth connectivity with Azure SQL.

Applies to both public and private endpoints: This naming requirement and approach work whether you connect over the public endpoint or through a private endpoint for Azure SQL Database. DNS resolution for the custom name must be set up correctly for your network.

Azure SQL Hyperscale: Understanding PITR Retention vs Azure Portal Restore UI
Overview

Customers using Azure SQL Database Hyperscale may sometimes notice a discrepancy between the configured point-in-time restore (PITR) retention period and the restore points that the Azure portal displays as available. In some cases:
- PITR retention is configured (for example, 7 days),
- yet the Azure portal only shows restore points going back a shorter period (for example, 1–2 days),
- and the restore UI may allow selecting dates earlier than the configured retention window without immediately showing an error.

This post explains why this happens, how to validate backup health, and what actions to take.

Key observation

From investigation and internal validation, this behavior does not indicate backup data loss. Instead, it is related to Azure portal UI behavior, particularly for Hyperscale databases. The backups themselves continue to exist and are managed correctly by the service.

Important distinction: portal UI vs actual backup state

What the Azure portal shows
- The restore blade may show fewer restore points than expected.
- The date picker may allow selecting dates outside the PITR retention window.
- No immediate validation error may appear in the UI.

What actually happens
- Backup retention is enforced at the service layer, not in the portal.
- If a restore is attempted outside the valid PITR window, the operation fails during execution, even if the UI allowed the selection.
- Hyperscale backup metadata is handled differently than in the General Purpose or Business Critical tiers.

Why this happens with Hyperscale

There are a few important technical reasons:
- Hyperscale backup architecture differs. Hyperscale uses a distributed storage and backup model optimized for scale and fast restore, which affects how metadata is surfaced.
- Some DMVs are not supported. Views like sys.dm_database_backups, commonly used for backup visibility, do not support Hyperscale databases.
- The Azure portal relies on metadata projections. The portal restore experience depends on backend projections that may lag or behave differently for Hyperscale, leading to UI inconsistencies.

How to validate backup health (recommended)

Instead of relying solely on the Azure portal UI, use service-backed validation methods.

Option 1: PowerShell (earliest restore point)

You can confirm the earliest available restore point directly from the service:

```powershell
# Set your variables
$resourceGroupName = "RG-xxx-xxx-1"
$serverName = "sql-xxx-xxx-01"
$databaseName = "database_Prod"

# Get the earliest restore point
$db = Get-AzSqlDatabase -ResourceGroupName $resourceGroupName -ServerName $serverName -DatabaseName $databaseName
$earliestRestore = $db.EarliestRestoreDate
Write-Host "Earliest Restore Point: $earliestRestore"
Write-Host "Days Available: $([math]::Round(((Get-Date) - $earliestRestore).TotalDays, 1)) days"
```

This reflects the true PITR boundary enforced by Azure SQL.

Option 2: Internal telemetry / backup events (engineering validation)

Internal monitoring confirms that:
- Continuous backup events are present.
- Coverage aligns with the configured PITR retention.
- Backup health remains ✅ Healthy even when the portal UI appears inconsistent.

Key takeaway: Backup data is intact and retention is honored.

Is there any risk of data loss?

No. There is no evidence of backup loss or retention policy violation. This is a visual/UX issue, not a data protection issue.

Recommended actions

For customers
- ✅ Trust the configured PITR retention, not just the portal display.
- ✅ Use PowerShell or the Azure CLI to validate restore boundaries.
- ❌ Do not assume backup loss based on the portal UI alone.

For support / engineering
- Capture a browser network trace when encountering UI inconsistencies.
- Raise an incident with the Azure portal team for investigation and a fix.
- Reference the Hyperscale-specific behavior during troubleshooting.
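The Python sketch below reproduces the arithmetic from the PowerShell check in Option 1 and adds the window test the service applies when a restore runs. It is a simplified model that assumes the valid window runs from EarliestRestoreDate up to the current time. The service remains the authority on which restore points are accepted, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def days_available(earliest_restore: datetime, now: datetime) -> float:
    """Days between the earliest restore point and now, rounded to one decimal."""
    return round((now - earliest_restore).total_seconds() / 86400, 1)

def restore_point_is_valid(requested: datetime, earliest_restore: datetime, now: datetime) -> bool:
    """A PITR request must fall inside [earliest_restore, now], whatever the date picker allows."""
    return earliest_restore <= requested <= now

now = datetime(2024, 5, 8, 12, 0, tzinfo=timezone.utc)
earliest = now - timedelta(days=7)
print(days_available(earliest, now))                                     # 7.0
print(restore_point_is_valid(now - timedelta(days=10), earliest, now))  # False
print(restore_point_is_valid(now - timedelta(days=2), earliest, now))   # True
```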
Summary

| Topic | Status |
|---|---|
| PITR retention enforcement | ✅ Correct |
| Backup data integrity | ✅ Safe |
| Azure portal restore UI | ⚠️ May be misleading |
| Hyperscale backup visibility | ✅ Validate via service tools |

Final thoughts

Azure SQL Hyperscale continues to provide robust, reliable backup and restore capabilities, even when the Azure portal UI does not fully reflect the underlying state. When in doubt:
- Validate via service APIs.
- Rely on the service’s enforcement logic, not UI hints.
- Escalate portal inconsistencies appropriately.

When Azure Portal/CLI Can’t Delete an Azure SQL DB: Check the Database Name (Unsupported Characters)
Scenario (from a real service request)

A customer reported a General Purpose (Gen5, 2 vCores) Azure SQL Database that was incurring charges but could not be deleted using the Azure portal or the Azure CLI. The CLI output showed two entries. One of them had a database name containing a forward slash (example display: xxxx-xxx-sql/xxx-xxx-db).

Symptoms you may see
- The database appears in listing outputs, but deletion via ARM/CLI fails with an invalid resource ID format error.
- The name looks like server/db (it contains /), making it difficult for the portal/CLI to target it correctly.

Why this happens

Databases created through T-SQL/SSMS can sometimes include characters that ARM-based creation would block. Portal/CLI/ARM operations on such a database can then fail. In T-SQL, identifiers that don’t follow the “regular” naming rules must be used as delimited identifiers (for example, wrapped in square brackets).

The fix that worked

We advised the customer to delete the database using T-SQL, enclosing the database name in square brackets (a delimited identifier). The customer confirmed that the database was successfully dropped using this approach.

Preventing this going forward
- Prefer creating databases through the portal/ARM/CLI, which enforce naming rules and avoid “unsupported character” edge cases.
- You may need to keep a database whose name has unsupported characters. Microsoft’s public guidance notes that the long-term workaround is to rename it to a compliant name using T-SQL, so that it can be managed normally through the portal/CLI again.

Key takeaway

An Azure SQL Database can become “undeletable” through the portal/CLI when its name contains unusual characters such as <, >, *, %, &, :, \, /, or ?. It may still be fully manageable from T-SQL using delimited identifiers. That can be the cleanest way to unblock deletion and stop unexpected costs.

How to Get Database-Wise Session Details in an Azure SQL Elastic Pool Using T-SQL
When you run multiple databases in an Azure SQL Database elastic pool, it’s common to face questions like:
- Which database is using the most sessions right now?
- Are we getting close to the pool’s session limit?
- Which application(s) are opening connections?
- Is connection pooling configured correctly?

The Azure portal can help, but you don’t always have portal access. Even when you do, you may want a quick, scriptable approach that you can run from SSMS, Azure Data Studio, or sqlcmd.

This post provides copy-paste T-SQL queries to:
- check pool-level session pressure,
- list active sessions by database,
- summarize active session counts per database, and
- capture a connection inventory.

It then ties these back to one of the most common root causes of high session counts: connection pooling behavior in the application.

What you should know up front (setting expectations)

1) Elastic pool DMVs give you pool context from inside any pooled database

The DMV sys.dm_elastic_pool_resource_stats returns usage for the elastic pool that contains the current database, including concurrent session utilization. It can be queried from any user database in the same elastic pool.

2) Connection/session DMVs can show pool-wide connections (with sufficient permissions)

Microsoft documentation notes that you can use sys.dm_exec_connections to retrieve connection details. If a database is in an elastic pool and you have sufficient permissions, the view returns the connections for all databases in the pool. The documentation also calls out sys.dm_exec_sessions as a companion DMV for session details.

If you run the queries below and only see your own session, that typically indicates a permission scope limitation (the documentation notes this behavior for DMV visibility).

Quick “Which query should I run?” guide
- Are we close to pool session limits? → Query A
- Which database is busy right now (active work)? → Query B
- Give me a ranked list of active sessions per database → Query C
- Which apps/hosts/users are connecting? → Query D

Query A: Check elastic pool session pressure (near real-time)

Run this in any user database in the elastic pool:

```sql
SELECT TOP (60)
    end_time,
    avg_cpu_percent,
    avg_data_io_percent,
    avg_log_write_percent,
    max_worker_percent,
    max_session_percent,
    used_storage_percent
FROM sys.dm_elastic_pool_resource_stats
ORDER BY end_time DESC;
```

How to interpret it

max_session_percent tells you how close your pool is to its session limit (peak session utilization in the interval). This DMV is intended for real-time monitoring and troubleshooting and retains data for roughly 40 minutes.

Query B: Active sessions by database (best for “what’s happening right now?”)

This query focuses on sessions that are currently executing requests. It attributes each one to a database using the database ID of the request’s SQL text (DB_NAME(st.dbid)). The st.dbid approach is widely used in troubleshooting patterns to show the execution context database.
```sql
SELECT
    DB_NAME(st.dbid) AS database_name,
    s.session_id,
    s.login_name,
    s.host_name,
    s.program_name,
    s.client_interface_name,
    c.net_transport,
    c.encrypt_option,
    c.auth_scheme,
    c.connect_time,
    s.login_time,
    r.status AS request_status,
    r.command,
    r.start_time,
    r.cpu_time,
    r.total_elapsed_time
FROM sys.dm_exec_requests AS r
JOIN sys.dm_exec_sessions AS s
    ON r.session_id = s.session_id
JOIN sys.dm_exec_connections AS c
    ON c.session_id = s.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE s.is_user_process = 1
  AND r.session_id <> @@SPID
ORDER BY database_name, r.cpu_time DESC;
```

What customers typically use this for:
- Identify which database has the most active work right now
- See which client program and host are responsible
- Spot heavy or long-running requests using cpu_time and total_elapsed_time

Query C: Count active sessions per database (simple ranked view)

Use this for a quick summary like “DB1 has 18 active sessions; DB2 has 5…”:

```sql
WITH active_pool_sessions AS (
    SELECT
        DB_NAME(st.dbid) AS database_name,
        r.session_id
    FROM sys.dm_exec_requests AS r
    JOIN sys.dm_exec_sessions AS s
        ON r.session_id = s.session_id
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
    WHERE s.is_user_process = 1
      AND r.session_id <> @@SPID  -- exclude this monitoring query itself, as in Query B
)
SELECT
    database_name,
    COUNT(*) AS active_sessions
FROM active_pool_sessions
GROUP BY database_name
ORDER BY active_sessions DESC;
```

This is a useful “top list” during incidents and uses the same execution-context mapping pattern (st.dbid).

Query D: Connection inventory (who is connected?)

Use this when you suspect connection storms, too many open sessions, or connection pooling issues.
```sql
SELECT
    c.session_id,
    c.net_transport,
    c.encrypt_option,
    c.auth_scheme,
    s.host_name,
    s.program_name,
    s.client_interface_name,
    s.login_name,
    s.original_login_name,
    c.connect_time,
    s.login_time
FROM sys.dm_exec_connections AS c
JOIN sys.dm_exec_sessions AS s
    ON c.session_id = s.session_id
WHERE s.is_user_process = 1
ORDER BY c.connect_time DESC;
```

Microsoft documentation provides this exact join pattern (connections + sessions) as the baseline way to retrieve connection metadata. It also notes the elastic pool behavior when permissions allow.

Connection pooling: the #1 reason session counts spike (and how to fix it)

Now that you can see sessions and connections, here’s the most common “why”: connection pooling configuration and behavior in the application.

What is connection pooling?

Creating a new database connection involves several time-consuming steps: establishing a physical channel, performing the handshake, parsing the connection string, authenticating, and running other checks. ADO.NET reduces that overhead with connection pooling. When your application calls Open(), the pooler tries to reuse an existing physical connection. When your application calls Close()/Dispose(), the connection is returned to the pool instead of being physically closed, ready for reuse on the next Open().

Important (and often misunderstood): pooling happens on the client side, but it has a very real effect on how many concurrent sessions you consume in an elastic pool.

Common pooling pitfalls that cause “too many sessions”

1) Connections are not being returned to the pool (connection leaks)

Pooling relies on the application calling Close()/Dispose() so the pooler can return the connection to the pool for reuse. If connections aren’t closed properly, the pool can’t reuse them, and your app may keep creating new ones.

What it looks like in SQL: Query D shows a growing number of sessions from the same program/host over time.

Practical fix: Ensure every database usage pattern disposes of the connection (e.g., using blocks in .NET).
(General best practice; the pooling mechanism’s reliance on Close/return is documented.)

2) Pool fragmentation (you accidentally create multiple pools)

ADO.NET keeps separate pools for different configurations. Connections are separated into pools by connection string and, when integrated security is used, by Windows identity. Pools can also vary based on transaction enlistment and credential instances.

What this means: Small differences in connection strings across services/environments can create multiple pools, each with its own connections. Total sessions can then be much higher than expected.

What it looks like in SQL: Query D shows many sessions from the same overall application family but with slightly different connection contexts (different apps/services).

Practical fix: Keep connection strings consistent across instances where possible (same keywords, same security settings, same app identity strategy). The “separate pools by configuration” concept is documented.

3) “Max pool size reached” (client-side pool exhaustion) mistaken for Azure SQL limits

A Microsoft Tech Community troubleshooting post shows what happens when you set a small Max Pool Size. You can hit client-side errors such as: “Timeout expired… prior to obtaining a connection from the pool… all pooled connections were in use and max pool size was reached.”

That is not the same as hitting an Azure SQL tier limit. It is the application waiting because it can’t obtain a connection from its own pool.

How to differentiate quickly:
- If you see “max pool size reached” / “timeout obtaining connection from the pool” → client pooling pressure.
- If max_session_percent in Query A is consistently high → pool-level session pressure.

4) Holding connections open longer than necessary

Even with pooling enabled, your application may open a connection and then hold it while doing non-database work. Those connections remain “in use” and can’t return to the pool, which causes waits and more concurrent sessions under load.
(Pitfall 4 follows directly from the documented “Open returns a pooled connection; Close returns it to the pool” behavior.)

What it looks like in SQL: Query D shows many sessions from the same application, and Query B/C shows many active sessions tied to one database during spikes.

Practical fix: Open connections as late as possible and close them as early as possible around each DB unit of work. (General best practice derived from pooling mechanics.)

A simple customer checklist (quick wins)
- Confirm that every DB call closes/disposes the connection so it can return to the pool.
- Avoid varying connection strings unnecessarily (this prevents pool fragmentation).
- If you see pool wait errors (“max pool size reached”), treat them as an application pooling signal first.
- Use the T‑SQL queries above to validate pool pressure (Query A), the busiest databases and active sessions (Query B/C), and connection sources (Query D).

Wrap‑up

With the queries in this post, you can troubleshoot elastic pool session behavior without relying on the Azure portal:
- Query A: real-time pool session/worker pressure
- Query B/C: database-wise view of active workload (most actionable during incidents)
- Query D: connection inventory (great for pooling issues and connection storms)

And when session counts spike, don’t overlook the application side. Connection pooling behavior (leaks, fragmentation, pool sizing, and holding connections open) is one of the most common drivers.

Lessons Learned #539: Azure SQL DB Scale-Down from S3 to S2 Can Fail When Change Feed Is Enabled
Recently, I worked on a service request where a customer reported that an Azure SQL Database could not be scaled down from Standard S3 to Standard S2. The operation failed with the following message: "An unexpected error occurred while processing the request".

During troubleshooting, we reviewed the database configuration to identify any setting that could prevent the scale-down. As part of that review, we ran SELECT * FROM sys.databases and saw that the is_change_feed_enabled column was set to 1. This indicated that Change Feed was enabled on the database. According to the current documentation, Change Feed is not supported on Standard S0, S1, or S2, so the scale-down to S2 could not proceed.

After disabling Change Feed by running EXEC sys.sp_change_feed_disable_db; we were able to complete the scale-down operation successfully.

When and How to Update Statistics in Azure SQL Database
Accurate statistics are a cornerstone of good query performance in Azure SQL Database. While the platform manages statistics automatically in most scenarios, there are real‑world cases where manual intervention is not only recommended but essential. This article explains when, why, and how to update statistics in Azure SQL Database, with practical samples and real customer scenarios drawn from production support cases.

Microsoft Learn reference (overview): https://learn.microsoft.com/sql/relational-databases/statistics/statistics

Why Statistics Matter

SQL Server’s query optimizer relies on statistics to estimate row counts, choose join strategies, allocate memory grants, and decide whether to run operations in parallel. When statistics are stale or inaccurate, even well‑indexed queries can suddenly degrade.

In Azure SQL Database:
- AUTO_CREATE_STATISTICS is enabled and managed by the platform.
- AUTO_UPDATE_STATISTICS is enabled. When AUTO_UPDATE_STATISTICS_ASYNC is also ON, the refresh runs in the background instead of blocking query compilation.

With asynchronous updates, queries may continue to run with outdated cardinality estimates until the statistics refresh completes.

https://learn.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide

When Manual Statistics Updates Are Required

1. After Large Data Changes (ETL / Batch Operations)

Customer scenario: A nightly ETL job bulk‑inserts millions of rows into a fact table. The following morning, reports time out and logical reads spike.

Why it happens: Auto‑update thresholds are based on the number of row modifications since the last update. As a result, they may not trigger immediately, especially for append‑only or skewed data.

Recommended action:
```sql
UPDATE STATISTICS dbo.FactSales;
```
Target only the critical statistic if known:
```sql
UPDATE STATISTICS dbo.FactSales (IX_FactSales_CreatedDate);
```

2. Query Plan Regression Without Schema Changes

Customer scenario: A stable query suddenly switches from a Nested Loops join to a Hash Join, which increases CPU usage and Buffer IO (PAGEIOLATCH_*) waits.

Root cause: Statistics no longer reflect the current data distribution.
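Statistics drift away from the data because automatic updates fire only after enough modifications accumulate. This also explains the ETL scenario above. The Python sketch below encodes the auto-update thresholds documented for permanent tables: a flat 500 modifications for tables of up to 500 rows, 500 + 20% of the rows below compatibility level 130, and the lower of that and SQRT(1000 × rows) at level 130 and above. The function names are hypothetical. The modification count would come from the modification_counter column of sys.dm_db_stats_properties. Whether the boundary uses `>=` or `>` is an assumption.

```python
import math


def auto_update_threshold(n_rows: int, compat_level: int = 150) -> float:
    """Row modifications after which a statistic on a permanent table is
    considered out of date, per the documented auto-update thresholds."""
    if n_rows <= 500:
        return 500.0
    linear = 500 + 0.20 * n_rows                  # below compatibility level 130
    if compat_level < 130:
        return linear
    return min(linear, math.sqrt(1000 * n_rows))  # dynamic threshold, 130+


def is_stale(n_rows: int, modification_counter: int, compat_level: int = 150) -> bool:
    """Assumes the threshold counts as reached when the counter meets it."""
    return modification_counter >= auto_update_threshold(n_rows, compat_level)
```

For a 1,000,000-row table, the dynamic threshold is about 31,623 modifications, compared with 200,500 under the legacy formula. A large batch load can still fall short of either threshold on very large tables, which is when a targeted manual update helps.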
Recommended action:
```sql
UPDATE STATISTICS dbo.Customer WITH FULLSCAN;
```
Learn more: https://learn.microsoft.com/sql/relational-databases/statistics/update-statistics

3. After Restore Operations (PITR / Geo‑Restore / Database Copy)

Customer scenario: After a Point‑in‑Time Restore (PITR) on a Hyperscale database, queries run slower despite healthy platform telemetry.

Why it happens: Statistics are restored as‑is, but workload patterns often change after the restore point, and auto‑update statistics may lag behind.

Recommended action:
```sql
EXEC sp_updatestats;
```
On large databases, prioritize the most heavily accessed tables first.

Learn more: https://learn.microsoft.com/azure/azure-sql/database/recovery-using-backups

Query Store Comparison: Before vs After Updating Statistics

One of the most effective ways to validate the impact of a statistics update is Query Store.

Before the update (typical signs):
- Sudden plan change for the same query text
- Increased logical reads and CPU time
- Change in join strategy or memory grant

After the statistics update:
- The optimizer selects a more efficient plan
- Logical reads drop
- CPU and duration stabilize

Example workflow:
```sql
-- Capture runtime stats (they are keyed by plan_id; filter via the plan's query_id)
SELECT p.plan_id, rs.avg_duration, rs.avg_cpu_time, rs.avg_logical_io_reads
FROM sys.query_store_runtime_stats AS rs
JOIN sys.query_store_plan AS p ON rs.plan_id = p.plan_id
WHERE p.query_id = <QueryID>;
-- Update statistics
UPDATE STATISTICS dbo.Orders;
-- Mark modules that reference the table for recompilation
EXEC sp_recompile 'dbo.Orders';
```
Query Store reference: https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store

Decision Flow: When Should I Update Statistics?

1. Is a performance regression observed? If yes, continue.
2. Did the query plan change without a schema change? If yes, continue.
3. Was there a recent data change, restore, or ETL run? If yes, update the targeted statistics.

If the answer is no at any step, rely on automatic statistics and continue monitoring.

What NOT to Do

❌ Do not run blanket WITH FULLSCAN on all tables. FULLSCAN is CPU- and IO-expensive, especially on large or Hyperscale databases.
❌ Do not schedule frequent database‑wide sp_updatestats jobs. They can introduce unnecessary workload and plan churn.
❌ Do not update statistics blindly without investigation. Always validate plan regression or stale estimates using Query Store or execution plans.

Checking Statistics Freshness

```sql
SELECT OBJECT_NAME(s.object_id) AS table_name, s.name AS stats_name,
       sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY sp.last_updated;
```
modification_counter shows how many changes have accumulated on the leading statistics column since the last update.

DMV reference: https://learn.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-stats-properties-transact-sql

Best Practices Summary
✅ Prefer targeted statistics updates
✅ Update stats after bulk data changes or restores
✅ Validate results using Query Store
✅ Avoid unnecessary FULLSCAN operations
✅ Use stats updates as a diagnostic and remediation step, not routine maintenance

Conclusion

Although Azure SQL Database manages statistics automatically, asynchronous updates (when enabled) and changing workload patterns can result in sub‑optimal query plans. Manually updating statistics after significant data changes, restore operations, or observed plan regressions is a safe and effective way to restore optimal query performance.

Cannot enable Change Data Capture (CDC) on Azure SQL Database: Msg 22830 + Error 40529 (SUSER_SNAME)
Issue

While enabling CDC at the database level on Azure SQL Database using:
```sql
EXEC sys.sp_cdc_enable_db;
GO
```
the operation fails.

Error

The customer observed the following failure when running sys.sp_cdc_enable_db:

Msg 22830: Could not update the metadata that indicates the database is enabled for Change Data Capture. Failure occurred when executing drop user cdc

Error 40529: "Built-in function 'SUSER_SNAME' in impersonation context is not supported in this version of SQL Server."

What we checked (quick validation)

Before applying any changes, we confirmed that CDC wasn’t partially enabled and that no CDC artifacts had been created:
```sql
-- Is CDC enabled for this database?
SELECT name, is_cdc_enabled FROM sys.databases WHERE name = DB_NAME();
-- Does the cdc schema exist?
SELECT name FROM sys.schemas WHERE name = 'cdc';
-- Does the cdc user exist?
SELECT name FROM sys.database_principals WHERE name = 'cdc';
```
These checks were used during troubleshooting in the SR. (Also note: Microsoft Learn documents that enabling CDC creates the cdc schema/user and requires exclusive use of that schema/user.)

Cause

In this case, the failure aligned with a known Azure SQL Database CDC scenario: enabling CDC can fail if there is an active database-level trigger that calls SUSER_SNAME(). To identify active database-level triggers, we used:
```sql
SELECT name, object_id
FROM sys.triggers
WHERE parent_class_desc = 'DATABASE' AND is_disabled = 0;
```

Resolution / Workaround

The customer resolved the issue by:
1. Identifying the active database-level trigger.
2. Disabling the trigger temporarily.
3. Enabling CDC at the database level and then enabling CDC on the required tables.
4. Re-enabling the trigger after CDC was successfully enabled.

Post-resolution verification

After enabling CDC, you can validate the state using:
```sql
-- Confirm CDC is enabled at the database level
SELECT name, is_cdc_enabled FROM sys.databases WHERE name = DB_NAME();
```
For table-level tracking, Microsoft Learn recommends checking the is_tracked_by_cdc column in sys.tables.
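The four workaround steps can be scripted. The hedged Python sketch below takes the database-level triggers (name, disabled flag, and definition text) and the tables to track, then emits T-SQL in the same order the customer followed. Reading trigger definitions from sys.sql_modules is an assumption, because the queries in this post return only names. The regex check for `SUSER_SNAME(` is a heuristic that also matches calls inside comments, and all helper names are hypothetical. Review the generated script before running it.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Heuristic: "SUSER_SNAME" as a whole word followed by "(", case-insensitive.
SUSER_SNAME_CALL = re.compile(r"\bSUSER_SNAME\s*\(", re.IGNORECASE)


@dataclass
class DbTrigger:
    name: str
    is_disabled: bool
    definition: str  # trigger body text (assumed source: sys.sql_modules)


def quote_name(name: str) -> str:
    """Bracket-quote an identifier, doubling any ']' the way QUOTENAME does."""
    return "[" + name.replace("]", "]]") + "]"


def n_literal(value: str) -> str:
    """Build an N'...' string literal, doubling embedded single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def build_cdc_enable_script(triggers: Iterable[DbTrigger],
                            tables: Iterable[Tuple[str, str]]) -> List[str]:
    """Disable offending triggers, enable CDC (DB, then tables), re-enable triggers."""
    offending = [t for t in triggers
                 if not t.is_disabled and SUSER_SNAME_CALL.search(t.definition)]
    lines = [f"DISABLE TRIGGER {quote_name(t.name)} ON DATABASE;" for t in offending]
    lines.append("EXEC sys.sp_cdc_enable_db;")
    for schema, table in tables:
        lines.append(
            "EXEC sys.sp_cdc_enable_table "
            f"@source_schema = {n_literal(schema)}, "
            f"@source_name = {n_literal(table)}, "
            "@role_name = NULL;"  # NULL = no gating role for change data access
        )
    lines += [f"ENABLE TRIGGER {quote_name(t.name)} ON DATABASE;" for t in offending]
    return lines
```

Only active triggers whose text calls SUSER_SNAME() are disabled and re-enabled here. Disabling every active database-level trigger would be a more conservative variant if the definitions are not available.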
Notes / Requirements
- To enable CDC for Azure SQL Database, db_owner is required.
- Azure SQL Database uses a CDC scheduler (instead of SQL Server Agent jobs) for capture/cleanup.
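The validation queries and requirements in this post can be folded into a small pre-flight check that runs before sys.sp_cdc_enable_db. In this Python sketch, each flag would be filled from the corresponding query result. Using IS_MEMBER('db_owner') for the permission flag is an assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CdcPreflight:
    is_db_owner: bool        # e.g. IS_MEMBER('db_owner') = 1 (assumption)
    is_cdc_enabled: bool     # sys.databases.is_cdc_enabled for DB_NAME()
    cdc_schema_exists: bool  # sys.schemas has a row named 'cdc'
    cdc_user_exists: bool    # sys.database_principals has a row named 'cdc'


def preflight_findings(state: CdcPreflight) -> List[str]:
    """Return human-readable findings; an empty list means no blocker was found."""
    findings = []
    if not state.is_db_owner:
        findings.append("db_owner membership is required to enable CDC")
    if state.is_cdc_enabled:
        findings.append("CDC is already enabled at the database level")
    elif state.cdc_schema_exists or state.cdc_user_exists:
        # CDC needs exclusive use of the cdc schema and user, so leftovers from
        # a failed attempt (or unrelated objects with that name) need review.
        findings.append("cdc schema or user exists while CDC is disabled")
    return findings
```

An empty result does not rule out the trigger scenario described in this post. Active database-level triggers still need to be checked separately.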