Azure SQL DB
Lessons Learned #535: BACPAC Import Failures in Azure SQL Database Due to Incompatible Users
We recently worked on a support case where a customer was trying to import a BACPAC file, generated on a different server and subscription, into their Azure SQL Database. The process kept failing with the following error:

"Could not import package. Error SQL72014: Framework Microsoft SqlClient Data Provider: Msg 33159 - Only connections established with Active Directory accounts can create other Active Directory user"

At first glance, this looked like a permissions issue, but digging deeper we realized that the error was triggered when the import process tried to create Entra ID (Azure AD) users while the connection was being made with a SQL login. We checked several things in the BACPAC:

- The BACPAC contained references to external Active Directory users that were valid in the source environment but not in the target.
- Both the Azure portal and SQL Server Management Studio (SSMS) failed with the same error.
- Since BACPAC files include both schema and user objects, the incompatible users were being carried over and breaking the import.

After a thorough investigation, the following resolution path was established:

1. We created a dummy copy of the source database.
2. We removed the external AD/Entra ID users from that copy.
3. We generated a new BACPAC from this cleaned database.
4. We imported it into the target Azure SQL Database — and this time it worked.

We explained several details: BACPAC files include both schema and security objects, including users, and external Active Directory users that are not present in the target environment can cause import failures. Before exporting, review and remove or adjust user objects to avoid this issue — particularly when migrating across subscriptions, servers, or organizations with different Azure AD tenants.
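To locate and remove those users on the database copy before generating the new BACPAC, a minimal T-SQL sketch like the following can help (the user name in the DROP USER statement is only illustrative):

-- List external (Entra ID / Azure AD) users and groups in the database copy.
-- type 'E' = external user, 'X' = external group.
SELECT name, type_desc, authentication_type_desc
FROM sys.database_principals
WHERE type IN ('E', 'X');

-- Drop an external principal that does not exist in the target tenant.
DROP USER [user@contoso.com];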
Lesson Learned #531: Scalar UDF vs Parallelism
Last week I worked on a support case where our customer reported that the exact same query, executed against two identical databases with the same resources, was taking significantly longer on one of them. Both databases had the same number of rows, up-to-date statistics, and identical indexes. We started by collecting the execution plans, and I’d like to share what we found.

Comparing both execution plans, in the XML of the plan that takes more time we found the following line:

<QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="TSQLUserDefinedFunctionsNotParallelizable">

However, in the XML of the plan that takes less time we found:

<QueryPlan DegreeOfParallelism="1" ContainsInlineScalarTsqlUdfs="true">

Based on this difference, it is clear that the query uses a scalar UDF, and that in one database the definition of that scalar UDF prevents the query from running in parallel, while in the other database the query can run in parallel even though it also uses a scalar UDF. Since both databases use the same compatibility level (160), we started to analyze what was different between the two definitions. Here is an example that reproduces the behavior:

DROP TABLE IF EXISTS dbo.TestData;
GO
CREATE TABLE dbo.TestData
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    Value1 INT,
    Value2 INT
);

INSERT INTO dbo.TestData (Value1, Value2)
SELECT ABS(CHECKSUM(NEWID()) % 10000),
       ABS(CHECKSUM(NEWID()) % 10000)
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
WHERE a.object_id < 150 AND b.object_id < 150;

Let's create the scalar function that blocks parallel execution:

CREATE OR ALTER FUNCTION dbo.fn_BlockParallel (@v1 INT)
RETURNS INT
AS
BEGIN
    DECLARE @x INT;
    SELECT @x = DATEDIFF(MILLISECOND, GETDATE(), SYSDATETIME());
    RETURN ISNULL(@x, 0);
END;

When I executed the following query, I saw this in the plan XML:

<QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="TSQLUserDefinedFunctionsNotParallelizable" CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="216">

SELECT ID, dbo.fn_BlockParallel(Value1)
FROM dbo.TestData
WHERE Value1 > 100
OPTION (MAXDOP 4);
GO

If I modify the function to a version that no longer references GETDATE()/SYSDATETIME(), I see:

<QueryPlan DegreeOfParallelism="1" CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="272" ContainsInlineScalarTsqlUdfs="true">

CREATE OR ALTER FUNCTION dbo.fn_BlockParallel (@v1 INT)
RETURNS INT
AS
BEGIN
    DECLARE @x INT;
    SELECT @x = @v1 * 2;
    RETURN @x;
END;

So, even when using compatibility level 160, certain constructs inside scalar UDFs can prevent inlining, which in turn blocks query parallelism. When performance varies between environments, one of the things to check is whether scalar UDFs are involved and whether they are eligible for inlining. To detect the issue quickly, look at the execution plan XML and check the attributes DegreeOfParallelism, ContainsInlineScalarTsqlUdfs, and NonParallelPlanReason.
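As a quicker check than reading the plan XML, the sys.sql_modules catalog view exposes whether each scalar UDF is eligible for inlining; a small query along these lines can be used (the columns are available at compatibility level 150 and higher):

-- List scalar UDFs and whether the optimizer can inline them.
SELECT o.name AS function_name,
       m.is_inlineable,
       m.inline_type
FROM sys.sql_modules AS m
JOIN sys.objects AS o
    ON o.object_id = m.object_id
WHERE o.type = 'FN';  -- scalar user-defined functions only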
Lesson Learned #533: Intermittent Azure SQL Database Connectivity and Authentication Issues
While working on a recent service request, we helped a customer troubleshoot intermittent connection and authentication failures when accessing Azure SQL Database using Active Directory (Entra ID) authentication from a Java-based application using HikariCP with JDBC/ODBC. They got the following errors:

com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: Failed to authenticate.. Request was throttled according to instructions from STS. Retry in 29701 ms.

java.sql.SQLTransientConnectionException: HikariPool-application1 - Connection is not available, request timed out after

The first insight came from focusing on the error message "Request was throttled according to instructions from STS. Retry in 29701 ms." This message appears to be returned by the Entra ID (Azure Active Directory) Security Token Service (STS) when the client sends too many token requests in a short period of time, exceeding the allowed threshold. We don't have all the details of the internal limits, but in high-concurrency environments (for example, multiple threads and a large connection pool), each thread independently requests a new token and the service limit can be reached; if the connection pool retries frequently or fails authentication, the number of token requests can spike even further. Because HikariCP tries to initialize or refresh connections quickly, many threads attempt to connect at once, all of them trigger token requests simultaneously, and STS throttling is reached.

To avoid this situation there are several options: ensure the application caches tokens and reuses them across threads, use Managed Identity, increase the retry-after delay, or, depending on the HikariCP configuration, pre-warm connections gradually. Discussing the throttling thresholds with your Entra ID administrators is another option.
Lessons Learned #534: Azure SQL Database Connections with Managed Identity and Python ODBC
We worked on a service request where our customer was trying to enable their Python application, hosted on Azure App Service, to connect securely to Azure SQL Database using a user-assigned managed identity. They attempted to use the Microsoft ODBC Driver for SQL Server with the managed identity for authentication. During our troubleshooting we found several issues and error messages caused by incorrect settings in the connection string:

- The initial connection string used the 'ActiveDirectoryInteractive' authentication method, which is not compatible with managed identities. The correct approach is to use 'ActiveDirectoryMsi' or 'ActiveDirectoryManagedIdentity' for system- or user-assigned managed identities.
- Switching to 'ActiveDirectoryMsi' led to a pyodbc error: pyodbc.Error: (FA005, [FA005] [Microsoft][ODBC Driver 18 for SQL Server]Cannot use Access Token with any of the following options: Authentication, Integrated Security, User, Password. (0) (SQLDriverConnect)).
- The FA005 error indicated a mismatch between the use of an access token and the connection string properties. Specifically, when passing an access token, the connection string must not include conflicting authentication parameters such as User, Password, or Integrated Security.
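Besides the connection string keywords, the managed identity also needs a contained user in the target database with the appropriate roles. A minimal sketch, assuming the user-assigned identity is named my-app-identity (replace it with the display name of your identity and grant only the roles your application actually needs):

-- Create a contained database user mapped to the managed identity.
CREATE USER [my-app-identity] FROM EXTERNAL PROVIDER;

-- Grant the roles the application requires.
ALTER ROLE db_datareader ADD MEMBER [my-app-identity];
ALTER ROLE db_datawriter ADD MEMBER [my-app-identity];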
Lesson Learned #532: Power BI Refresh Failure Due to Connection Pool Exhaustion in Azure SQL Database
We've been working on a service request where a customer experienced frequent failures when refreshing Power BI reports connected to an Azure SQL Database. The error message indicated a problem with acquiring connections from the data source pool, leading to unsuccessful report refreshes. We found the following error message:

A request for a connection from the data source pool could not be granted. Retrying the evaluation may solve the issue. The exception was raised by the IDbCommand interface.

Analyzing the details of the case, we found that the issue occurred regardless of whether Entra ID or SQL authentication was used; there was no issue at the Azure SQL Database level and no login errors, but we identified a high number of simultaneous connection attempts from Power BI to Azure SQL Database. We also reviewed the configuration of Power BI Desktop and noted that it loads multiple tables in parallel during refresh operations. This behavior triggers a surge of concurrent connections to the database, which in this scenario resulted in exhaustion of the connection pool at the application layer.

We suggested reducing the parallel table loading setting in Power BI Desktop, under File > Options and settings > Options > Data Load > Parallel loading of tables, and later adjusting the setting to a higher value to find a balance between performance and stability.
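To confirm the connection surge from the database side, a quick sketch over sys.dm_exec_sessions can be run while a refresh is in progress:

-- Count the sessions each client application currently holds against the database.
SELECT program_name,
       host_name,
       COUNT(*) AS session_count
FROM sys.dm_exec_sessions
WHERE is_user_process = 1
GROUP BY program_name, host_name
ORDER BY session_count DESC;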
Lesson Learned #530: Comparing Execution Plans to Expose a Hidden Performance Anti-Pattern
One of the most powerful features of SSMS Copilot is how it lets you compare execution plans and immediately surface performance issues. In this case, I would like to share my lesson learned comparing two queries and how they behave very differently inside the engine. We have the following queries against a table _x_y_z_MS_HighCPU that contains 4 million rows. The column TextToSearch is a varchar(200) data type.

-- Query 1
SELECT COUNT(*)
FROM [MSxyzTest].[_x_y_z_MS_HighCPU]
WHERE TextToSearch = N'Value: 9';

-- Query 2
SELECT COUNT(*)
FROM [MSxyzTest].[_x_y_z_MS_HighCPU]
WHERE TextToSearch = 'Value: 9';

Since the query texts are different, each will have a different query ID in Query Store. By running the following T-SQL, for example, I can identify the query IDs:

SELECT qsqt.query_sql_text,
       qsq.query_id,
       qsp.plan_id,
       qsp.query_plan_hash,
       qsp.last_execution_time
FROM sys.query_store_query_text qsqt
JOIN sys.query_store_query qsq ON qsqt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qsp ON qsq.query_id = qsp.query_id
WHERE qsqt.query_sql_text LIKE '%SELECT COUNT(*)%' -- FROM [[MSxyzTest]].[[_x_y_z_MS_HighCPU]]%'
ORDER BY qsp.last_execution_time DESC;

Queries 1 and 2 can be compared directly. Using Copilot, I ran the following prompt:

Compare the execution plans for the two queries (query id 1 and query id 2) using Query Store. Highlight any differences in operators, estimated vs actual row counts, or implicit conversions.

I then ran a second prompt about CPU usage:

Please show the top resource-consuming queries in the current database using Query Store data. Include query text, execution count, duration, CPU time, and logical reads.

We could see the impact of using an anti-pattern: comparing the varchar column TextToSearch with the Unicode literal N'Value: 9' forces an implicit conversion, which is exactly what the plan comparison exposes.
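Outside Copilot, the same comparison can be approximated directly from the Query Store catalog views; a minimal sketch, assuming the two query IDs are 1 and 2 as identified above:

-- Compare aggregated runtime statistics for the two query IDs.
SELECT q.query_id,
       SUM(rs.count_executions)          AS executions,
       AVG(rs.avg_cpu_time) / 1000.0     AS avg_cpu_ms,
       AVG(rs.avg_logical_io_reads)      AS avg_logical_reads,
       AVG(rs.avg_duration) / 1000.0     AS avg_duration_ms
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
WHERE q.query_id IN (1, 2)
GROUP BY q.query_id;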
Lesson Learned #528: Arithmetic overflow error converting IDENTITY to data type int
Some days ago, we were working on a service request where our customer was experiencing an issue with their application while reading a CSV file from Azure Blob Storage and writing the data into an Azure SQL Database table. They discovered that thousands of rows were being skipped. While analyzing the issue, we identified the following error message: Arithmetic overflow error converting IDENTITY to data type int.

After an investigation, we found the root cause:

- The target table in production had an INT-typed IDENTITY column.
- The identity value had reached the maximum value of an INT (2,147,483,647).
- The application was configured with SET IDENTITY_INSERT ON, so it was trying to explicitly insert identity values from the CSV file.
- When it attempted to insert a value above the INT limit, it failed with an arithmetic overflow.

How to reproduce the issue in SQL:

CREATE TABLE dbo.TestDemo
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    Name NVARCHAR(100)
);

-- Set the identity value to the maximum possible value for INT
-- Only for testing...
DBCC CHECKIDENT ('dbo.TestDemo', RESEED, 2147483646);

SET IDENTITY_INSERT dbo.TestDemo ON;

INSERT INTO dbo.TestDemo (ID, Name) VALUES (2147483647, 'Last Row');
INSERT INTO dbo.TestDemo (ID, Name) VALUES (2147483648, 'Overflow');

SET IDENTITY_INSERT dbo.TestDemo OFF;
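To catch this situation before it starts breaking inserts, a sketch along these lines can be run periodically to report how much headroom each INT identity column still has:

-- How close is each INT identity column to the data type maximum (2,147,483,647)?
SELECT OBJECT_SCHEMA_NAME(ic.object_id)            AS schema_name,
       OBJECT_NAME(ic.object_id)                   AS table_name,
       ic.name                                     AS column_name,
       CAST(ic.last_value AS BIGINT)               AS last_value,   -- NULL if the identity has never been used
       2147483647 - CAST(ic.last_value AS BIGINT)  AS remaining_values
FROM sys.identity_columns AS ic
JOIN sys.types AS t ON t.user_type_id = ic.user_type_id
WHERE t.name = 'int'
ORDER BY remaining_values ASC;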
Lesson Learned #359: TCP Provider: Error code 0x68 (104) (SQLExecDirectW)
Today, we got a service request where our customer faced the following error message connecting to the database: (pyodbc.OperationalError) ('08S01', '[08S01] [Microsoft][ODBC Driver 18 for SQL Server]TCP Provider: Error code 0x68 (104) (SQLExecDirectW)'). This customer is using Python in a Linux environment. Below I would like to share my lessons learned about this error message.
Lesson Learned #527: Calling Azure OpenAI with Managed Identity via sp_invoke_external_rest_endpoint
A day ago, I was working on a service request where our customer got the following error message:

{"response":{"status":{"http":{"code":401,"description":""}},"headers":{"Date":"Mon, 07 Jul 2025 19:36:30 GMT","Content-Length":"297","Content-Type":"application\/json","apim-request-id":"cabfb91a-5ede-4371-91d5-XXX","x-ms-client-request-id":"Not-Set","strict-transport-security":"max-age=31536000; includeSubDomains; preload","x-content-type-options":"nosniff"}},"result":{"error":{"code":"PermissionDenied","message":"The principal `XXXX-YYYY-4d9e-8e70-13c98fb84e7a` lacks the required data action `Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action` to perform `POST /openai/deployments/{deployment-id}/chat/completions` operation."}}}

Following, I would like to share my experience resolving this issue. The first thing was to try reproducing the issue in our lab, so I began integrating Azure OpenAI with Azure SQL Database to perform sentiment analysis based on fictitious feedback. I created a SQL table called CustomersInfo which contains fictitious customer feedback:

CREATE TABLE CustomersInfo
(
    CustomerID INT PRIMARY KEY IDENTITY(1,1),
    Name NVARCHAR(100),
    Feedback NVARCHAR(MAX),
    Sentiment NVARCHAR(20) NULL
);

INSERT INTO CustomersInfo (Name, Feedback) VALUES
('Anna', 'The product arrived damaged and no one responded to my messages.'),
('John', 'I loved the service, it was fast and the product is excellent.'),
('Emily', 'It was okay, but I think packaging could be better.'),
('David', 'I will never buy here again, terrible service.'),
('Sophia', 'Everything was perfect, thank you for the follow-up.'),
('Michael', 'Delivery time was average, but the product did not meet expectations.'),
('Laura', 'Great overall experience, I would recommend it.'),
('James', 'I expected more quality for the price I paid.'),
('Isabella', 'Easy to order, great customer support.'),
('Robert', 'I didn’t like it, but at least they gave me a refund.');

I configured Azure OpenAI and permissions by creating an Azure OpenAI resource:

Endpoint: https://openaiexample.openai.azure.com
Model: gpt-4
Roles granted to the Entra ID user that is connecting to the database: Cognitive Services OpenAI User and Cognitive Services User

I then enabled the SQL database to call the OpenAI endpoint using Managed Identity:

CREATE DATABASE SCOPED CREDENTIAL [https://openaiexample.openai.azure.com]
WITH IDENTITY = 'Managed Identity',
     SECRET = '{"resourceid":"https://cognitiveservices.azure.com"}';

DECLARE @response NVARCHAR(MAX);
DECLARE @Payload NVARCHAR(MAX) = '{
    "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": "Classify the sentiment of the following customer feedback as Positive, Negative, or Neutral. Feedback: The product arrived damaged and no one responded to my messages." }
    ],
    "max_tokens": 50
}';

EXEC sp_invoke_external_rest_endpoint
    @url = 'https://openaiexample.openai.azure.com/openai/deployments/gpt-4.1-jmjurado/chat/completions?api-version=2025-01-01-preview',
    @method = 'POST',
    @credential = [https://openaiexample.openai.azure.com],
    @payload = @Payload,
    @response = @response OUTPUT;

DECLARE @json NVARCHAR(MAX) = @response;
SELECT JSON_VALUE(c.value, '$.message.content') AS Respuesta
FROM OPENJSON(@json, '$.result.choices') AS c;

To encapsulate this logic, we created a stored procedure:

CREATE OR ALTER PROCEDURE AnalizarSentimiento
    @Texto NVARCHAR(MAX),
    @Sentimiento NVARCHAR(50) OUTPUT
AS
BEGIN
    DECLARE @response NVARCHAR(MAX);
    DECLARE @Payload NVARCHAR(MAX) = '{
        "messages": [
            { "role": "system", "content": "You are a helpful assistant." },
            { "role": "user", "content": "Classify the sentiment of the following customer feedback as Positive, Negative, or Neutral. Feedback: ' + @Texto + '" }
        ],
        "max_tokens": 50
    }';

    EXEC sp_invoke_external_rest_endpoint
        @url = 'https://openaiexample.openai.azure.com/openai/deployments/gpt-4.1-jmjurado/chat/completions?api-version=2025-01-01-preview',
        @method = 'POST',
        @credential = [https://openaiexample.openai.azure.com],
        @payload = @Payload,
        @response = @response OUTPUT;

    SELECT @Sentimiento = JSON_VALUE(c.value, '$.message.content')
    FROM OPENJSON(@response, '$.result.choices') AS c;
END;

Now I'm ready to execute the procedure and retrieve the data:

DECLARE @Sentimiento NVARCHAR(50);
EXEC AnalizarSentimiento
    @Texto = 'The product arrived damaged and no one responded to my messages.',
    @Sentimiento = @Sentimiento OUTPUT;
SELECT @Sentimiento AS Resultado;

However, I got the following result:

{"response":{"status":{"http":{"code":401,"description":""}},"headers":{"Date":"Mon, 07 Jul 2025 19:36:30 GMT","Content-Length":"297","Content-Type":"application\/json","apim-request-id":"cabfb91a-5ede-4371-91d5-XXX","x-ms-client-request-id":"Not-Set","strict-transport-security":"max-age=31536000; includeSubDomains; preload","x-content-type-options":"nosniff"}},"result":{"error":{"code":"PermissionDenied","message":"The principal `XXXX-YYYY-4d9e-8e70-13c98fb84e7a` lacks the required data action `Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action` to perform `POST /openai/deployments/{deployment-id}/chat/completions` operation."}}}

Analyzing the error message, it appears that the principal ID used for this Managed Identity call does not have access to the endpoint. Even adding the client_id keyword didn't resolve the issue:

CREATE DATABASE SCOPED CREDENTIAL [https://openaiexample.openai.azure.com]
WITH IDENTITY = 'Managed Identity',
     SECRET = '{"client_id":"YYYYY-ZZZZ-4b75-XXX-9ab8e8c14c1e", "resourceid":"https://cognitiveservices.azure.com"}';

After several troubleshooting steps and by analyzing the error message, I identified that the client_id referenced in the response belongs to the system-assigned managed identity of the Azure SQL server. Once I granted the necessary permissions to this identity, I was able to connect and successfully perform the operation.
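With the permissions fixed, the stored procedure can be used to fill the Sentiment column for the sample rows. The following is only an illustrative sketch: each row triggers one REST call, so it is meant for small batches like this demo table.

DECLARE @CustomerID INT, @Feedback NVARCHAR(MAX), @Sentimiento NVARCHAR(50);

-- Walk through the rows that have not been classified yet.
DECLARE feedback_cursor CURSOR FOR
    SELECT CustomerID, Feedback FROM CustomersInfo WHERE Sentiment IS NULL;

OPEN feedback_cursor;
FETCH NEXT FROM feedback_cursor INTO @CustomerID, @Feedback;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Call the procedure defined above for each pending row.
    EXEC AnalizarSentimiento @Texto = @Feedback, @Sentimiento = @Sentimiento OUTPUT;

    UPDATE CustomersInfo SET Sentiment = @Sentimiento WHERE CustomerID = @CustomerID;

    FETCH NEXT FROM feedback_cursor INTO @CustomerID, @Feedback;
END;

CLOSE feedback_cursor;
DEALLOCATE feedback_cursor;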
Lesson Learned #525: Tracking Command Timeouts in Azure SQL: Beyond Query Store with Extended Events
A few days ago, we were working on a support case where our customer was intermittently experiencing command timeouts. What made the case interesting was that queries which usually completed in under one second suddenly started taking more than 10 seconds to execute. Since the application — developed in Python using the ODBC Driver 18 for SQL Server — had a command timeout set to 5 seconds, the following error was triggered every time the threshold was exceeded:

Error executing command, retrying in 5 seconds. Attempt 1 of 3 with new timeout 5. Error: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)')

The application had built-in retry logic, dynamically increasing the timeout on each of the three retry attempts, to allow time for the query to complete and to log enough data for post-error analysis. Example logs from the retry logic:

(RunCommandTimeout) - Thread: 39808 - Error executing command, retrying in 5 seconds. Attempt 1 of 3 with new timeout 5. Error: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)')
INFO:root:Connecting to the DB jmjuradotestdb1 - Thread id 39808 - (Attempt 1/3)
INFO:root:Connected to the Database in jmjuradotestdb1 - Thread id 39808 - 0.0445 seconds
---
(RunCommandTimeout) - Thread: 39808 - Error executing command, retrying in 9 seconds. Attempt 2 of 3 with new timeout 9. Error: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)')
INFO:root:Connecting to the DB jmjuradotestdb1 - Thread id 39808 - (Attempt 1/3)
INFO:root:Connected to the Database in jmjuradotestdb1 - Thread id 39808 - 0.0532 seconds
---
(RunCommandTimeout) - Thread: 39808 - Error executing command, retrying in 13 seconds. Attempt 3 of 3 with new timeout 13. Error: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)')
(RunCommandTimeout) - Thread: 39808 - Loop:2/5 Execution Time: 9.7537 seconds

My first prompt using SSMS Copilot was "Review the queries that experienced a command timeout or were aborted in the last 30 minutes. Include query text, query ID, duration, and the reason and code for the abort if available." The results pointed to query ID 216 as the one hitting command timeouts.

My next question was, for query ID 216, to show the total number of executions, and the answer was 28. However, this number didn't match the number of aborted and non-aborted executions observed in the application logs. Why the difference? Checking the table sys.query_store_runtime_stats, I found 10 rows, all with execution_type = 3, and 28 total executions. Query Store aggregates query execution data over a fixed interval, and the execution_type value only indicates that at least one execution during that runtime interval was aborted; some of the executions in the interval were aborted and others were not.

To obtain a more granular and accurate picture, I created an Extended Events session to capture these events using the ring_buffer target.
CREATE EVENT SESSION [CommandAborted] ON DATABASE
ADD EVENT sqlserver.attention(
    ACTION
    (
        sqlserver.client_app_name,
        sqlserver.client_hostname,
        sqlserver.username,
        sqlserver.database_name,
        sqlserver.sql_text
    )
)
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY = 4096KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS);
GO

ALTER EVENT SESSION [CommandAborted] ON DATABASE STATE = START;

After reproducing the command timeout scenario again, I was able to see only the aborted executions: in this case, 28 executions ran in total and 7 of them were aborted.

WITH RingBufferXML AS
(
    SELECT CAST(t.target_data AS XML) AS target_data
    FROM sys.dm_xe_database_session_targets t
    JOIN sys.dm_xe_database_sessions s ON t.event_session_address = s.address
    WHERE t.target_name = 'ring_buffer'
      AND s.name = 'CommandAborted'
)
SELECT x.value('@name', 'varchar(50)') AS event_name,
       x.value('@timestamp', 'datetime2') AS event_time,
       x.value('(action[@name="client_app_name"]/value)[1]', 'nvarchar(256)') AS client_app_name,
       x.value('(action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS sql_text,
       x.value('(data[@name="duration"]/value)[1]', 'bigint') AS duration_microseconds,
       CAST(x.value('(data[@name="duration"]/value)[1]', 'bigint') / 1000000.0 AS decimal(10,3)) AS duration_seconds
FROM RingBufferXML
CROSS APPLY target_data.nodes('//event') AS tab(x)
WHERE x.value('@name', 'varchar(50)') = 'attention'
  AND x.value('(action[@name="client_app_name"]/value)[1]', 'nvarchar(256)') = 'TEST-DataCon'
ORDER BY event_time DESC;
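As a small housekeeping step (not part of the original troubleshooting), the session can be verified and then removed once the investigation is over; a minimal sketch:

-- Confirm the database-scoped session is currently running.
SELECT name
FROM sys.dm_xe_database_sessions
WHERE name = 'CommandAborted';

-- Stop and drop the session when it is no longer needed.
ALTER EVENT SESSION [CommandAborted] ON DATABASE STATE = STOP;
DROP EVENT SESSION [CommandAborted] ON DATABASE;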