qds
3 TopicsLesson Learned #386: Can I log when Query Data Store (QDS) fails to capture a query?
In a recent service request, a customer asked if there was a way to identify when QDS fails to capture a query due to various reasons, such as high workload. Here's what I learned from this experience.2.1KViews0likes0CommentsQTip: Getting timeouts and exceptions using Azure SQL DB Query Store
Requirements: Azure SQL Database SQL Server Management Studio (SSMS) Tables testcodes and testinvoice used for this demo 1-Configure Query Store to collect data and save as soon as possible to be able to review it * Data flush interval is the time to send information to disk ** Statistics collection interval is the time range to collect data , query store is not collecting one record by execution is collecting statistics of each execution and you can get max, min, avg but not excecution by excecution *** Query store capture mode ALL will collect information of all executions query store will not make any kind of filter 2-Create table testcodes CREATE TABLE [dbo].[testcodes]( [Code] [char](5) NULL, [Description] [nchar](10) NULL ) ON [PRIMARY] GO 3-Create table testinvoice CREATE TABLE [dbo].[testinvoice]( [code] [char](5) NULL, [pieces] [int] NULL ) ON [PRIMARY] GO 4-Add some codes to be used 5-Add some records to invoice Exception ... 6-Run query to get information from invoices and get description from codes select *,description=(select testcodes.Code from testcodes where testcodes.code=invoice.code) from testinvoice invoice Is possible to see execution without error 7-Now add in codes a second code BBB to force exception 8-Run query from point 6 again Result is an exception 9-Run query to see queries with exceptions or timeouts starting 2 days ago (declare @datestart as datetime = dateadd(D,-2,getdate());) declare @datestart as datetime = dateadd(D,-2,getdate()); declare @datefinish as datetime = getdate(); /* if you want to set to specific time */ --set @datestart = '2025-04-09 00:00:00'; --set @datefinish = '2025-04-09 23:59:59'; select rs.last_execution_time, rs.execution_type_desc, qt.query_sql_text, q.query_id, CONVERT(VARCHAR(1000), q.query_hash, 1) as strqueryhash, p.plan_id, rs.last_cpu_time, rs.last_duration, rs.count_executions, rs.last_rowcount, rs.last_logical_io_reads, rs.last_physical_io_reads, rs.last_query_max_used_memory, rs.last_tempdb_space_used, rs.last_dop, p.is_forced_plan, p.last_force_failure_reason, p.last_force_failure_reason_desc FROM sys.query_store_query_text AS qt JOIN sys.query_store_query AS q ON qt.query_text_id = q.query_text_id JOIN sys.query_store_plan AS p ON q.query_id = p.query_id JOIN sys.query_store_runtime_stats AS rs ON p.plan_id = rs.plan_id where rs.last_execution_time>= @datestart and rs.last_execution_time<=@datefinish and (rs.execution_type=3 or rs.execution_type=4) -- 3 timeout, 4 error --and qt.query_sql_text like '%actual%' --and q.query_hash=0x009C458D20394C37 --and p.plan_id=12 ORDER BY rs.last_execution_time DESC 10-In query there are some commented lines that you can use to add more filters or modify them To see all records comment line 20 declare @datestart as datetime = dateadd(D,-2,getdate()); declare @datefinish as datetime = getdate(); /* if you want to set to specific time */ --set @datestart = '2025-04-09 00:00:00'; --set @datefinish = '2025-04-09 23:59:59'; select rs.last_execution_time, rs.execution_type_desc, qt.query_sql_text, q.query_id, CONVERT(VARCHAR(1000), q.query_hash, 1) as strqueryhash, p.plan_id, rs.last_cpu_time, rs.last_duration, rs.count_executions, rs.last_rowcount, rs.last_logical_io_reads, rs.last_physical_io_reads, rs.last_query_max_used_memory, rs.last_tempdb_space_used, rs.last_dop, p.is_forced_plan, p.last_force_failure_reason, p.last_force_failure_reason_desc FROM sys.query_store_query_text AS qt JOIN sys.query_store_query AS q ON qt.query_text_id = q.query_text_id JOIN sys.query_store_plan AS p ON q.query_id = p.query_id JOIN sys.query_store_runtime_stats AS rs ON p.plan_id = rs.plan_id where rs.last_execution_time>= @datestart and rs.last_execution_time<=@datefinish --and (rs.execution_type=3 or rs.execution_type=4) -- 3 timeout, 4 error --and qt.query_sql_text like '%actual%' --and q.query_hash=0x009C458D20394C37 --and p.plan_id=12 ORDER BY rs.last_execution_time DESC Now you can see all records 11-Reproduce error several times within 1 minute to see value in execution count (number of executions inside of statistics collection interval) Timeout ... 12-Configure command timeout different to 0 for this demo use 10 (seconds) in parameters previous to connect Second option 12-Use query below that will run 1000 times and at some point will fail INSERT INTO [dbo].[testinvoice] SELECT * FROM [dbo].[testinvoice] GO 1000 13-Run query from point 9 to see data in query store Now you can reproduce and get data about all excecutions , exceptions and timeouts good luck!Lesson Learned #525: Tracking Command Timeouts in Azure SQL: Beyond Query Store with Extended Events
A few days ago, we were working on a support case where our customer was intermittently experiencing command timeouts. What made the case interesting was that queries which usually completed in under one second suddenly started taking more than 10 seconds to execute. Since the application — developed in Python using the ODBC Driver 18 for SQL Server — had a command timeout set to 5 seconds, the following error was triggered every time the threshold was exceeded: Error executing command, retrying in 5 seconds. Attempt 1 of 3 with new timeout 5. Error: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)') The application had built-in retry logic, dynamically increasing the timeout in each of the three retry attempts, to allow time for the query to complete and to log enough data for post-error analysis. Example logs from the retry logic: (RunCommandTimeout) - Thread: 39808 - Error executing command, retrying in 5 seconds. Attempt 1 of 3 with new timeout 5. Error: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)') INFO:root:Connecting to the DB jmjuradotestdb1 - Thread id 39808 - (Attempt 1/3) INFO:root:Connected to the Database in jmjuradotestdb1 - Thread id 39808 - 0.0445 seconds --- (RunCommandTimeout) - Thread: 39808 - Error executing command, retrying in 9 seconds. Attempt 2 of 3 with new timeout 9. Error: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)') INFO:root:Connecting to the DB jmjuradotestdb1 - Thread id 39808 - (Attempt 1/3) INFO:root:Connected to the Database in jmjuradotestdb1 - Thread id 39808 - 0.0532 seconds --- (RunCommandTimeout) - Thread: 39808 - Error executing command, retrying in 13 seconds. Attempt 3 of 3 with new timeout 13. Error: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)') (RunCommandTimeout) - Thread: 39808 - Loop:2/5 Execution Time: 9.7537 seconds My first prompt using SSMS Copilot was this "Review the queries that experienced a command timeout or were aborted in the last 30 minutes. Include query text, queryid, duration, and the reason and code for the abort if available." and I got the following results. So, all points that the query 216 got command timeouts. My next question, was, for query ID 216, show the number of total executions reporting that is 28 executions. The response showed 28 executions, but this number didn’t match the number of aborted and non-aborted executions observed in the application logs, why this difference? Checking the table sys.query_store_runtime_stats I found 10 rows all having execution_type = 3, and total executions 28. So, that's mean that Query Store aggregates query execution data over a fixed interval. So, the execution_type is an indicator that at least an execution during this runtime interval was aborted. So, at least several of them were aborted and other not. To obtain a more granular and accurate picture, I created an Extended Events session to capture these events using ring_buffer target. CREATE EVENT SESSION [CommandAborted] ON DATABASE ADD EVENT sqlserver.attention( ACTION ( sqlserver.client_app_name, sqlserver.client_hostname, sqlserver.username, sqlserver.database_name, sqlserver.sql_text ) ) ADD TARGET package0.ring_buffer WITH (MAX_MEMORY = 4096KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS); GO ALTER EVENT SESSION [CommandAborted] ON DATABASE STATE = START; after reproducing the command timeout scenario again, I was able to see only the aborted executions. So, in this case, 28 executions were executed and 7 executions were aborted. WITH RingBufferXML AS ( SELECT CAST(t.target_data AS XML) AS target_data FROM sys.dm_xe_database_session_targets t JOIN sys.dm_xe_database_sessions s ON t.event_session_address = s.address WHERE t.target_name = 'ring_buffer' AND s.name = 'CommandAborted' ) SELECT x.value('@name', 'varchar(50)') AS event_name, x.value('@timestamp', 'datetime2') AS event_time, x.value('(action[@name="client_app_name"]/value)[1]', 'nvarchar(256)') AS client_app_name, x.value('(action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS sql_text, x.value('(data[@name="duration"]/value)[1]', 'bigint') AS duration_microseconds, CAST(x.value('(data[@name="duration"]/value)[1]', 'bigint') / 1000000.0 AS decimal(10,3)) AS duration_seconds FROM RingBufferXML CROSS APPLY target_data.nodes('//event') AS tab(x) WHERE x.value('@name', 'varchar(50)') = 'attention' and x.value('(action[@name="client_app_name"]/value)[1]', 'nvarchar(256)') = 'TEST-DataCon' ORDER BY event_time DESC;172Views0likes0Comments