Azure SQL Troubleshooting Power BI dataset refresh failures

Published Mar 24 2022 11:09 AM 1,781 Views
Microsoft

Recently we worked on a case where the Power BI refreshes were failing intermittently. 90% of time, things would work just fine, but sporadically our customer would see similar errors to the one below

 

PowerBi_Error.png

 

”Execution Timeout Expired” suggests a problem with a Command Timeout. A common misconception is to use connection vs command concepts interchangeably.

  • Command timeout setting is the wait time before terminating the attempt to execute a command (query).
  • Connection timeout is the wait time before terminating the attempt to establish a connection with SQL Server.

We can identify command timeouts via query store with sys.query_store_runtime_stats using the execution_type or execution_type_desc column.

 

Part of the complexity with this case was that there were many unrelated timeouts with multiple applications across the board and Power BI data refreshes running consistently all day long, so we had the needle in a haystack dilemma.

 

Some timestamps in the Power BI logs will be presented as local timezone while others are displayed in UTC. Be certain on the time zone of the particular timestamp you are researching to have correct facts to continue with the investigation.

 

In this support case, our customer didn’t manually code the queries. They relied on Power BI Desktop to automatically build the queries that they needed for their dataset, so we didn’t have the query text available to match against a query hash and track down the executions via Query Store.

 

Fortunately, Power BI Desktop has a built-in tracing mechanism.

PowerBi_Tracing.png

 

After you have configured tracing, you can perform a manual refresh with Power BI Desktop and then proceed to inspect the log files.

These files are located by default on the %LOCALAPPDATA%\Microsoft\Power BI Desktop\Traces folder

 

The log files will have the queries that Power BI sent to the database. Also, we can see the command timeout set by Power BI on each query execution. I observed in my testing that the default is set to 10 minutes. However, you can override the default in Power BI desktop using the advanced options

 

PowerBi_CommandTimeout.png

 

Using the PowerShell script below, I extracted all the queries from the log files to a single text file

 

 

$outputFile = "C:\tmp\output.txt"
$pattern = "CommandText`":`"(.+)`",`"CommandTimeout" 

Select-String -Pattern $pattern -Path "$env:LOCALAPPDATA\Microsoft\Power BI Desktop\Traces\*.log" | 
ForEach-Object { 
    Out-File -InputObject ($_.matches.Groups[1].value.ToString().Replace("\r\n", " ")) -Append -FilePath $outputFile; 
    Out-File -InputObject "`r`n GO `r`n" -Append -FilePath $outputFile; 
}

 

 

The queries are straightforward, mostly they are getting rows from each table that was selected for the dataset as well as metadata attributes of the schema (tables, columns, indexes, etc.)

 

After getting the query text we generated an estimated query plan for each one and inspected the properties window to get the query hash generated by the SQL engine.

 

Using the query hash, we reviewed “sys.query_store_runtime_stats” checking for timeouts. Interestingly the queries timing out were only extracting metadata information and not the actual data from the user defined tables. Reviewing sys.query_store_wait_stats we saw that the main wait was on a lock.

QDS_WaitStats.png

 

As this problem happened intermittently and not knowing when the issue may surface again. We turned to a fantastic free and open-source tool sp_whoisactive written by Adam Machanic.

 

This stored procedure can be used to check what is running right now on the database. It has many options that can be toggled via parameters. It has a very convenient parameter to help us analyze the blocking chain and a parameter to capture the output to a table for post-mortem analysis.

 

Using an on prem SQL Server that the customer had in their infrastructure we created a linked server to Azure SQL Database and using the SQL agent on premises we created a job running every 10 minutes to capture the output of sp_whoisactive.

 

After a few days passed by, the issue surfaced again and we checked the table where we stored the output of sp_whoisactive.

We observed a similar pattern like the one below which I reproduced in my lab

 

 

exec sp_whoisactive
 @find_block_leaders = 1
,@sort_order = '[blocked_session_count] DESC'
,@output_column_list = '[session_id][sql_text][wait_info][blocking_session_id][blocked_session_count][program_name]'

 

 

 

Blocking_Chain.png

 

There was a lead blocker session executing a long running query issued from one of our customer's applications, that was blocking a truncate statement issued by the same application which in turn was blocking the Power BI query (program name = "Mashup engine").  Once the execution timeout expired Power BI terminated the connection and sent the error to the end user.

 

With sp_whoisactive we can see that these sessions hold incompatible locks hence why the query has to wait for the release of the lock

 

At this point the customer turned their attention to the lead blocker query to optimize it and reduce the impact of the blockings

 

Cheers!

1 Comment
Co-Authors
Version history
Last update:
‎Mar 24 2022 08:51 AM
Updated by: