Azure Data Explorer Blog

6 MIN READ

Using .show queries to understand and optimize the KQL sent from PBI to Kusto

DanyHoter

Microsoft

Jun 01, 2023

How to see the queries that PBI (Or any other tool) send to Kusto/ADX

Why do you need to see the queries?

When you use PBI with data from ADX / KQL database (In Fabric) in direct query, the KQL queries are generated by the native connector in stages.

You can create a query by using a quick PBI report in Fabric or use Power BI desktop.

In desktop you can paste a KQL query or navigate to a table / function.

In the Power Query editor, you can do transformations, which will be translated (aka folded) to KQL and added to the original query or table.

Once in the report you will have other elements that will add more parts to the query like measures , calculated columns , slicers , filters , cross highlighting etc.

Each visual on the page is going to summarize data from one or more queries and add the summarize part of the query.

If your model contains multiple tables in direct query with relations between them, the connector will generate joins between the tables.

Selecting values in filters will create multiple where conditions.

In order to see the final query and understand the performance implications of each query and the total query load created by a report, you need to use the command ".show queries" in the context of the database.

Show queries ++

Here is the full query I use for showing the queries and understanding the performance.

.show queries

// By default, queries from PBI start with KPBI, you can add your own prefix

| where ClientActivityId startswith "KPBI"

// Replace the datetime value with the latest StartedOn value if you want to see new activity

| where StartedOn > datetime(2023-06-01T06:10:11.7654474Z)

// Filtering on queries send by the current user. If you are using a gateway or a PBI dataset as a source, queries may include a different User value.

| where User ==current_principal_details().UserPrincipalName

// The newest queries show last

| order by StartedOn asc

// MB will have the memory used on each node, high memory of GB scale can indicate wrong joins

// format_bytes function show the values as MB, GB, PB etc.

| extend MB=format_bytes(MemoryPeak)

// getschema calls are usually not interesting to look at

| extend Isgetschema=Text has "getschema"

// Scanned data is a very important factor in performance.

| extend ScannedData=format_bytes(tolong(CacheStatistics.Shards.Hot.HitBytes))

// Also very important is the % of extents that are scanned out of the total # of extents

| extend ScannedExtentsCount = ScannedExtentsStatistics.ScannedExtentsCount,

TotalExtentsCount = ScannedExtentsStatistics.TotalExtentsCount

| extend ScannedRatio=iff(TotalExtentsCount==0,"",strcat(tostring(round(ScannedRatio=100.*ScannedExtentsCount/TotalExtentsCount,2)),"%"))

// using fork to create multiple result sets

| fork

Queries=(where Isgetschema==false | project StartedOn, Duration, TotalCpu,ScannedRatio, MB,Text | order by StartedOn asc)

Detail=(project StartedOn,User,State,FailureReason, Duration, TotalCpu,ScannedExtentsCount,TotalExtentsCount,ScannedRatio,

Isgetschema, MB, ScannedData, ClientRequestProperties, ClientActivityId, Text | order by StartedOn asc)

Slow=( project Duration, TotalCpu, MB, ScannedData,ScannedRatio, Text | order by Duration)

Summary=(summarize Commands=count(),Queries=count(),DelayCommand=count(),

mn=min(StartedOn),mx=max( LastUpdatedOn),TotCPU=sum(TotalCpu),TotDuration=sum(Duration)

| extend OveralDuration=(mx-mn))

Examples

I ran a few queries from KWE and ran the .show queries ++ (commenting line 3)

Here are the results:

We can see the CPU time and the % of scanned extents growing as the timespan grows from 1 hour up to 1000 days.

Queries created by the attached Report after filtering on one state.

And here is one of the queries text:

The query is pretty long and not very easy to understand.

There are 4 tables joined , filters applied and finally a summarize to get the values.

I added comments to explain the different parts.

// The dimensions appear first as they should in KQL joins

// This is due to a setting in the connection designating the dimensions as such

["Products"]

| project-away ["ProductName"],["ClassName"]

// Filters on dimensions appear right after the dimension

| where (["ProductCategoryName"] == "Home Appliances") and (["ProductSubcategoryName"] == "Washers & Dryers")

// A lot of renames distinguish columns with potentially the same names.

| project-rename ["t2.Manufacturer"] = ["Manufacturer"], ["t2.ColorName"] = ["ColorName"], ["t2.ProductCategoryName"] = ["ProductCategoryName"], ["t2.ProductSubcategoryName"] = ["ProductSubcategoryName"], ["t2.ProductKey"] = ["ProductKey"]

| where isnotnull(["t2.ProductKey"])

// The join is inner join and uses the hint which is the best way to join a small table with a large one

// Other joins with the Dates and Customers tables are nested so the fact table is mentioned last

| join hint.strategy=broadcast kind=inner (["Dates"]

| project ["Datekey"],["CalendarYear"],["CalendarQuarter"]

| project-rename ["t1.Datekey"] = ["Datekey"], ["t1.CalendarYear"] = ["CalendarYear"], ["t1.CalendarQuarter"] = ["CalendarQuarter"]

| where isnotnull(["t1.Datekey"])

| join hint.strategy=broadcast kind=inner (["Customers"]

| project ["CityName"],["CustomerKey"],["RegionCountryName"],["StateProvinceName"]

| where (["RegionCountryName"] == "Australia") and (["StateProvinceName"] == "New South Wales")

| project-rename ["t0.CityName"] = ["CityName"], ["t0.CustomerKey"] = ["CustomerKey"], ["t0.RegionCountryName"] = ["RegionCountryName"], ["t0.StateProvinceName"] = ["StateProvinceName"]

| where isnotnull(["t0.CustomerKey"])

// Finally the fact is mentioned

| join hint.strategy=broadcast kind=inner (["SalesFact"]

| project-rename ["t3.SalesAmount"] = ["SalesAmount"], ["t3.TotalCost"] = ["TotalCost"], ["t3.DateKey"] = ["DateKey"], ["t3.ProductKey"] = ["ProductKey"], ["t3.CustomerKey"] = ["CustomerKey"]

| where isnotnull(["t3.CustomerKey"])) on $left.["t0.CustomerKey"] == $right.["t3.CustomerKey"]

| project ["t3.SalesAmount"], ["t3.TotalCost"], ["t3.DateKey"], ["t3.ProductKey"], ["t3.CustomerKey"], ["t0.CityName"], ["t0.CustomerKey"], ["t0.RegionCountryName"], ["t0.StateProvinceName"]

| where isnotnull(["t3.DateKey"])) on $left.["t1.Datekey"] == $right.["t3.DateKey"]

| where isnotnull(["t3.ProductKey"])) on $left.["t2.ProductKey"] == $right.["t3.ProductKey"]

| project ["t3.SalesAmount"], ["t3.TotalCost"], ["t3.DateKey"], ["t3.ProductKey"], ["t3.CustomerKey"], ["t0.CityName"], ["t0.CustomerKey"], ["t0.RegionCountryName"], ["t0.StateProvinceName"], ["t1.Datekey"], ["t1.CalendarYear"], ["t1.CalendarQuarter"], ["t2.Manufacturer"], ["t2.ColorName"], ["t2.ProductCategoryName"], ["t2.ProductSubcategoryName"], ["t2.ProductKey"]

| project ["t3.SalesAmount"],["t3.TotalCost"],["t1.CalendarYear"],["t2.Manufacturer"]

| summarize ["a0"]=sum(["t3.SalesAmount"]), ["a1"]=sum(["t3.TotalCost"]) by ["t1.CalendarYear"], ["t2.Manufacturer"]
| where (isnotnull(["a0"])) or (isnotnull(["a1"]))
| limit 1000001
| project-rename ["CalendarYear"] = ["t1.CalendarYear"], ["Manufacturer"] = ["t2.Manufacturer"]

// Summarize two values by two columns for the matrix visual

Starting clean

If we want to see the queries send from a refresh of a single visual or a whole page, we can copy the value of StartedOn column in the last row and paste it in the .show queries StartedOn filter.

We run again the query and it should come empty.

Now we run the operation in PBI and after ~30 seconds the queries will show up in the .show queries results

Why is this important

This technique is useful for understanding how PBI generates queries and especially to understand why a page refresh takes the time and resources it takes and figure what we can do to improve.

What to look for in the results

Joins which do not come with hint.strategy=broadcast
Joins which are not using inner join
Queries that take seconds of duration and minutes of CPU
Queries that scan most of the extents even when the time period shown includes a relative small part of the data.
Queries that fail (The state column in the detail result set will show if the query Completed or Failed)
High number of queries for a single visual. Probably a complex DAX measure.

// The summarize part is generated for the matrix that shows totals by Year and Manufacturer

| summarize ["a0"]=sum(["t3.SalesAmount"]), ["a1"]=sum(["t3.TotalCost"]) by ["t1.CalendarYear"], ["t2.Manufacturer"]

| where (isnotnull(["a0"])) or (isnotnull(["a1"]))

| limit 1000001

| project-rename ["CalendarYear"] = ["t1.CalendarYear"], ["Manufacturer"] = ["t2.Manufacturer"]

Updated Jun 02, 2023

Version 5.0