Blog Post

Azure Data Explorer Blog
3 MIN READ

Return of the Power BI join with Kusto

DanyHoter's avatar
DanyHoter
Icon for Microsoft rankMicrosoft
Sep 04, 2022

Return of the join in Power BI with Kusto

 

Summary

 

If you are not interested in all the details:

  • You can add to the source statement in Power Query for dimension tables the clause: IsDimension=true
  • You can use both dimensions and fact tables in Direct Query mode, and create relationships between them after you add this setting
  • You can see an example in the attached PBI file

Before

 

The experience of many users, trying out Power BI with Kusto/ADX data was not always great.

Kusto’s main claim to fame is great performance with very large tables.

Trying to use such tables in PBI using Direct Query was not always fast and sometimes failed with errors about not enough memory to perform the query.

The reason behind this problem in many cases was the use of joins between tables.

 

Joins in Kusto

 

You can read about joins here.

One of the important recommendations is when joining a large table (Fact) with a much smaller table (Dimension), is to mention the small table first:

Customers | join kind=rightouter FactSales on CustomerKey

It is also recommended in such cases to add a hint:

Customers | join hint.strategy=broadcast kind=rightouter FactSales on CustomerKey

The hint allows the join to be fully distributed using much less memory.

 

Joins in PBI

 

When you create a relationship between two tables from Kusto, and both tables use Direct Query, PBI will generate a join between the two tables.

The join will be structured in this way:

FactSales | join Customers on CustomerKey

This is exactly the opposite of the recommended way mentioned above.

Depending on the size of the fact table, such a join can be slow, use a lot of memory or fail completely.

 

Recommended strategies until now

 

In blogs and presentations, we recommended few workarounds.

I’m mentioning it here because I would like to encourage you to revisit your working PBI reports and reimplement the joins based on the new behavior that was just released.

Import the dimension tables

 

If the dimension tables are imported, no joins will be used.

Any filters based on a dimension table will be implemented as a where based on the column that is the base of the relationship. In many cases this where statement will be very long:

| where CustomerKey in (123,456,678,…) .

If you filter on the gender F and there are 10,00 customers with F in the gender column, the list will include 10,000 keys.

This is not optimal an in extreme cases it may fail.

 

Join the table in a Kusto function and use the function in PBI

 

This solution will have good performance, but it requires more understanding of KQL and is different from the way normal PBI tables behave

 

Join the tables on ingestion using an update policy

 

Same as the previous method but requires even a deeper understanding of Kusto.

 

New behavior

 

Starting from the September version of PBI desktop, the Kusto connector was updated, and you can use relationships between dimension tables and fact table like with any other source with a few changes:

  • In every table that is considered a dimension add IsDimension=true to source step so it will look like

 

  • The data volume of the dimension tables (The columns you will use) should not exceed 10 megabytes.
  • Relationships between two tables from Kusto will be identified by PBI as M:M. You can leave it as M:M but be sure to set the filtering direction to single from the dimension to the fact.
  • When the relationship is M:M, the join kind will be inner. If you want a rightouter join (because you are not sure you have full integrity) you need to force the relationship to be 1:1. You can edit the model using Tabular editor (V2 is enough)

 

Before and after

 

In the attached example you can see two dimension tables with relationships to a fact table. The relationships are M:M and you can see the generated KQL in the text box.

The query as shown takes 0.5 seconds of CPU and uses 8MB of memory

The same query without the new setting takes 1.5 seconds of CPU and 478MB of memory.

 

 

 

 

 

 

 

 

 

 

 

 

 

Published Sep 04, 2022
Version 1.0
No CommentsBe the first to comment