Return of the join in Power BI with Kusto
If you are not interested in all the details:
Before
The experience of many users, trying out Power BI with Kusto/ADX data was not always great.
Kusto’s main claim to fame is great performance with very large tables.
Trying to use such tables in PBI using Direct Query was not always fast and sometimes failed with errors about not enough memory to perform the query.
The reason behind this problem in many cases was the use of joins between tables.
You can read about joins here.
One of the important recommendations is when joining a large table (Fact) with a much smaller table (Dimension), is to mention the small table first:
Customers | join kind=rightouter FactSales on CustomerKey
It is also recommended in such cases to add a hint:
Customers | join hint.strategy=broadcast kind=rightouter FactSales on CustomerKey
The hint allows the join to be fully distributed using much less memory.
When you create a relationship between two tables from Kusto, and both tables use Direct Query, PBI will generate a join between the two tables.
The join will be structured in this way:
FactSales | join Customers on CustomerKey
This is exactly the opposite of the recommended way mentioned above.
Depending on the size of the fact table, such a join can be slow, use a lot of memory or fail completely.
In blogs and presentations, we recommended few workarounds.
I’m mentioning it here because I would like to encourage you to revisit your working PBI reports and reimplement the joins based on the new behavior that was just released.
If the dimension tables are imported, no joins will be used.
Any filters based on a dimension table will be implemented as a where based on the column that is the base of the relationship. In many cases this where statement will be very long:
| where CustomerKey in (123,456,678,…) .
If you filter on the gender F and there are 10,00 customers with F in the gender column, the list will include 10,000 keys.
This is not optimal an in extreme cases it may fail.
This solution will have good performance, but it requires more understanding of KQL and is different from the way normal PBI tables behave
Same as the previous method but requires even a deeper understanding of Kusto.
Starting from the September version of PBI desktop, the Kusto connector was updated, and you can use relationships between dimension tables and fact table like with any other source with a few changes:
In the attached example you can see two dimension tables with relationships to a fact table. The relationships are M:M and you can see the generated KQL in the text box.
The query as shown takes 0.5 seconds of CPU and uses 8MB of memory
The same query without the new setting takes 1.5 seconds of CPU and 478MB of memory.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.