Best practices in creating aggregations in Power BI and ADX

Published Jun 26 2022 02:32 AM 1,057 Views
Microsoft

Creating aggregations in Power BI

What are your options?

 

Every visual shown in a report in PBI, contains some form of aggregation

The question is how the aggregations are calculated and at which step in the pipe of bringing the data from the data source to the report.

In this article, I’ll be using data coming from Azure Data Explorer aka Kusto aka ADX.

Most of the content is relevant for other sources as well.

The assumption is that the queries are run in direct query mode.

Most issues are relevant for import mode also but not all options are relevant for both modes.

So how can you create aggregations and what kind of aggregations are available at each stage?

 

 

Starting from a KQL query that already includes summarize

 

You can create a query in Kusto explorer, verify the results and paste it into the Kusto connect dialog

DanyHoter_0-1656235464888.png

 

 

This statement creates two aggregations, sum of sales and average sales, by Country and product color.

Sum of sales is additive; it can be aggregated further and can be filtered by any one of the columns used in the by clause: Country and ColorName.

Average sales cannot be aggregated.

In the attached report you can see in the KQl-Summarize page that the total average sales of a country is the sum of the averages of the colors, which of course is wrong.

PBI uses the default aggregation of sum on the average value, and there is no way to show the right average value.

DanyHoter_1-1656235464891.png

 

 

 

 

Aggregating in the query using the transform/group by feature

 

Power query has a transformation called group by.

DanyHoter_2-1656235464892.png

 



The query generated by PQ is very similar to the query in the first option.

DanyHoter_3-1656235464893.png

 

The effects and limitations are very similar to the first option.

Again, the total average of a country for two colors is the sum of the averages which does not make any sense.

 

 

Aggregating using DAX measures

 

We can skip the summarize step in the original query and include the entire table in the results.

Once back in PBI, we can create measures for the sum and average. (We can just drag the SalesAmount column and choose the aggregation, but it is better to create measures)

This time the numbers are calculated correctly. The reason is that PBI is smart enough to calculate the average by querying for the sum and the count and dividing sum by count.

DanyHoter_4-1656235464894.png

 

When we look at the query sent by PBI to populate this matrix , we see (Simplified for readability):

SalesTable

| project SalesAmount,ColorName,Country

| where ColorName in ("Red","White")

| summarize a0=sum(SalesAmount), a1=countif(isnotnull(SalesAmount)), a2=sum(SalesAmount) by Country

 

If we create a measure that calculates the distinct count of customers, we can see that it also sends the correct KQL to Kusto. If we did the same in the previous methods, we would see the same wrong results when forcing aggregations. Only cases that use both country and color show the correct results

 

 

 

Summary

 

Aggregations should be done as late as possible in the PBI pipe of bringing the data and rendering visuals.

Avoiding aggregations at the source statement or in Power Query is the safest way to get the right values and with good performance.

 

Can measures be used in all cases?

 

Unfortunately, some measures will need to perform the aggregations in PBI and will import large data volumes to the client and will may get errors about bringing more than 500,000 rows.

An example of a DAX function that will need all data is median.

If you see errors or you feel that calculations are too slow, you may need to move the aggregations to the backend an make sure that further aggregations will not distort the results.

 

 

 

 

 

Filtering the results by another column

 

In the first two methods we can’t filter on any other column because we summarize the data before it arrives in PBI and we lose other columns

What about the method with measures?

Let’s say that we want to filter the results by the ClassName column

We leave the column in the result and filter by it using a regular slicer.

The results are calculated correctly, and everything looks perfect.

The KQL is filtering first and summarizing afterwards which is the right way to do it

 

Is there a way to filter by another column even if we summarize in PQ?

 

We can use dynamic M dynamic parameters to insert a filter before the summarize step.

The best practice is to create a Kusto function with parameters, use the parameters to filter the data in the correct position in the query (before the summarize) and bring filtered summary results to the report

You still need to avoid any further aggregations if the values are not additive.

 

 

 

 

Co-Authors
Version history
Last update:
‎Jun 26 2022 02:32 AM
Updated by: