To BE or not to be – case sensitive
Power BI, Power Query and ADX/RTA in Fabric
Summary
You use these combination – Power BI including Power query with data coming from Kusto/ADX/RTA in Fabric.
Is your solution case sensitive? Is “PoKer” == “poker”?
Depends on who you ask,
Power Query says definitely no, Power BI says definitely yes and Kusto says that
“PoKer” == “poker” is false but “PoKer” =~ “poker” is true.
What about your data? Is the same piece of information always written in the same way or is it sometimes Canada and some other times Canada?
In this article I’ll highlight the challenges of using mixed case data and navigating the differences between the different technologies.
Power BI
Chris Webb in his blog writes:
Case sensitivity is one of the more confusing aspects of Power BI: while the Power Query engine is case sensitive, the main Power BI engine (that means datasets, relationships, DAX etc.) is case insensitive
In this post, Chris mentioned a way to do cases insensitive comparisons in PQ but it is not supported in Direct Query.
Kusto/ADX/RTA
Kusto is case sensitive. Every function and language term must be written in the right case which is usually all lower case.
The same with tables, functions and columns.
What about text comparisons?
The KQL language offers case sensitive and case insensitive comparisons:
== vs. =~, != vs. !~ , has_cs vs. has , in vs. in~, contains_cs vs. contains
Comparisons created in Power BI or in Power Query and folded to KQL
The connector uses by default case sensitive comparisons: has_cs and ==
You can change this behavior by using settings in the connection.
Mixed case data
This is the trickiest topic. I attach a PBI report that shows a list of colors in two pages.
The slicer on the first page is showing the color Blue twice. If you edit the query, you can see that there are some products that have the color as “blue” all lower case.
This is confusing PBI and it shows two different variations that look exactly the same.
If you filter on either version in the first page, you get the same value which is the value of the version “Blue”.
For the second page I created a copy of the query where the versions of Blue are well separated, and you can see the total for each one. you can see that the total shown on the first page is just for the version “Blue”
What can you do in such cases (pun intended)
I create a third version of the query when I converted all color names to be proper cased.
The M function for right case couldn’t be used in Direct Query so I added a KQL snippet using the M function Value.NativeQuery.
The snippet is
| extend ColorName = strcat(toupper(substring(ColorName,0,1)),substring(ColorName,1))
In the third page you can see that filtering “Blue” shows the total values for “Upper blue” and “Lower blue” as they appear in the second page.
So, if you have a column in mixed case you must convert all values to a standard case.