Author: @SuryaJ is a Program Manager in the Azure Synapse Customer Success Engineering (CSE) team.
Kusto Query Language (KQL) is a powerful query language to analyse large volumes of structured, semi structured and unstructured (Free Text) data. It has inbuilt operators and functions that lets you analyse data to find trends, patterns, anomalies, create forecasting, and machine learning. Along with Azure Synapse Data Explorer, other Azure native services like Log Analytics, Azure Monitor and App Insights also use KQL to query data. This blog will be an introduction to KQL and its usage in Azure Synapse Data Explorer.
A KQL query is a read only request that takes a tabular input and produces a tabular output. It follows a simple Unix shell script like structure and uses a Top-Down approach for the query structure. Each operator is separated by a ‘|’ (pipe) delimiter. Similar to relational database systems there are operators available like where, joins, union, and more. In addition, there are also functions available for aggregations, geospatial analytics, Time series analytics and some machine learning as well.
This blog covers the following topics:
KQL is a Microsoft homegrown query language that is made open source on GitHub. The language was developed with freedom and scale in mind. The queries are easy to read and adopt, with a lot of terms taken from plain English. For Example, if we want to select sample 10 records from a table called ‘NycTaxi’, the query is simply “ NycTaxi | take 10 “.
Here are certain reasons that showcase the importance of creating KQL.
Azure Synapse Data Explorer also supports a subset of SQL queries. You can simply write a SQL query and execute to display results.
KQL provides an operator called "explain" to translate SQL queries into KQL
While this approach is fine on simple queries and learning KQL, it is recommended to use KQL for Azure Synapse Data Explorer for more complex analytics. SQL to KQL Cheat Sheet is available to help in the journey from SQL to KQL!
In order to write KQL, let us understand the schema and entities in Azure Synapse Data Explorer.
Similar to relational databases, entities are structured in the following hierarchy.
KQL supports the following datatypes:
While all other data types are standard ones, dynamic is a proprietary data type of Azure Synapse Data Explorer. It helps to traverse through a Json structure and extract any scalar values from arrays or property bags.
In the following example, we see that ‘Trace’ is dynamic column with Json like data. We are able to extract the key-value pair EventType=Demo using the extend operator(explained further below) and dot(.) notation
More on the dynamic data type here : The dynamic data type - Azure Data Explorer | Microsoft Learn
Let’s take an example KQL query and break it down!
𝐒𝐭𝐨𝐫𝐦𝐄𝐯𝐞𝐧𝐭𝐬
| w𝗵𝗲𝗿𝗲 𝘚𝘵𝘢𝘳𝘵𝘛𝘪𝘮𝘦 𝙗𝙚𝙩𝙬𝙚𝙚𝙣 (𝚍𝚊𝚝𝚎𝚝𝚒𝚖𝚎(𝟸𝟶𝟶𝟽-𝟷𝟷-𝟶𝟷) .. 𝚍𝚊𝚝𝚎𝚝𝚒𝚖𝚎(𝟸𝟶𝟶𝟽-𝟷𝟸-𝟶𝟷))
| 𝘄𝗵𝗲𝗿𝗲 𝘚𝘵𝘢𝘵𝘦 == "𝙵𝙻𝙾𝚁𝙸𝙳𝙰"
| 𝗰𝗼𝘂𝗻𝘁
Once a KQL query is received to Azure Synapse Data Explorer Engine, it passes the query through certain stages to make it an optimized and distributed query.
Let’s take an example query and see how it is transformed passing through above stages.
Logs
| where Type== “Error”
| summarize count()
The initial RelOp tree for this query looks like this:
And the distributed query looks like this:
Now that we have seen how a query is structured and optimized by Azure Synapse Data Explorer Engine, we can start writing some basic KQL.
Most of the KQL queries can be fulfilled by certain common operators listed below:
Operator |
Description |
Example |
where |
Filters source data |
.. | where dept == “IT” |
extend |
Computes and adds a new column to source |
.. | extend col3=col1+col2 |
count |
Counts # of records in a table |
.. | count |
project |
Choose required columns in the output |
.. | project col1, col2 |
take |
Get random sample records |
.. | take 100 |
summarize |
Perform aggregations based on function provided after this |
.. | summarize avg(col1) |
join |
Merges rows of two tables based on matching fields |
tbl1 | join tbl2 on id |
union |
Return rows from 2 or more tables |
tbl1 | union tbl2 |
Note: The list is not indicative of final list of operators but commonly used ones.
You can run the following queries using the help cluster which is a read only cluster with already ingested data.
Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]
print "Hello KQL!!"
Output:
print_0 |
Hello KQL!! |
Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]
StormEvents
| take 10
Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]
StormEvents
| where StartTime between (datetime(2007-12-01) .. datetime(2007-12-31))
| count
Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]
StormEvents
| where StartTime > datetime(2007-01-01) and StartTime < datetime(2007-12-31)
| summarize PropertyDamage_AVG = tolong(avg(DamageProperty)) by EventType
Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]
StormEvents
| summarize avg(DamageProperty) by EventType
| render piechart
Output:
While this blog introduces KQL query and certain operators, you are encouraged to play around with more complex data and operators by
Stay tuned for deep dive into more KQL functions related to Time series, Geo Spatial and Machine Learning and also custom KQL codes(pre-written) in the upcoming blogs.
Our team publishes blog(s) regularly and you can find all these blogs at https://aka.ms/synapsecseblog. For deeper level of understanding of Synapse implementation best practices, please refer to our Success by Design (SBD) site at https://aka.ms/Synapse-Success-By-Design
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.