Azure Synapse Analytics Blog

5 MIN READ

Introduction to Kusto Query Language (KQL)

SuryaJ

Microsoft

Mar 23, 2023

Author: @SuryaJ is a Program Manager in the Azure Synapse Customer Success Engineering (CSE) team.

Introduction

Kusto Query Language (KQL) is a powerful query language to analyse large volumes of structured, semi structured and unstructured (Free Text) data. It has inbuilt operators and functions that lets you analyse data to find trends, patterns, anomalies, create forecasting, and machine learning. Along with Azure Synapse Data Explorer, other Azure native services like Log Analytics, Azure Monitor and App Insights also use KQL to query data. This blog will be an introduction to KQL and its usage in Azure Synapse Data Explorer.

A KQL query is a read only request that takes a tabular input and produces a tabular output. It follows a simple Unix shell script like structure and uses a Top-Down approach for the query structure. Each operator is separated by a ‘|’ (pipe) delimiter. Similar to relational database systems there are operators available like where, joins, union, and more. In addition, there are also functions available for aggregations, geospatial analytics, Time series analytics and some machine learning as well.

This blog covers the following topics:

Motivation for a new query language? – Kusto Query Language (KQL)
Can I write SQL to query data? - SQL to KQL journey
What entities are present in Azure Synapse Data Explorer? What data types are present to write KQL? – Schema and data types
How to write a KQL query? – Understanding basic KQL structure
What happens behind the scenes? – Query evaluation and optimization
How do I get started? – Some basic and commonly used KQL operators
Let’s write some queries!!

Kusto Query Language (KQL)

KQL is a Microsoft homegrown query language that is made open source on GitHub. The language was developed with freedom and scale in mind. The queries are easy to read and adopt, with a lot of terms taken from plain English. For Example, if we want to select sample 10 records from a table called ‘NycTaxi’, the query is simply “ NycTaxi | take 10 “.

Here are certain reasons that showcase the importance of creating KQL.

Start with the Table – In KQL, authoring a query becomes easier as we start with the context of table name first. Because of this, user can concentrate on the logic and not in choosing the right columns, filters and aggregates.
Easy to understand query structure – Not straying too far from relational query languages, a lot of operators, filters and functions are available in KQL except in an easier to understand query structure. More on the query structure and formation detailed below.
Productivity boost – with plethora of powerful inbuilt functions, number of lines of KQL code compared to other languages is very less. Ex: A 4–5-line code in python to create timestamp bins for 1 day is done in KQL using bin() function as bin(timestamp,1d)
IntelliSense at Fingertips - IntelliSense starts helping us right from selecting correct table, columns to filters and aggregates.

SQL to KQL Journey

Azure Synapse Data Explorer also supports a subset of SQL queries. You can simply write a SQL query and execute to display results.

KQL provides an operator called "explain" to translate SQL queries into KQL

While this approach is fine on simple queries and learning KQL, it is recommended to use KQL for Azure Synapse Data Explorer for more complex analytics. SQL to KQL Cheat Sheet is available to help in the journey from SQL to KQL!

Schema and data types

In order to write KQL, let us understand the schema and entities in Azure Synapse Data Explorer.

Similar to relational databases, entities are structured in the following hierarchy.

KQL supports the following datatypes:

Boolean
Integer
Real
Decimal
Dates
Timespan
String
Guid
Dynamic (JSON)

While all other data types are standard ones, dynamic is a proprietary data type of Azure Synapse Data Explorer. It helps to traverse through a Json structure and extract any scalar values from arrays or property bags.

In the following example, we see that ‘Trace’ is dynamic column with Json like data. We are able to extract the key-value pair EventType=Demo using the extend operator(explained further below) and dot(.) notation

More on the dynamic data type here : The dynamic data type - Azure Data Explorer | Microsoft Learn

Understand the basic KQL query structure!

Let’s take an example KQL query and break it down!

𝐒𝐭𝐨𝐫𝐦𝐄𝐯𝐞𝐧𝐭𝐬
| w𝗵𝗲𝗿𝗲 𝘚𝘵𝘢𝘳𝘵𝘛𝘪𝘮𝘦 𝙗𝙚𝙩𝙬𝙚𝙚𝙣 (𝚍𝚊𝚝𝚎𝚝𝚒𝚖𝚎(𝟸𝟶𝟶𝟽-𝟷𝟷-𝟶𝟷) .. 𝚍𝚊𝚝𝚎𝚝𝚒𝚖𝚎(𝟸𝟶𝟶𝟽-𝟷𝟸-𝟶𝟷))
| 𝘄𝗵𝗲𝗿𝗲 𝘚𝘵𝘢𝘵𝘦 == "𝙵𝙻𝙾𝚁𝙸𝙳𝙰" 
| 𝗰𝗼𝘂𝗻𝘁

𝐒𝐭𝐨𝐫𝐦𝐄𝐯𝐞𝐧𝐭𝐬 denotes the table we want to query. This single line is equivalent to ‘select * from StormEvents’ in SQL
Line 2 and 3 are applied to funnel/filter the data from StormEvents table.
1. Line 2 filters ‘StartTime’ datetime column to be between 01 Nov 2007 and 01 Dec 2007
2. Line 3 filters ‘State’ column to be Florida
Finally apply the count() aggregation on the filtered data.

Query evaluation and optimization

Once a KQL query is received to Azure Synapse Data Explorer Engine, it passes the query through certain stages to make it an optimized and distributed query.

Let’s take an example query and see how it is transformed passing through above stages.

Logs 
| where Type== “Error” 
| summarize count()

The initial RelOp tree for this query looks like this:

And the distributed query looks like this:

Data nodes and shards with relevant data are selected.
Filters and aggregates are pushed down to each shard.
A union of filtered/aggregated shards happen at node level.
Another round of aggregation happens at node level.
Another round of union happens on the aggregated nodes data.
A final aggregation happens on top level.

Basic KQL operators

Now that we have seen how a query is structured and optimized by Azure Synapse Data Explorer Engine, we can start writing some basic KQL.

Most of the KQL queries can be fulfilled by certain common operators listed below:

Operator	Description	Example
where	Filters source data	.. \| where dept == “IT”
extend	Computes and adds a new column to source	.. \| extend col3=col1+col2
count	Counts # of records in a table	.. \| count
project	Choose required columns in the output	.. \| project col1, col2
take	Get random sample records	.. \| take 100
summarize	Perform aggregations based on function provided after this	.. \| summarize avg(col1)
join	Merges rows of two tables based on matching fields	tbl1 \| join tbl2 on id
union	Return rows from 2 or more tables	tbl1 \| union tbl2

Note: The list is not indicative of final list of operators but commonly used ones.

Queries!!!

You can run the following queries using the help cluster which is a read only cluster with already ingested data.

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]

print "Hello KQL!!"

Output:

print_0

Hello KQL!!

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]

StormEvents
| take 10

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]

StormEvents
| where StartTime between (datetime(2007-12-01) .. datetime(2007-12-31))
| count

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]

StormEvents
| where StartTime > datetime(2007-01-01) and StartTime < datetime(2007-12-31)
| summarize PropertyDamage_AVG = tolong(avg(DamageProperty)) by EventType

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]

StormEvents
| summarize avg(DamageProperty) by EventType
| render piechart

Output:

While this blog introduces KQL query and certain operators, you are encouraged to play around with more complex data and operators by

Creating a Free cluster, ingesting data and querying.
Use the help cluster and start querying – Cluster Uri: https://help.kusto.windows.net/

Stay tuned for deep dive into more KQL functions related to Time series, Geo Spatial and Machine Learning and also custom KQL codes(pre-written) in the upcoming blogs.

Our team publishes blog(s) regularly and you can find all these blogs at https://aka.ms/synapsecseblog . For deeper level of understanding of Synapse implementation best practices, please refer to our Success by Design (SBD) site at https://aka.ms/Synapse-Success-By-Design

Updated Mar 23, 2023

Version 1.0

Synapse CSE

Synapse data explorer

SuryaJ

Microsoft

Joined February 01, 2022

View Profile

Azure Synapse Analytics Blog

Follow this blog board to get notified when there's new activity