Introduction to Kusto Query Language (KQL)
Published Mar 23 2023 09:00 AM 16.1K Views
Microsoft

SuryaJ_0-1677838031187.png

Author: @SuryaJ  is a Program Manager in the Azure Synapse Customer Success Engineering (CSE) team.

 

Introduction

Kusto Query Language (KQL) is a powerful query language to analyse large volumes of structured, semi structured and unstructured (Free Text) data. It has inbuilt operators and functions that lets you analyse data to find trends, patterns, anomalies, create forecasting, and machine learning. Along with Azure Synapse Data Explorer, other Azure native services like Log Analytics, Azure Monitor and App Insights also use KQL to query data. This blog will be an introduction to KQL and its usage in Azure Synapse Data Explorer.

 

A KQL query is a read only request that takes a tabular input and produces a tabular output. It follows a simple Unix shell script like structure and uses a Top-Down approach for the query structure. Each operator is separated by a ‘|’ (pipe) delimiter. Similar to relational database systems there are operators available like where, joins, union, and more. In addition, there are also functions available for aggregations, geospatial analytics, Time series analytics and some machine learning as well.

 

This blog covers the following topics:

  • Motivation for a new query language? – Kusto Query Language (KQL)
  • Can I write SQL to query data? - SQL to KQL journey
  • What entities are present in Azure Synapse Data Explorer? What data types are present to write KQL? – Schema and data types
  • How to write a KQL query? – Understanding basic KQL structure
  • What happens behind the scenes? – Query evaluation and optimization
  • How do I get started? – Some basic and commonly used KQL operators
  • Let’s write some queries!!

 

Kusto Query Language (KQL)

KQL is a Microsoft homegrown query language that is made open source on GitHub. The language was developed with freedom and scale in mind. The queries are easy to read and adopt, with a lot of terms taken from plain English. For Example, if we want to select sample 10 records from a table called ‘NycTaxi’, the query is simply “ NycTaxi | take 10 “.

 

Here are certain reasons that showcase the importance of creating KQL.

  • Start with the Table – In KQL, authoring a query becomes easier as we start with the context of table name first. Because of this, user can concentrate on the logic and not in choosing the right columns, filters and aggregates.
  • Easy to understand query structure – Not straying too far from relational query languages, a lot of operators, filters and functions are available in KQL except in an easier to understand query structure. More on the query structure and formation detailed below.
  • Productivity boost – with plethora of powerful inbuilt functions, number of lines of KQL code compared to other languages is very less. Ex: A 4–5-line code in python to create timestamp bins for 1 day is done in KQL using bin() function as bin(timestamp,1d)
  • IntelliSense at Fingertips -   IntelliSense starts helping us right from selecting correct table, columns to filters and aggregates.

Intellisense.gifSQL to KQL Journey

Azure Synapse Data Explorer also supports a subset of SQL queries. You can simply write a SQL query and execute to display results.

SQL.gif

 KQL provides an operator called "explain" to translate SQL queries into KQL

SQLtoKQL.gif

While this approach is fine on simple queries and learning KQL, it is recommended to use KQL for Azure Synapse Data Explorer for more complex analytics. SQL to KQL Cheat Sheet is available to help in the journey from SQL to KQL!

 

Schema and data types

In order to write KQL, let us understand the schema and entities in Azure Synapse Data Explorer.

Similar to relational databases, entities are structured in the following hierarchy.

 

SuryaJ_3-1677837524542.png

KQL supports the following datatypes:

  • Boolean
  • Integer
  • Real
  • Decimal
  • Dates
  • Timespan
  • String
  • Guid
  • Dynamic (JSON)

While all other data types are standard ones, dynamic is a proprietary data type of Azure Synapse Data Explorer. It helps to traverse through a Json structure and extract any scalar values from arrays or property bags.

 

In the following example, we see that ‘Trace’ is dynamic column with Json like data. We are able to extract the key-value pair EventType=Demo using the extend operator(explained further below) and dot(.) notation

dynamic.gifMore on the dynamic data type here : The dynamic data type - Azure Data Explorer | Microsoft Learn

 

Understand the basic KQL query structure!

Let’s take an example KQL query and break it down!

 

 

 

𝐒𝐭𝐨𝐫𝐦𝐄𝐯𝐞𝐧𝐭𝐬
| w𝗵𝗲𝗿𝗲 𝘚𝘵𝘢𝘳𝘵𝘛𝘪𝘮𝘦 𝙗𝙚𝙩𝙬𝙚𝙚𝙣 (𝚍𝚊𝚝𝚎𝚝𝚒𝚖𝚎(𝟸𝟶𝟶𝟽-𝟷𝟷-𝟶𝟷) .. 𝚍𝚊𝚝𝚎𝚝𝚒𝚖𝚎(𝟸𝟶𝟶𝟽-𝟷𝟸-𝟶𝟷))
| 𝘄𝗵𝗲𝗿𝗲 𝘚𝘵𝘢𝘵𝘦 == "𝙵𝙻𝙾𝚁𝙸𝙳𝙰" 
| 𝗰𝗼𝘂𝗻𝘁

 

 

 

  1. 𝐒𝐭𝐨𝐫𝐦𝐄𝐯𝐞𝐧𝐭𝐬 denotes the table we want to query. This single line is equivalent to ‘select * from StormEvents’ in SQL
  2. Line 2 and 3 are applied to funnel/filter the data from StormEvents table.
    1. Line 2 filters ‘StartTime’ datetime column to be between 01 Nov 2007 and 01 Dec 2007
    2. Line 3 filters ‘State’ column to be Florida
  3. Finally apply the count() aggregation on the filtered data.

SuryaJ_5-1677837548714.png

 

Query evaluation and optimization

Once a KQL query is received to Azure Synapse Data Explorer Engine, it passes the query through certain stages to make it an optimized and distributed query.

 

SuryaJ_0-1677840409699.png

Let’s take an example query and see how it is transformed passing through above stages.

 

 

Logs 
| where Type== “Error” 
| summarize count()

 

 

 

The initial RelOp tree for this query looks like this:

SuryaJ_1-1677841218751.png

 

And the distributed query looks like this:

SuryaJ_2-1677841248580.png

 

  • Data nodes and shards with relevant data are selected.
  • Filters and aggregates are pushed down to each shard.
  • A union of filtered/aggregated shards happen at node level.
  • Another round of aggregation happens at node level.
  • Another round of union happens on the aggregated nodes data.
  • A final aggregation happens on top level.

 

Basic KQL operators

Now that we have seen how a query is structured and optimized by Azure Synapse Data Explorer Engine, we can start writing some basic KQL.

 

Most of the KQL queries can be fulfilled by certain common operators listed below:

Operator

Description

Example

where

Filters source data

.. | where dept == “IT”

extend

Computes and adds a new column to source

.. | extend col3=col1+col2

count

Counts # of records in a table

.. | count

project

Choose required columns in the output

.. | project col1, col2

take

Get random sample records

.. | take 100

summarize

Perform aggregations based on function provided after this

.. | summarize avg(col1)

join

Merges rows of two tables based on matching fields

tbl1 | join tbl2 on id

union

Return rows from 2 or more tables

tbl1 | union tbl2

Note: The list is not indicative of final list of operators but commonly used ones.

 

Queries!!!

You can run the following queries using the help cluster which is a read only cluster with already ingested data.

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]

 

 

print "Hello KQL!!"

 

 

 

Output:

print_0

Hello KQL!!

 

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]

 

 

StormEvents
| take 10

 

 

 

 

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]

 

 

StormEvents
| where StartTime between (datetime(2007-12-01) .. datetime(2007-12-31))
| count

 

 

 

 

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]

 

 

StormEvents
| where StartTime > datetime(2007-01-01) and StartTime < datetime(2007-12-31)
| summarize PropertyDamage_AVG = tolong(avg(DamageProperty)) by EventType

 

 

 

 

Execute in [Web] [Desktop] [cluster('help.kusto.windows.net').database('Samples')]

 

 

StormEvents
| summarize avg(DamageProperty) by EventType
| render piechart  

 

 

 

Output:

SuryaJ_9-1677837548724.png

While this blog introduces KQL query and certain operators, you are encouraged to play around with more complex data and operators by

 

Stay tuned for deep dive into more KQL functions related to Time series, Geo Spatial and Machine Learning and also custom KQL codes(pre-written) in the upcoming blogs.

 

Our team publishes blog(s) regularly and you can find all these blogs at https://aka.ms/synapsecseblog. For deeper level of understanding of Synapse implementation best practices, please refer to our Success by Design (SBD) site at https://aka.ms/Synapse-Success-By-Design 

 

3 Comments
Co-Authors
Version history
Last update:
‎Mar 23 2023 08:20 AM
Updated by: