Author:@SuryaJ is a Program Manager in the Azure Synapse Customer Success Engineering (CSE) team.
Kusto Query Language (KQL) is a powerful query language to analyse large volumes of structured, semi structured and unstructured (Free Text) data. It has inbuilt operators and functions that lets you analyse data to find trends, patterns, anomalies, create forecasting, and machine learning. Along with Azure Synapse Data Explorer, other Azure native services like Log Analytics, Azure Monitor and App Insights also use KQL to query data. This blog will be an introduction to KQL and its usage in Azure Synapse Data Explorer.
A KQL query is a read only request that takes a tabular input and produces a tabular output. It follows a simple Unix shell script like structure and uses a Top-Down approach for the query structure. Each operator is separated by a ‘|’ (pipe) delimiter. Similar to relational database systems there are operators available like where, joins, union, and more. In addition, there are also functions available for aggregations, geospatial analytics, Time series analytics and some machine learning as well.
This blog covers the following topics:
Motivation for a new query language? – Kusto Query Language (KQL)
Can I write SQL to query data? - SQL to KQL journey
What entities are present in Azure Synapse Data Explorer? What data types are present to write KQL? – Schema and data types
How to write a KQL query? – Understanding basic KQL structure
What happens behind the scenes? – Query evaluation and optimization
How do I get started? – Some basic and commonly used KQL operators
Let’s write some queries!!
Kusto Query Language (KQL)
KQL is a Microsoft homegrown query language that is made open source on GitHub. The language was developed with freedom and scale in mind. The queries are easy to read and adopt, with a lot of terms taken from plain English. For Example, if we want to select sample 10 records from a table called ‘NycTaxi’, the query is simply “ NycTaxi | take 10 “.
Here are certain reasons that showcase the importance of creating KQL.
Start with the Table – In KQL, authoring a query becomes easier as we start with the context of table name first. Because of this, user can concentrate on the logic and not in choosing the right columns, filters and aggregates.
Easy to understand query structure – Not straying too far from relational query languages, a lot of operators, filters and functions are available in KQL except in an easier to understand query structure. More on the query structure and formation detailed below.
Productivity boost – with plethora of powerful inbuilt functions, number of lines of KQL code compared to other languages is very less. Ex: A 4–5-line code in python to create timestamp bins for 1 day is done in KQL using bin() function as bin(timestamp,1d)
IntelliSense at Fingertips - IntelliSense starts helping us right from selecting correct table, columns to filters and aggregates.
SQL to KQL Journey
Azure Synapse Data Explorer also supports a subset of SQL queries. You can simply write a SQL query and execute to display results.
KQL provides an operator called "explain" to translate SQL queries into KQL
While this approach is fine on simple queries and learning KQL, it is recommended to use KQL for Azure Synapse Data Explorer for more complex analytics. SQL to KQL Cheat Sheet is available to help in the journey from SQL to KQL!
Schema and data types
In order to write KQL, let us understand the schema and entities in Azure Synapse Data Explorer.
Similar to relational databases, entities are structured in the following hierarchy.
KQL supports the following datatypes:
While all other data types are standard ones, dynamic is a proprietary data type of Azure Synapse Data Explorer. It helps to traverse through a Json structure and extract any scalar values from arrays or property bags.
In the following example, we see that ‘Trace’ is dynamic column with Json like data. We are able to extract the key-value pair EventType=Demo using the extend operator(explained further below) and dot(.) notation