Azure Data Explorer: a service for storing Big Data and running interactive analytics on it

Microsoft

Jul 05, 2019

This week I have had a number of questions about course development for Big Data analysis technologies, and several of them were specifically about Azure Data Explorer (Kusto). This post is a quick overview of Kusto / Azure Data Explorer.

Azure Data Explorer is a big data analytics cloud service optimized for interactive ad-hoc queries over structured, semi-structured, and unstructured data. Kusto is the internal code name of the project in Microsoft. Externally, the cloud service is called Azure Data Explorer.

Kusto is a log analytics cloud platform optimized for ad-hoc big data queries. You can read more about Kusto here: https://docs.microsoft.com/en-us/azure/kusto/

The world of Big Data is growing steadily, and the number of technologies that process large amounts of data is growing along with it. So how does Kusto compare to other tools such as Cosmos, MDM and Hadoop? First, let's consider the three telemetry-processing scenarios, which are distinguished primarily by their latency needs:

Hot path
Warm path
Cold path

For example, MDM, traditional TSDBs, and many stream-processing technologies such as Azure Stream Analytics are considered "hot path" technologies.

Kusto targets the "warm path" scenario.

Various batch processing systems (such as Cosmos, Hadoop, and Azure Data Lake Compute) are "cold path".

The following table attempts to highlight some of the differences.

Aspect	Hot path	Warm path	Cold path
Latency	Seconds (up to, say, five)	Minutes (up to, say, five)	Longer
Queryable data storage	RAM	Attached (low latency) SSD	HDD (Cosmos, Hadoop) or even remote storage (HDInsight)
Query frequency	Automated (alerting)	Ad-hoc (human-generated)	Occasional
Max size of intermediate result	Single-node RAM	Cluster RAM	"Infinite" (spilled to HDD)
Recovery from query failures	No	No	Yes (built for batch processing; continue from last checkpoint)
Data analysis	Metrics (TSDB-like)	Text and numbers	Everything you can write a C# function for
Data form	Aggregated	Raw	Raw
Targeted for	Real time data viewing	Ad-hoc data exploration	Programmatic data manipulation
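
As a rough illustration of the latency row in the table above, the following Python sketch maps a latency requirement to one of the three paths. The function name and the exact cut-offs (five seconds and five minutes) are assumptions taken from the "up to, say, five" figures in the table; they are indicative values, not hard limits of any product.

```python
from datetime import timedelta

# Illustrative cut-offs only, based on the "up to, say, five" figures in the table.
HOT_PATH_MAX = timedelta(seconds=5)
WARM_PATH_MAX = timedelta(minutes=5)


def classify_path(required_latency: timedelta) -> str:
    """Return "hot", "warm" or "cold" for a given end-to-end latency requirement."""
    if required_latency < timedelta(0):
        raise ValueError("latency must not be negative")
    if required_latency <= HOT_PATH_MAX:
        return "hot"    # e.g. automated alerting on aggregated metrics
    if required_latency <= WARM_PATH_MAX:
        return "warm"   # e.g. ad-hoc, human-generated queries over raw data
    return "cold"       # e.g. batch processing


if __name__ == "__main__":
    for latency in (timedelta(seconds=2), timedelta(minutes=3), timedelta(hours=1)):
        print(latency, classify_path(latency))
```

In practice the boundary between paths is a judgment call that also depends on the other rows of the table (storage, query frequency, data form), so treat this as a first approximation.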

Kusto is built for analytics, rather than OLTP, scenarios. Therefore, its design trade-offs favor very fast bulk Create (supporting high rates of inserts/appends of new records) and very fast bulk Read (supporting queries over large amounts of data). Kusto's support for Delete scenarios focuses on bulk delete (mainly for enforcing a retention period); per-record deletion is not supported. Likewise, updates of existing records are not supported in Kusto.
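
The trade-off described above can be sketched with a toy in-memory model in Python. This is not how Kusto is implemented; the class name `AppendOnlyTable` and its methods are hypothetical and exist only to show the shape of the contract: bulk append, bulk read, bulk delete by retention, and deliberately no per-record update or delete.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


@dataclass
class AppendOnlyTable:
    """Hypothetical toy model of an append-only table with a retention period."""

    retention: timedelta
    _rows: List[Tuple[datetime, Record]] = field(default_factory=list)

    def ingest(self, records: Iterable[Record], ingested_at: datetime) -> int:
        """Bulk Create: append a batch of records stamped with their ingestion time."""
        batch = [(ingested_at, dict(r)) for r in records]
        self._rows.extend(batch)
        return len(batch)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        """Bulk Read: scan all rows and return copies of those matching the predicate."""
        return [dict(r) for _, r in self._rows if predicate(r)]

    def apply_retention(self, now: datetime) -> int:
        """Bulk Delete: drop every row older than the retention period."""
        cutoff = now - self.retention
        before = len(self._rows)
        self._rows = [(t, r) for t, r in self._rows if t >= cutoff]
        return before - len(self._rows)

    # Intentionally no update() or delete_record(): individual rows are immutable.


if __name__ == "__main__":
    start = datetime(2019, 7, 1)
    table = AppendOnlyTable(retention=timedelta(days=2))
    table.ingest([{"level": "error"}, {"level": "info"}], ingested_at=start)
    table.ingest([{"level": "error"}], ingested_at=start + timedelta(days=3))
    print(table.query(lambda r: r["level"] == "error"))
    print(table.apply_retention(now=start + timedelta(days=4)))
```

Because rows never change after ingestion, old data can be dropped in whole batches by age, which is the only kind of delete this model needs to support.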

Kusto offers excellent data ingestion and query performance by "sacrificing" the ability to perform in-place updates of individual rows and cross-table constraints/transactions. Therefore, it supplements, rather than replaces, traditional RDBMSs for scenarios such as OLTP and data warehousing.

As a Big Data service, Kusto handles structured, semi-structured (e.g. JSON-like nested types), and unstructured data.

Introductory videos

Azure Data Explorer was first announced at Ignite 2018:

Scott Guthrie's announcement in Orlando: https://www.youtube.com/watch?v=xnmBu4oh7xk&t=1h08m12s
Rohan Kumar's announcement: https://www.youtube.com/watch?v=ZaiM89Z01r0&t=58m0s
Manoj Raheja's brief introduction to Kusto: https://www.youtube.com/watch?v=GT4C84yrb68
Scott Guthrie demoing Kusto at Techorama: https://www.youtube.com/watch?v=YTWewM_UMOk&feature=youtu.be&t=3074

Kusto is used as the data platform for a number of Microsoft services, some of which expose its query language to users. Here are two videos showing its capabilities when used inside Application Insights / Azure Monitor:

Product links: