Azure Data Explorer services for storing and running interactive analytics on Big Data
Published Jul 05 2019 04:14 AM
Microsoft

This week I have had a number of questions about course development for Big Data analysis technologies, and several of them were specifically about Azure Data Explorer (Kusto). This post is a quick overview of Kusto / Azure Data Explorer.

Azure Data Explorer is a big data analytics cloud service optimized for interactive ad-hoc queries over structured, semi-structured, and unstructured data. Kusto is the internal code name of the project in Microsoft. Externally, the cloud service is called Azure Data Explorer.

 

Kusto is a log analytics cloud platform optimized for ad-hoc big data queries. You can read more about Kusto here: https://docs.microsoft.com/en-us/azure/kusto/

 

The world of Big Data is growing steadily, and the number of technologies that process large amounts of data is growing along with it. So how does Kusto compare to other tools such as Cosmos, MDM and Hadoop? First, let's consider three telemetry-processing scenarios, distinguished primarily by their latency needs:

  1. Hot path
  2. Warm path
  3. Cold path

For example, MDM, traditional TSDBs, and many stream processing technologies such as Azure Stream Analytics are considered "hot path" technologies.

Kusto targets the "warm path" scenario.

Various batch processing systems (such as Cosmos, Hadoop, and Azure Data Lake Compute) are "cold path".

The following table attempts to highlight some of the differences.

| Aspect | Hot path | Warm path | Cold path |
|---|---|---|---|
| Latency | Seconds (up to, say, 5) | Minutes (up to, say, 5) | More |
| Queryable data storage | RAM | Attached (low latency) SSD | HDD (Cosmos, Hadoop) or even remote storage (HDInsight) |
| Query frequency | Automated (alerting) | Ad-hoc (human-generated) | Occasional |
| Max size of intermediate result | Single-node RAM | Cluster RAM | "Infinite" (spilled to HDD) |
| Recovery from query failures | No | No | Yes (built for batch processing; continue from last checkpoint) |
| Data analysis | Metrics (TSDB-like) | Text and numbers | Everything you can write a C# function for |
| Data form | Aggregated | Raw | Raw |
| Targeted for | Real time data viewing | Ad-hoc data exploration | Programmatic data manipulation |
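
As a rough illustration of how the "Latency" row can guide a choice between the three paths, here is a minimal Python sketch. The function name `classify_path` and the exact cut-offs are assumptions made for this example; the table itself only gives approximate figures ("up to, say, 5"), so treat the thresholds as guides rather than hard limits.

```python
from datetime import timedelta

# Approximate thresholds taken from the "Latency" row of the table.
# They are illustrative assumptions, not limits defined by any product.
HOT_MAX = timedelta(seconds=5)
WARM_MAX = timedelta(minutes=5)


def classify_path(latency_budget: timedelta) -> str:
    """Return 'hot', 'warm', or 'cold' for an acceptable data latency."""
    if latency_budget <= HOT_MAX:
        return "hot"
    if latency_budget <= WARM_MAX:
        return "warm"
    return "cold"


if __name__ == "__main__":
    budgets = (timedelta(seconds=2), timedelta(minutes=1), timedelta(hours=6))
    for budget in budgets:
        print(budget, "->", classify_path(budget))
```

In practice latency is only one of the aspects in the table; query frequency and the size of intermediate results matter as well, so a real decision would weigh several rows rather than a single threshold.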

Kusto is built for analytics rather than OLTP scenarios. Therefore, its design trade-offs favor very fast bulk Create (supporting high rates of inserts/appends of new records) and very fast bulk Read (supporting queries over large amounts of data). Kusto's support for Delete scenarios focuses on bulk delete (mainly for enforcing a retention period); per-record deletion is not supported. Likewise, updates of existing records are not supported in Kusto.
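
To make this access pattern concrete, here is a toy Python sketch of an append-only table. It is not how Kusto is implemented, and every name in it (`AppendOnlyTable`, `Batch`, `ingest`, `query`, `apply_retention`) is hypothetical. It only illustrates the trade-off described above: records arrive in bulk, are read in bulk, and are removed in bulk once a retention period has passed, with no per-record update or delete operation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


@dataclass(frozen=True)
class Batch:
    """A group of records ingested together (hypothetical structure)."""
    ingested_at: datetime
    records: Tuple[Record, ...]


@dataclass
class AppendOnlyTable:
    """Toy table: bulk append, bulk read, bulk delete by retention only."""
    retention: timedelta
    _batches: List[Batch] = field(default_factory=list)

    def ingest(self, records: Iterable[Record], now: datetime) -> None:
        # Bulk Create: a whole batch is appended at once.
        self._batches.append(Batch(now, tuple(records)))

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Bulk Read: scan every retained batch and filter.
        return [r for b in self._batches for r in b.records if predicate(r)]

    def apply_retention(self, now: datetime) -> int:
        # Bulk Delete: drop whole batches older than the retention period.
        cutoff = now - self.retention
        kept = [b for b in self._batches if b.ingested_at >= cutoff]
        dropped = len(self._batches) - len(kept)
        self._batches = kept
        return dropped

    # Deliberately no update_record() or delete_record() methods.


if __name__ == "__main__":
    t0 = datetime(2019, 7, 1)
    table = AppendOnlyTable(retention=timedelta(days=2))
    table.ingest([{"level": "error", "msg": "disk full"},
                  {"level": "info", "msg": "started"}], now=t0)
    table.ingest([{"level": "error", "msg": "timeout"}],
                 now=t0 + timedelta(days=3))
    print(table.query(lambda r: r["level"] == "error"))
    print("batches dropped:", table.apply_retention(now=t0 + timedelta(days=3)))
```

Dropping whole batches keeps deletion cheap, which is the point of restricting deletes to the retention scenario in this sketch.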

Kusto offers excellent data ingestion and query performance by "sacrificing" the ability to perform in-place updates of individual rows and cross-table constraints/transactions. Therefore, it supplements, rather than replaces, traditional RDBMS systems for scenarios such as OLTP and data warehousing.

 

As a Big Data service, Kusto handles structured, semi-structured (e.g. JSON-like nested types), and unstructured data.

 

Introductory videos

Azure Data Explorer was first announced at Ignite 2018:

  1. Scott Guthrie's announcement in Orlando: https://www.youtube.com/watch?v=xnmBu4oh7xk&t=1h08m12s
  2. Rohan Kumar's announcement: https://www.youtube.com/watch?v=ZaiM89Z01r0&t=58m0s
  3. Manoj Raheja's brief introduction to Kusto: https://www.youtube.com/watch?v=GT4C84yrb68
  4. Scott Guthrie demoing Kusto at Techorama: https://www.youtube.com/watch?v=YTWewM_UMOk&feature=youtu.be&t=3074

Kusto is used as the data platform for a number of Microsoft services, some of which expose its query language to users. Here are two videos showing its capabilities when used inside Application Insights / Azure Monitor:

  1. Interactive Analytics with Application Insights
  2. Advanced Analytics with Application Insights

Last update: Jul 05 2019 04:23 AM