This week I have had a number of questions about course development for Big Data analysis technologies, and several of them were specifically about Azure Data Explorer (Kusto). This post is a quick overview of Kusto / Azure Data Explorer.
Azure Data Explorer is a big data analytics cloud service optimized for interactive ad-hoc queries over structured, semi-structured, and unstructured data. Kusto is the internal code name of the project at Microsoft; externally, the cloud service is called Azure Data Explorer.
Kusto is a log analytics cloud platform optimized for ad-hoc big data queries. You can read more about Kusto here: https://docs.microsoft.com/en-us/azure/kusto/
The world of Big Data is growing steadily, and the number of technologies that process large amounts of data is growing along with it. So how does Kusto compare to other tools such as Cosmos, MDM, and Hadoop? First, let's consider three telemetry-processing scenarios, distinguished primarily by their latency needs:
MDM, traditional TSDBs, and many stream processing technologies such as Azure Stream Analytics are "hot path" technologies.
Kusto targets the "warm path" scenario.
Various batch processing systems (such as Cosmos, Hadoop, and Azure Data Lake Compute) are "cold path".
The following table attempts to highlight some of the differences.
Aspect | Hot path | Warm path | Cold path |
---|---|---|---|
Latency | Seconds (up to, say, 5) | Minutes (up to, say, 5) | Longer than that |
Queryable data storage | RAM | Attached (low latency) SSD | HDD (Cosmos, Hadoop) or even remote storage (HDInsight) |
Query frequency | Automated (alerting) | Ad-hoc (human-generated) | Occasional |
Max size of intermediate result | Single-node RAM | Cluster RAM | "Infinite" (spilled to HDD) |
Recovery from query failures | No | No | Yes (built for batch processing; continue from last checkpoint) |
Data analysis | Metrics (TSDB-like) | Text and numbers | Everything you can write a C# function for |
Data form | Aggregated | Raw | Raw |
Targeted for | Real time data viewing | Ad-hoc data exploration | Programmatic data manipulation |
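
As a rough illustration of the latency row in the table, the split between the three paths can be sketched in a few lines of Python. The function name `telemetry_path` and the two cut-off constants are hypothetical; the cut-offs simply reuse the "say, 5" figures from the table and are rules of thumb, not limits enforced by any of the services mentioned.

```python
from datetime import timedelta

# Illustrative cut-offs taken from the "say, 5" figures in the table above.
# They are rules of thumb for this sketch, not limits of any real service.
HOT_MAX = timedelta(seconds=5)
WARM_MAX = timedelta(minutes=5)


def telemetry_path(tolerated_latency: timedelta) -> str:
    """Return 'hot', 'warm', or 'cold' for the latency a scenario can tolerate."""
    if tolerated_latency < timedelta(0):
        raise ValueError("latency must be non-negative")
    if tolerated_latency <= HOT_MAX:
        return "hot"    # e.g. automated alerting on aggregated metrics
    if tolerated_latency <= WARM_MAX:
        return "warm"   # e.g. ad-hoc exploration of raw data (Kusto's target)
    return "cold"       # e.g. batch processing


if __name__ == "__main__":
    for latency in (timedelta(seconds=2), timedelta(minutes=1), timedelta(hours=3)):
        print(latency, "->", telemetry_path(latency))
```

In practice latency is only one of the aspects listed in the table, so a real decision would also weigh query frequency, data form, and the size of intermediate results.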
Kusto is built for analytics, rather than OLTP, scenarios. Its design trade-offs therefore favor very fast bulk Create (supporting high rates of inserts/appends of new records) and very fast bulk Read (supporting queries over large amounts of data). Kusto's support for Delete scenarios focuses on bulk delete (mainly for enforcing a retention period); per-record deletion is not supported. Likewise, Updates of existing records are not supported in Kusto.
Kusto offers excellent data ingestion and query performance by "sacrificing" the ability to perform in-place updates of individual rows and cross-table constraints/transactions. Therefore, it supplements, rather than replaces, traditional RDBMS systems for scenarios such as OLTP and data warehousing.
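
To make these trade-offs concrete, here is a minimal toy model in Python of a store that supports only bulk append, bulk read, and retention-based bulk delete. It is a sketch for teaching purposes and says nothing about how Kusto is implemented internally; the class `AppendOnlyTable` and its methods `ingest`, `query`, and `apply_retention` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


@dataclass
class AppendOnlyTable:
    """Toy in-memory table: bulk append, bulk read, retention-based bulk delete.

    There is deliberately no update() and no per-record delete().
    """
    retention: timedelta
    _rows: List[Tuple[datetime, Record]] = field(
        default_factory=list, init=False, repr=False
    )

    def ingest(self, records: Iterable[Record], at: datetime) -> int:
        """Append a batch of records stamped with ingestion time `at`."""
        batch = [(at, dict(r)) for r in records]
        self._rows.extend(batch)
        return len(batch)

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        """Scan every row and return copies of the ones matching `predicate`."""
        return [dict(r) for _, r in self._rows if predicate(r)]

    def apply_retention(self, now: datetime) -> int:
        """Bulk-delete rows older than the retention period; return the count."""
        cutoff = now - self.retention
        before = len(self._rows)
        self._rows = [(t, r) for t, r in self._rows if t >= cutoff]
        return before - len(self._rows)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    table = AppendOnlyTable(retention=timedelta(days=30))
    table.ingest([{"level": "error", "msg": "disk full"},
                  {"level": "info", "msg": "started"}], at=t0)
    table.ingest([{"level": "error", "msg": "timeout"}], at=t0 + timedelta(days=40))
    print(table.query(lambda r: r["level"] == "error"))
    print(table.apply_retention(now=t0 + timedelta(days=45)))
```

Leaving out updates and single-row deletes is what keeps the model simple: old data only ever disappears in bulk, once it ages past the retention period.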
As a Big Data service, Kusto handles structured, semi-structured (e.g. JSON-like nested types), and unstructured data.
Azure Data Explorer was first announced at Ignite 2018.
Kusto is used as the data platform for a number of Microsoft services, some of which expose its query language to users. Here are two videos showing its capabilities when used inside Application Insights / Azure Monitor: