Azure Data Explorer Blog

5 MIN READ

Performance Optimizations for Applications using Azure Data Explorer

Microsoft

Apr 05, 2022

In many cases Azure Data Explorer’s inbuilt query capabilities can reduce the need to provision traditional memory caching or other data sources which can increase the cost and complexities involved to synchronize and maintain fresh data. Azure Data Explorer can address specific caching requirements, common application querying scenarios and enable fast query response times where applications require fast access to real-time data.

Now let's delve into some of the key concepts:

Data Ingestion:

Azure Data Explorer provides the capability to easily fetch telemetry data from a variety of data sources and run complex analytical queries. Azure Data Explorer supports both batch and streaming ingestion to support near real-time latency requirements. Batch ingestion will have latencies based on the batching policy and query frequency from applications. Streaming ingestion can be leveraged for low latency requirements. Data is cached and indexed for faster query performance and optionally exported out to Azure Data Lake in parquet format for batch processing and integration with other Big Data and Machine Learning (ML) services.

Reference document: Azure Data Explorer data ingestion overview | Microsoft Docs

Data Transformation:

Data transformation in Azure Data Explorer (ADX) can be performed using Update Policy. Update Policy is an extremely useful feature where data transformations can be applied in one place for data ingested from multiple sources. This is also a good design approach as it avoids building transformation logics in multiple ETL (extract, transform, load) jobs or streaming pipelines which may lead to maintenance of logic in many places and data issues - Kusto update policy - Azure Data Explorer | Microsoft Docs

Key considerations:

Hot Cache:

Azure Data Explorer provides best query performance over the data stored in Hot Cache. Based on the requirements, we can set Cache Policy duration to store the data in the processing nodes (RAM or SSD's). Cache Policy can be set at the cluster, database, materialized view or table level - Cache policy (hot and cold cache) - Azure Data Explorer | Microsoft Docs

Retention Policies:

Azure Data Explorer performance can further be enhanced by optimising retention policies. Retention policies deletes old records beyond the retention period. Retention policy can be set at database, materialized view or table level. Consider low retention policies for staging tables - Kusto retention policy controls how data is removed - Azure Data Explorer | Microsoft Docs

Data Enrichment from SQL database:

Generally, application configuration and other static data is maintained in a relational database like Azure SQL which can be used to enrich device telemetry data. Azure Data Explorer provides SQL Plugin capability from which data can be retrieved from a SQL database and joined with the Azure Data Explorer tables - sql_request plugin - Azure Data Explorer | Microsoft Docs

Improve Query Response time for Applications:

There are various techniques to increase the overall throughput and decrease query latency in Azure Data Explorer. Incorporating the techniques described below significantly simplifies the design and maintenance of the application backend data.

Query results cache:

When the same query is run multiple times, this option helps to return the query results faster from the cache. We can set "query_results_cache_max_age" option to specify the time for which cache should be retained. The other option is by setting “set query_results_cache_per_shard” which creates cache on each shard that holds the data.

This feature is useful in cases where the same query is executed multiple times. With this, the query will use the cache and not scan the data each time which can help save compute. Query Results cache can be set using a simple command to set cache maximum age as minutes:

After this any identical queries made to the same database will get the data returned from the results cache. Beyond the timespan the cache is obsolete. Hence, this option is good when you intend to run same query for a limited time - Query results cache - Azure Data Explorer | Microsoft Docs

Materialize Data

materialize ():

This function caches the subquery result during the time of query execution and other subqueries can reference the partial result. The core data is precomputed with expensive joins and aggregations and held as cache - materialize() - Azure Data Explorer | Microsoft Docs

Below are some basic queries to showcase the functionality which can be leveraged for data at scale.

Sample Records in Events table:

Query with multiple tabular expression statements:

Subquery to fetch Data for state New South Wales:

Subquery to fetch data for state Victoria:

Materialized_view():

This is one of the commonly asked for features for analytics use cases where the data needs to be pre-computed and stored for complex queries. In Azure Data Explorer we can create materialized views which are great in terms of performance for applications looking to query aggregated data over a large dataset. This avoids the need to access the base table for recomputation. The application can directly reference the materialized view to get up to date results. It is not required to trigger or refresh the materialized view as the computation happens automatically as the data arrives.

Creating materialized view:

The materialized view can then be accessed using materialized view name. Reference: materialized_view() (scope function) - Azure Data Explorer | Microsoft Docs

Materialized view also solves the issue of duplication which may occur due to malfunctioning devices resending data - Handle duplicate data in Azure Data Explorer | Microsoft Docs

Note: We can further optimise the materialize () query by having low data retention policies for the base table (1 day or lower).

Stored Query Results:

Stored query results is a mechanism that temporarily stores the result of a query which can be referenced in later queries. This feature is useful when applications require pagination over the stored query results and drilldown on the data from the initial query. Stored query results can be created, dropped and accessed for up to 24 hours after creation - Stored query results - Azure Data Explorer | Microsoft Docs

Query to store "stored query" results:

Retrieving stored query results from another query:

Database Cursors:

This feature can be leveraged by applications where there is a data query requirement to have a consistent view of data. The initial query can hold the latest cursor value and then supplied to all the accessing applications. This way the application owner can decide when to provide the latest data and ensure consistency across all applications accessing data.

This feature is extremely useful for instances where the dashboards have a requirement to display the consistent information to all consuming users where some staleness is acceptable - Database cursors - Azure Data Explorer | Microsoft Docs

Let's see how this feature works with some basic queries:

Fetch current cursor value using a query and note the cursor value:

Cursor value: 637825901710610902

Insert some records to Events table:

The query can leverage "cursor_before_or_at" parameter to fetch the records before or at previously noted cursor values:

The query can leverage "cursor_after" parameter to fetch only the latest records after previous previously noted cursor value:

Based on the use case and end user caching requirements of applications, we can implement one or combination of these options.

Consuming from Azure Data Explorer:

Highly Concurrent Applications:

For Applications with high concurrency needs, it is important to incorporate some of the design principles to ensure that user queries are not impacted when queries are submitted simultaneously to Azure data explorer. The document discusses on several optimization approaches that can be leveraged in Azure data explorer to design the backend architecture - Optimize for high concurrency with Azure Data Explorer | Microsoft Docs

Optimize Queries:

Writing efficient queries is key to ensure that queries run in optimal fashion and return faster response - Query best practices - Azure Data Explorer | Microsoft Docs

Consume Data:

Azure Data Explorer API's provides wide variety of options where end users can query data based on requirements to be displayed in the application - Azure Data Explorer API Overview - Azure Data Explorer | Microsoft Docs

Updated Apr 05, 2022

Version 1.0

ashokanms

Microsoft

Joined March 18, 2022

View Profile

Azure Data Explorer Blog

Follow this blog board to get notified when there's new activity

Blog Post

Performance Optimizations for Applications using Azure Data Explorer