Monitor batching ingestion with ADX Insights

Microsoft

Aug 25, 2021

Azure Data Explorer Insights (ADX Insights) provides a unified view of your clusters' usage, performance, and health. Now, you can use the new "Ingestion" tab to monitor the status of batching ingestion operations to ADX.

In this post, you will learn how to use ADX Insights to monitor batching ingestion.

Here are some questions you can get answers to with ADX Insights:

What is the result of my ingestion attempts? How many ingestions have succeeded or failed? (at database or table granularity)
Are there any tables that may be missing data due to ingestion errors? What exactly are the error details?
What was the amount of data processed by the ingestion pipeline?
What is the latency of the ingestion process? Did the latency build up in ADX's pipeline or upstream of ADX?
How can I better understand how batches are generated during ingestion?
For ingestion using Event Hub, Event Grid, or IoT Hub, how can I compare the number of events arriving at ADX with the number of events sent for ingestion?

A bird's-eye view of the ingestion results

In the Azure portal, go to the ADX cluster page > "Insights" blade > "Ingestion" tab.

At the top of the screen, a "traffic light" shows the number of failed and successful ingestion operations. The other indicators show the overall ingestion latency and the ingestion utilization.

The number of failed and successful ingestions is the number of blobs that were ingested or that failed to be ingested. Ingestion is performed per blob: for Event Hub and IoT Hub ingestion, multiple events are aggregated into a single blob, which is then processed as one source blob for ingestion.
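To make the counting unit concrete, here is a minimal Python sketch. It assumes a hypothetical list of source blobs, each carrying the number of hub events aggregated into it and its ingestion outcome; the `SourceBlob` type and its fields are illustrative and not part of any ADX API. The point it shows is that the traffic light counts blobs, not events.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SourceBlob:
    # Hypothetical record: one source blob that may aggregate many hub events.
    event_count: int
    succeeded: bool


def count_ingestions(blobs: list[SourceBlob]) -> dict[str, int]:
    """Count ingestion results per blob, the unit the traffic light reports."""
    succeeded = sum(1 for b in blobs if b.succeeded)
    return {
        "succeeded": succeeded,
        "failed": len(blobs) - succeeded,
        # Total events, included only to contrast with the blob-based counts.
        "events": sum(b.event_count for b in blobs),
    }


blobs = [SourceBlob(500, True), SourceBlob(1200, True), SourceBlob(300, False)]
print(count_ingestions(blobs))  # {'succeeded': 2, 'failed': 1, 'events': 2000}
```

Here 2,000 events show up as only three ingestions: two successful and one failed.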

Succeeded ingestions - "per-table" monitoring

Failed ingestions - "per-table" monitoring

Click on the "Successful" or "Failures" tabs to drill down and see more details per database and table, including:

The number of successful ingestions per table, including the ingestion success rate.
The number of failed ingestions for each table, along with the status (permanent or transient), error code, and sample error text.
You can use the icon to dig deeper into the log and view more details, for example, a list of other error texts associated with a certain error code.
A time chart showing successful and failed ingestions over time.
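The per-table figures in these tabs can also be reproduced offline as a sanity check. The sketch below assumes hypothetical, simplified records with `table`, `succeeded`, `status`, and `error_code` fields (this is not the real diagnostic log schema, and the error codes are placeholders). It computes each table's success rate and groups its failures by status and error code.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class IngestionResult:
    # Hypothetical, simplified record; not the real diagnostic log schema.
    table: str
    succeeded: bool
    status: str = ""      # "Permanent" or "Transient" for failures
    error_code: str = ""  # set only for failures


def per_table_summary(results: list[IngestionResult]) -> dict[str, dict]:
    """Success rate per table plus failure counts by (status, error code)."""
    totals = Counter()
    successes = Counter()
    failures = defaultdict(Counter)
    for r in results:
        totals[r.table] += 1
        if r.succeeded:
            successes[r.table] += 1
        else:
            failures[r.table][(r.status, r.error_code)] += 1
    return {
        table: {
            "success_rate": successes[table] / totals[table],
            "failures": dict(failures[table]),
        }
        for table in totals
    }


results = [
    IngestionResult("Logs", True),
    IngestionResult("Logs", True),
    IngestionResult("Logs", False, "Permanent", "ExampleError_A"),
    IngestionResult("Metrics", False, "Transient", "ExampleError_B"),
]
summary = per_table_summary(results)
print(summary)
```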

Table-level monitoring is based on diagnostic logs. To see table-level details, make sure to enable the ingestion diagnostic logs that match your monitoring needs and send them to Log Analytics.

"SucceededIngestion" log: These logs have information about successfully completed ingestion operations.
"FailedIngestion" log: These logs have detailed information about failed ingestion operations including error details.
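One way to enable these two log categories programmatically is an Azure Monitor diagnostic setting on the cluster that routes them to a Log Analytics workspace. The sketch below only builds the JSON request body. It assumes the Azure Resource Manager diagnostic settings shape (`properties.workspaceId` plus a `logs` array of `category`/`enabled` entries), and the workspace ID is a placeholder. Sending the body (for example, as an authenticated PUT to the cluster's `providers/Microsoft.Insights/diagnosticSettings/{name}` endpoint) is not shown; check the Azure Monitor REST reference for the current `api-version`.

```python
import json

# Log category names as given in this post.
INGESTION_LOG_CATEGORIES = ("SucceededIngestion", "FailedIngestion")


def build_diagnostic_setting(workspace_id: str, categories=INGESTION_LOG_CATEGORIES) -> dict:
    """Build a diagnostic-setting request body that sends the chosen log
    categories to a Log Analytics workspace."""
    return {
        "properties": {
            "workspaceId": workspace_id,
            "logs": [{"category": c, "enabled": True} for c in categories],
        }
    }


# Placeholder workspace resource ID; replace it with your own.
workspace = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.OperationalInsights/workspaces/<name>"
)
body = build_diagnostic_setting(workspace)
print(json.dumps(body, indent=2))
```

If you only need failure details, passing `("FailedIngestion",)` as `categories` enables just that log.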

On the other two tabs, you will find information about:

"Total latency" (cumulative) - the time from the point at which ADX accepts the data until it is available for query.
"Ingestion utilization" - the percentage of the resources allocated to ingestion in the capacity policy that are actually used to ingest data.
Total latency by database
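As a worked example of the utilization definition, the sketch below computes utilization as used ingestion capacity divided by the ingestion capacity allocated in the capacity policy, expressed as a percentage. The inputs are hypothetical numbers in arbitrary but consistent units; this post does not describe how ADX measures these resources internally.

```python
def ingestion_utilization(used: float, allocated: float) -> float:
    """Percentage of the ingestion capacity allocated by the capacity policy
    that is actually in use."""
    if allocated <= 0:
        raise ValueError("allocated capacity must be positive")
    return 100.0 * used / allocated


# Hypothetical: 3 units in use out of 12 allocated to ingestion.
print(ingestion_utilization(3, 12))  # 25.0
```

A value that stays close to 100% may suggest that the allocated ingestion capacity is the bottleneck.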

Visibility into the ingestion process - understand the batching stages

In the batching ingestion process, Azure Data Explorer optimizes data ingestion for high throughput by grouping small incoming chunks of data into batches according to a configurable ingestion batching policy. The batching policy lets you set the trigger conditions for sealing a batch for ingestion: data size, number of blobs, or time elapsed. Additional sealing conditions, which can't be configured in the batching policy, are described here: batching types. The sealed batches are then ingested in a form optimized for fast query results.
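The rule that a batch is sealed by whichever condition is met first can be modeled as a small batcher. This is a simplified sketch, not ADX's implementation. The `BatchingPolicy` fields mirror the three configurable triggers described above, but the field names, the example limits, and the "Time"/"Size"/"Count" labels are illustrative assumptions. If size and count limits are hit by the same blob, this sketch reports size.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BatchingPolicy:
    # Hypothetical field names for the three configurable seal triggers.
    max_seconds: float  # time passed since the batch was opened
    max_items: int      # number of blobs in the batch
    max_raw_mb: float   # uncompressed data size


@dataclass
class Batch:
    opened_at: float
    sizes_mb: list[float] = field(default_factory=list)


class Batcher:
    """Simplified model: a batch is sealed as soon as the first trigger fires."""

    def __init__(self, policy: BatchingPolicy) -> None:
        self.policy = policy
        self.current: Batch | None = None
        self.sealed: list[tuple[str, Batch]] = []  # (seal reason, batch)

    def add(self, now: float, size_mb: float) -> None:
        """Add one blob reference, then seal on size or count if a limit is hit."""
        self.tick(now)  # a time-based seal may already be due
        if self.current is None:
            self.current = Batch(opened_at=now)
        self.current.sizes_mb.append(size_mb)
        if sum(self.current.sizes_mb) >= self.policy.max_raw_mb:
            self._seal("Size")
        elif len(self.current.sizes_mb) >= self.policy.max_items:
            self._seal("Count")

    def tick(self, now: float) -> None:
        """Seal the open batch if its time limit has elapsed."""
        if self.current is not None and now - self.current.opened_at >= self.policy.max_seconds:
            self._seal("Time")

    def _seal(self, reason: str) -> None:
        self.sealed.append((reason, self.current))
        self.current = None


batcher = Batcher(BatchingPolicy(max_seconds=300, max_items=3, max_raw_mb=1000))
for t, size in [(0, 100), (1, 100), (2, 100), (10, 1200), (20, 50)]:
    batcher.add(t, size)
batcher.tick(400)
print([reason for reason, _ in batcher.sealed])  # ['Count', 'Size', 'Time']
```

The first batch fills up on blob count, the single large blob seals on size, and the last small blob waits until the time limit expires.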

Batching ingestion stages

Batching ingestion has four stages, each handled by a specific component:

Data Connection - For Event Grid, Event Hub and IoT Hub ingestion, there is a Data Connection that gets the data from external sources and performs initial data rearrangement.
The Batching Manager batches the received references to data chunks to optimize ingestion throughput based on a batching policy.
The Ingestion Manager sends the ingestion command to the ADX Storage Engine.
The ADX Storage Engine stores the ingested data, making it available for query.

Example:

You can monitor your data connections (per event hub or IoT hub) and track the "received data size" by each data connection.
You can also monitor the "discovery latency": the time from when data is enqueued until it is discovered by ADX. This time is spent upstream of Azure Data Explorer. Discovery latency is available only for data connections (Event Hub, IoT Hub, or Event Grid ingestion), where it measures the time until the data connection discovers the data.

When data takes a long time to become available for query, analyzing the discovery latency together with the latencies of the later stages can help you determine whether the delay occurs inside ADX or upstream of it.

The latency builds up across these components, stage by stage.
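To illustrate the upstream-versus-ADX question, the sketch below splits an end-to-end delay into discovery latency (upstream of ADX) and the latency accumulated inside ADX until the data is queryable. The per-chunk timestamps are hypothetical. The portal reports these latencies as metrics rather than as per-chunk timestamps, and this two-way split is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Timeline:
    # Hypothetical timestamps, in seconds, for one chunk of data.
    enqueued: float    # written to the event hub, IoT hub, or storage
    discovered: float  # discovered by the ADX data connection
    queryable: float   # available for query


def latency_breakdown(t: Timeline) -> dict:
    discovery = t.discovered - t.enqueued  # upstream of ADX
    in_adx = t.queryable - t.discovered    # data connection onward: batching, ingestion, storage
    total = discovery + in_adx
    return {
        "discovery_s": discovery,
        "in_adx_s": in_adx,
        "total_s": total,
        "upstream_share": discovery / total if total else 0.0,
    }


print(latency_breakdown(Timeline(enqueued=0, discovered=240, queryable=330)))
```

In this made-up case, about 73% of the delay occurs before ADX discovers the data, so the place to investigate would be upstream rather than the batching policy.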

When using Event Hub, IoT Hub, or Event Grid ingestion, it can be useful to compare the number of events arriving at the ADX data connection with the number of events sent on to the next steps of the ingestion process (in other words, events that the data connection stage processed successfully). The Events Received, Events Processed, and Events Dropped tiles let you make this comparison.

Data connection monitoring
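The Events Received, Events Processed, and Events Dropped tiles can be read together by computing the share of received events that were processed or dropped, plus any remainder. The counts below are hypothetical; in practice, take all three values for the same time range. A nonzero remainder may reflect events still in flight or window boundaries. That interpretation is an assumption, not something the tiles report directly.

```python
def compare_event_counts(received: int, processed: int, dropped: int) -> dict:
    """Summarize data-connection event counts for one time window."""
    if received == 0:
        return {"processed_pct": 0.0, "dropped_pct": 0.0, "unaccounted": 0}
    return {
        "processed_pct": 100.0 * processed / received,
        "dropped_pct": 100.0 * dropped / received,
        # Received but neither processed nor dropped within this window.
        "unaccounted": received - processed - dropped,
    }


print(compare_event_counts(received=10_000, processed=9_950, dropped=50))
# {'processed_pct': 99.5, 'dropped_pct': 0.5, 'unaccounted': 0}
```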

The second component of the batching ingestion process is the Batching Manager, which optimizes ingestion throughput by batching data based on the ingestion batching policy.

This step allows you to monitor aspects such as:

Batch seal reason - the reasons (triggers) that caused batches to be sealed. A batch is sealed for ingestion as soon as the first condition is met. The full list of possible reasons can be found here.
Batching duration - the duration of a batch from the moment it is opened until it is sealed.
Batch size - the expected uncompressed size of the data in a batch for ingestion.

Batching monitoring
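To show what these three measures capture, the sketch below summarizes a hypothetical list of sealed batches. It computes the share of each seal reason, the mean batching duration (seal time minus open time), and the mean expected uncompressed batch size. The `SealedBatch` record and the reason labels are illustrative, not an ADX schema.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class SealedBatch:
    # Hypothetical record; not an ADX schema.
    opened_at: float    # seconds
    sealed_at: float    # seconds
    raw_size_mb: float  # expected uncompressed size
    seal_reason: str    # illustrative labels such as "Time", "Size", "Count"


def summarize_batches(batches: list[SealedBatch]) -> dict:
    if not batches:
        raise ValueError("need at least one batch")
    reasons = Counter(b.seal_reason for b in batches)
    return {
        "seal_reason_share": {r: n / len(batches) for r, n in reasons.items()},
        "mean_duration_s": mean(b.sealed_at - b.opened_at for b in batches),
        "mean_size_mb": mean(b.raw_size_mb for b in batches),
    }


batches = [
    SealedBatch(0, 300, 40.0, "Time"),
    SealedBatch(100, 160, 1024.0, "Size"),
    SealedBatch(200, 500, 25.0, "Time"),
    SealedBatch(300, 330, 90.0, "Count"),
]
print(summarize_batches(batches))
```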

You can also view "per-table" data: the batching duration, the batch size, and how the batches were sealed for each table (as determined by the ingestion batching policy).

Batching monitoring - per DB or per table
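The per-table view answers a narrower question: which trigger usually seals each table's batches. The sketch below uses hypothetical `(table, seal_reason)` pairs rather than a real log schema. It finds the dominant seal reason for each table and that reason's share of the table's batches. A table whose batches mostly seal on time may be receiving data slowly relative to its size and count thresholds.

```python
from collections import Counter, defaultdict


def dominant_seal_reason(pairs):
    """Map each table to (most common seal reason, share of its batches)."""
    by_table = defaultdict(Counter)
    for table, reason in pairs:
        by_table[table][reason] += 1
    result = {}
    for table, reasons in by_table.items():
        reason, count = reasons.most_common(1)[0]
        result[table] = (reason, count / sum(reasons.values()))
    return result


# Hypothetical per-batch observations; the reason labels are illustrative.
pairs = [("Logs", "Size"), ("Logs", "Size"), ("Logs", "Time"),
         ("Audit", "Time"), ("Audit", "Time")]
print(dominant_seal_reason(pairs))
```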

The third and fourth components are the Ingestion Manager and the Storage Engine, respectively. For the Storage Engine, you can see the cumulative latency per database: the time from the moment ADX accepts the data until the Storage Engine receives it and it is available for query.

The "Amount of data processed" tile shows the number of blobs received, blobs processed (that is, processed successfully), and blobs dropped. Blobs that have been processed by the Storage Engine are ready for query.

Storage Engine monitoring

No need to learn by heart

All the definitions described here can also be found within the product experience!
The definitions of the batching steps are hidden by default, but can be shown by using the "Show help" toggle.

In-product help and definitions

Feel free to comment on this blog post. You can also use the feedback button at the top of the "Insights" page.

More information about the metrics of the batching ingestion can be found here.
More information about other tabs of ADX Insights can be found here.