Azure Data Explorer Blog

8 MIN READ

Labeling Kusto Data in Azure Managed Grafana for Machine Learning Workflows

Hauke

Microsoft

Jul 14, 2025

Labeling Kusto Data in Azure Managed Grafana for Machine Learning Workflows

In today's data-driven world, the quality of machine learning models heavily depends on the quality of labeled training data. Whether you're detecting anomalies in manufacturing processes, identifying potential health risks, or analyzing weather patterns, having properly labeled datasets is crucial for building reliable AI systems.

This blog post demonstrates how to create an interactive labeling system for Kusto (Azure Data Explorer) data using Azure Managed Grafana, enabling subject matter experts to efficiently label large datasets for machine learning applications.

Why Data Labeling Matters

Labeled data serves as the foundation for supervised machine learning models. Consider these real-world scenarios:

Healthcare: Medical professionals labeling diagnostic test results to train models that can detect false positives in virus tests or identify anomalies in medical imaging
Manufacturing: Quality engineers marking defective products in chip production data to build automated quality control systems
Finance: Fraud analysts labeling suspicious transactions to train fraud detection models
Weather Monitoring: Meteorologists categorizing storm events to improve weather prediction models
Automotive: Engineers labeling sensor data from vehicle testing to identify component failures or performance issues
Retail: Analysts categorizing customer behavior patterns to improve recommendation engines

Without accurate labels, even the most sophisticated algorithms will produce unreliable results.

Architecture Overview

In this solution, large time series datasets stored in an Azure Data Explorer (Kusto) database are enriched with labels. The labeling metadata is maintained in an Azure SQL Database and updated through the Volkov Labs Business Table plugin in Grafana. To provide a unified view, we use external tables in Kusto to seamlessly combine the raw time series data with the labeling information stored in SQL.

In summary, the architecture integrates three key Azure services:

Azure Data Explorer (Kusto): Stores the large-scale time series data that needs to be labeled.
Azure SQL Database: Holds the labeling metadata, which can be updated interactively.
Azure Managed Grafana: Serves as the user interface for labeling, enabling users to view, assign, and update labels directly from the dashboard.

High-Level-Architecture

Prerequisites

Before starting, ensure you have:

An Azure SQL Database
A Kusto database (Running on Azure or Microsoft Fabric)
An Azure Managed Grafana workspace
Appropriate permissions on all services

Demo Dataset: StormEvents

For this demonstration, we'll use the StormEvents table from the Azure Data Explorer help cluster. This publicly available dataset contains detailed information about storm events in the United States from the National Weather Service, making it perfect for demonstrating labeling workflows.

You can explore this dataset by connecting to help.kusto.windows.net and querying:

StormEvents 
| take 10

The dataset includes fields like EventType, State, EventNarrative, and DamageProperty, providing rich context for labeling exercises.

Setting Up the SQL Database Schema

The labeling system uses a well-structured SQL schema that separates label types from actual event labels. Below are the key components (you can find the full SQL code in the linked gist):

1. Label Type Table

The LabelType table defines the available label categories with ordering and validation:

CREATE TABLE LabelType (
    LabelId INT IDENTITY(1,1) PRIMARY KEY,
    LabelShortDesc NVARCHAR(50) NOT NULL UNIQUE,
    LabelLongDesc NVARCHAR(500) NOT NULL,
    LabelOrder INT NOT NULL DEFAULT 999,
    LabelCreatedTimestamp DATETIME2(3) NOT NULL DEFAULT GETUTCDATE(),
    LabelUpdateTimestamp DATETIME2(3) NOT NULL DEFAULT GETUTCDATE()
);

2. Event Labeling Table

The EventLabel table stores actual labels applied to events:

CREATE TABLE EventLabel (
    EventLabelId BIGINT IDENTITY(1,1) PRIMARY KEY,
    EventId INT NOT NULL,
    LabelId INT NOT NULL DEFAULT 1,
    EventLabelDesc NVARCHAR(1000) NULL,
    EventLabelUserName NVARCHAR(256) NOT NULL DEFAULT SUSER_SNAME(),
    EventLabelCreatedTimestamp DATETIME2(3) NOT NULL DEFAULT GETUTCDATE(),
    EventLabelUpdateTimestamp DATETIME2(3) NOT NULL DEFAULT GETUTCDATE(),
    CONSTRAINT UQ_EventLabel_EventId UNIQUE (EventId)
);

3. Upsert Stored Procedure

The most critical component is the sp_AddEventLabel stored procedure that handles label insertion and updates safely. It performs an upsert operation using the SQL MERGE statement, ensuring data consistency without duplicate entries. This approach significantly simplifies the dashboard implementation by abstracting the complexity of label management.

CREATE PROCEDURE sp_AddEventLabel
    @EventId BIGINT,
    @LabelShortDesc NVARCHAR(50),
    @EventLabelDesc NVARCHAR(1000) = NULL,
     NVARCHAR(256) = NULL
AS
BEGIN
    -- Get LabelId from short description with validation
    DECLARE @LabelId INT;
    SELECT @LabelId = LabelId FROM LabelType WHERE LabelShortDesc = @LabelShortDesc;
    
    IF @LabelId IS NULL
    BEGIN
        THROW 50001, 'Invalid label short description provided', 1;
        RETURN;
    END
    
    -- Use MERGE for true upsert functionality
    MERGE EventLabel AS target
    USING (SELECT @EventId AS EventId, @LabelId AS LabelId, 
                  @EventLabelDesc AS EventLabelDesc, 
                  ISNULL(@UserName, SUSER_SNAME()) AS EventLabelUserName) AS source
    ON target.EventId = source.EventId
    WHEN MATCHED THEN
        UPDATE SET LabelId = source.LabelId, 
                   EventLabelDesc = source.EventLabelDesc,
                   EventLabelUserName = source.EventLabelUserName,
                   EventLabelUpdateTimestamp = GETUTCDATE()
    WHEN NOT MATCHED THEN
        INSERT (EventId, LabelId, EventLabelDesc, EventLabelUserName)
        VALUES (source.EventId, source.LabelId, source.EventLabelDesc, source.EventLabelUserName);
END;

This procedure ensures data integrity and provides automatic timestamping for audit purposes.

4. Consolidated View

The vw_EventLabeling view brings together both tables and introduces calculated fields to streamline querying and analysis. This unified view simplifies access to labeling data, making it easier to build reports and dashboards.

Configuring Authentication and Permissions

Grafana Managed Identity Setup

Azure Managed Grafana uses its workspace name as the name for the name of the managed identity. We use the Grafana Workspace managed identity, accessing the Azure SQL database. Configure the SQL database users and permissions accordingly:

CREATE USER [your-grafana-workspace-name] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [your-grafana-workspace-name];
ALTER ROLE db_datawriter ADD MEMBER [your-grafana-workspace-name];
GRANT EXECUTE ON SCHEMA::dbo TO [your-grafana-workspace-name];

User Group Access

To enable seamless integration, make sure your user group—already configured for read access in the Kusto database—is also granted read permissions on the SQL tables via external tables. This ensures consistent access across both data sources and simplifies cross-platform querying.:

CREATE USER [your-entra-id-user-group] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [your-entra-id-user-group];

Creating External Tables in Kusto

Since Grafana can only connect to one data source per panel, we need external tables in Kusto to access the SQL Database data. This enables us to:

Make SQL data available in the Kusto database
Work around Grafana's single data source limitation per panel

// Create external table for the event labeling view
.create external table EventLabeling (
    // EventLabel fields
    EventLabelId: long,
    EventId: long,
    EventLabelDesc: string,
    // ... other fields
    LabelStatus: string
)
kind=sql
table=vw_EventLabeling
(
    h@'Server=<your-sql-server>.database.windows.net;Database=<your-database>;Authentication=Active Directory Integrated;'
)

The complete source code definition is available in the linked gist. We use Entra ID integrated authentication to securely connect to the SQL database. To ensure proper access, verify that the user group with read permissions in Kusto also has read access to the SQL database. This alignment is essential for seamless querying through external tables.

Building the Grafana Dashboard with Volkov Labs Business Table

The magic happens in Grafana using the Business Table plugin from Volkov Labs. This plugin provides an editable table interface perfect for data labeling workflows.

The final dashboard

Key Configuration Elements

1. Workspace Data Sources Configuration

Ensure that both the Kusto and SQL database data sources are properly configured in your Grafana workspace. For SQL authentication, the Grafana workspace uses its managed identity, allowing secure and seamless access to the external SQL tables without the need for storing credentials. For Azure Data Explorer we are using the current user authentication method here, passing through the current user in Azure Managed Grafana to the Kusto database (see also my previous blog post here).

2. Panel Configuration

We configure two datasources for the business table

1. events: Querying the StormEvents and joining the labeling data from the SQL database:

let selectedEvents=toscalar(
cluster("help.kusto.windows.net").database("Samples").StormEvents
| where $__timeFilter(StartTime) and EventType in ($v_EventType) and State in ($v_State)
|summarize make_list (EventId));
let EventLabel=external_table('EventLabeling')
| where EventId in (selectedEvents);
cluster("help.kusto.windows.net").database("Samples").StormEvents
| where $__timeFilter(StartTime) and EventType in ($v_EventType) and State in ($v_State)
| lookup EventLabel on EventId
| order by StartTime, EndTime, State asc , EventType asc

In our example, the join to the external SQL table is performed using the EventId column. However, in real-world scenarios, events often lack a single identity column. Instead, a composite natural key—such as StartTime, EndTime, and EventType—may be used to uniquely identify records. To ensure efficient query performance, it's important to verify that filters are pushed down to the SQL database. You can confirm this by reviewing the audit logs. In our case, we explicitly filter on EventIds within the query to achieve pushdown. Note that performing a join (used here as a lookup) does not automatically push the EventIds as filters to the SQL database, which can lead to less efficient execution.

2. LabelType We are using this to populate the drop down list for the label types.

Queries used in the business table

3. Editable Columns Configuration

The Business Table plugin allows specific columns to be editable with different editor types.

With the editor type select, we are confguring the drop down list, based on the LableType-Query:

Example configuration with Editor Type Select

4.Update Query Configuration

Users can add or update labels through the interface. For this we have to configure the Update Request:

Update Request configuration

The upsert operation is executed via the sp_AddEventLabel stored procedure, which receives input from the business table using predefined variables. This is the SQL code:

EXEC sp_AddEventLabel @EventId = '${payload.EventId:int}', @LabelShortDesc = '${payload.LabelShortDesc}', 
     @EventLabelDesc = '${payload.EventLabelDesc:sqlsting}', ='${__user.login}';

Interactive Labeling Workflow

With everything configured, users can:

Browse Events: View unlabeled events from the Kusto dataset with context information
Apply Labels: Use dropdown menus to categorize events (e.g., "False Positive", "Investigation Needed")
Add Context: Include detailed descriptions for complex cases
Track Changes: Automatic timestamping and user tracking for full audit trail
Monitor Progress: View labeling statistics and completion rates
Collaborate: Multiple users can label different events simultaneously

Best Practices

Start Simple: Begin with basic label categories and expand as needed
User Training: Ensure labelers understand the categories and criteria
Quality Control: Implement review processes for critical labels
Performance: Index your tables appropriately for large datasets (as shown in our schema)
Backup: Regular backups of your labeling data are essential
Validation: Use database constraints to ensure data quality
Audit Trail: Maintain full history of who labeled what and when

Conclusion

This integrated approach combining Azure SQL Database, Kusto, and Grafana provides a powerful, scalable solution for data labeling workflows. The interactive interface empowers domain experts to efficiently label large datasets, while the robust backend ensures data integrity and traceability.

Key benefits include:

Unified Interface: Single pane of glass for data exploration and labeling
Real-time Collaboration: Multiple users can work simultaneously
Full Audit Trail: Complete history of labeling activities
Flexible Schema: Easy to adapt for different labeling scenarios
Enterprise Ready: Built on Azure services with proper authentication

Whether you're building the next generation of medical diagnostic tools, optimizing manufacturing processes, or improving weather prediction models, this labeling system provides the infrastructure needed to turn raw data into actionable insights.

The result is high-quality labeled data that forms a solid foundation for building reliable machine learning models—ultimately driving better business outcomes across a wide range of industries.