Labeling Kusto Data in Azure Managed Grafana for Machine Learning Workflows
In today's data-driven world, the quality of machine learning models heavily depends on the quality of labeled training data. Whether you're detecting anomalies in manufacturing processes, identifying potential health risks, or analyzing weather patterns, having properly labeled datasets is crucial for building reliable AI systems.
This blog post demonstrates how to create an interactive labeling system for Kusto (Azure Data Explorer) data using Azure Managed Grafana, enabling subject matter experts to efficiently label large datasets for machine learning applications.
Why Data Labeling Matters
Labeled data serves as the foundation for supervised machine learning models. Consider these real-world scenarios:
- Healthcare: Medical professionals labeling diagnostic test results to train models that can detect false positives in virus tests or identify anomalies in medical imaging
- Manufacturing: Quality engineers marking defective products in chip production data to build automated quality control systems
- Finance: Fraud analysts labeling suspicious transactions to train fraud detection models
- Weather Monitoring: Meteorologists categorizing storm events to improve weather prediction models
- Automotive: Engineers labeling sensor data from vehicle testing to identify component failures or performance issues
- Retail: Analysts categorizing customer behavior patterns to improve recommendation engines
Without accurate labels, even the most sophisticated algorithms will produce unreliable results.
Architecture Overview
In this solution, large time series datasets stored in an Azure Data Explorer (Kusto) database are enriched with labels. The labeling metadata is maintained in an Azure SQL Database and updated through the Volkov Labs Business Table plugin in Grafana. To provide a unified view, we use external tables in Kusto to seamlessly combine the raw time series data with the labeling information stored in SQL.
In summary, the architecture integrates three key Azure services:
- Azure Data Explorer (Kusto): Stores the large-scale time series data that needs to be labeled.
- Azure SQL Database: Holds the labeling metadata, which can be updated interactively.
- Azure Managed Grafana: Serves as the user interface for labeling, enabling users to view, assign, and update labels directly from the dashboard.
Prerequisites
Before starting, ensure you have:
- An Azure SQL Database
- A Kusto database (Running on Azure or Microsoft Fabric)
- An Azure Managed Grafana workspace
- Appropriate permissions on all services
Demo Dataset: StormEvents
For this demonstration, we'll use the StormEvents table from the Azure Data Explorer help cluster. This publicly available dataset contains detailed information about storm events in the United States from the National Weather Service, making it perfect for demonstrating labeling workflows.
You can explore this dataset by connecting to help.kusto.windows.net and querying:
StormEvents
| take 10
The dataset includes fields like EventType, State, EventNarrative, and DamageProperty, providing rich context for labeling exercises.
Setting Up the SQL Database Schema
The labeling system uses a well-structured SQL schema that separates label types from actual event labels. Below are the key components (you can find the full SQL code in the linked gist):
1. Label Type Table
The LabelType table defines the available label categories with ordering and validation:
CREATE TABLE LabelType (
LabelId INT IDENTITY(1,1) PRIMARY KEY,
LabelShortDesc NVARCHAR(50) NOT NULL UNIQUE,
LabelLongDesc NVARCHAR(500) NOT NULL,
LabelOrder INT NOT NULL DEFAULT 999,
LabelCreatedTimestamp DATETIME2(3) NOT NULL DEFAULT GETUTCDATE(),
LabelUpdateTimestamp DATETIME2(3) NOT NULL DEFAULT GETUTCDATE()
);
2. Event Labeling Table
The EventLabel table stores actual labels applied to events:
CREATE TABLE EventLabel (
EventLabelId BIGINT IDENTITY(1,1) PRIMARY KEY,
EventId INT NOT NULL,
LabelId INT NOT NULL DEFAULT 1,
EventLabelDesc NVARCHAR(1000) NULL,
EventLabelUserName NVARCHAR(256) NOT NULL DEFAULT SUSER_SNAME(),
EventLabelCreatedTimestamp DATETIME2(3) NOT NULL DEFAULT GETUTCDATE(),
EventLabelUpdateTimestamp DATETIME2(3) NOT NULL DEFAULT GETUTCDATE(),
CONSTRAINT UQ_EventLabel_EventId UNIQUE (EventId)
);
3. Upsert Stored Procedure
The most critical component is the sp_AddEventLabel stored procedure that handles label insertion and updates safely. It performs an upsert operation using the SQL MERGE statement, ensuring data consistency without duplicate entries. This approach significantly simplifies the dashboard implementation by abstracting the complexity of label management.
CREATE PROCEDURE sp_AddEventLabel
@EventId BIGINT,
@LabelShortDesc NVARCHAR(50),
@EventLabelDesc NVARCHAR(1000) = NULL,
NVARCHAR(256) = NULL
AS
BEGIN
-- Get LabelId from short description with validation
DECLARE @LabelId INT;
SELECT @LabelId = LabelId FROM LabelType WHERE LabelShortDesc = @LabelShortDesc;
IF @LabelId IS NULL
BEGIN
THROW 50001, 'Invalid label short description provided', 1;
RETURN;
END
-- Use MERGE for true upsert functionality
MERGE EventLabel AS target
USING (SELECT @EventId AS EventId, @LabelId AS LabelId,
@EventLabelDesc AS EventLabelDesc,
ISNULL(@UserName, SUSER_SNAME()) AS EventLabelUserName) AS source
ON target.EventId = source.EventId
WHEN MATCHED THEN
UPDATE SET LabelId = source.LabelId,
EventLabelDesc = source.EventLabelDesc,
EventLabelUserName = source.EventLabelUserName,
EventLabelUpdateTimestamp = GETUTCDATE()
WHEN NOT MATCHED THEN
INSERT (EventId, LabelId, EventLabelDesc, EventLabelUserName)
VALUES (source.EventId, source.LabelId, source.EventLabelDesc, source.EventLabelUserName);
END;
This procedure ensures data integrity and provides automatic timestamping for audit purposes.
4. Consolidated View
The vw_EventLabeling view brings together both tables and introduces calculated fields to streamline querying and analysis. This unified view simplifies access to labeling data, making it easier to build reports and dashboards.
Configuring Authentication and Permissions
Grafana Managed Identity Setup
Azure Managed Grafana uses its workspace name as the name for the name of the managed identity. We use the Grafana Workspace managed identity, accessing the Azure SQL database. Configure the SQL database users and permissions accordingly:
CREATE USER [your-grafana-workspace-name] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [your-grafana-workspace-name];
ALTER ROLE db_datawriter ADD MEMBER [your-grafana-workspace-name];
GRANT EXECUTE ON SCHEMA::dbo TO [your-grafana-workspace-name];
User Group Access
To enable seamless integration, make sure your user group—already configured for read access in the Kusto database—is also granted read permissions on the SQL tables via external tables. This ensures consistent access across both data sources and simplifies cross-platform querying.:
CREATE USER [your-entra-id-user-group] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [your-entra-id-user-group];
Creating External Tables in Kusto
Since Grafana can only connect to one data source per panel, we need external tables in Kusto to access the SQL Database data. This enables us to:
- Make SQL data available in the Kusto database
- Work around Grafana's single data source limitation per panel
// Create external table for the event labeling view
.create external table EventLabeling (
// EventLabel fields
EventLabelId: long,
EventId: long,
EventLabelDesc: string,
// ... other fields
LabelStatus: string
)
kind=sql
table=vw_EventLabeling
(
h@'Server=<your-sql-server>.database.windows.net;Database=<your-database>;Authentication=Active Directory Integrated;'
)
The complete source code definition is available in the linked gist. We use Entra ID integrated authentication to securely connect to the SQL database. To ensure proper access, verify that the user group with read permissions in Kusto also has read access to the SQL database. This alignment is essential for seamless querying through external tables.
Building the Grafana Dashboard with Volkov Labs Business Table
The magic happens in Grafana using the Business Table plugin from Volkov Labs. This plugin provides an editable table interface perfect for data labeling workflows.
The final dashboard
Key Configuration Elements
1. Workspace Data Sources Configuration
Ensure that both the Kusto and SQL database data sources are properly configured in your Grafana workspace. For SQL authentication, the Grafana workspace uses its managed identity, allowing secure and seamless access to the external SQL tables without the need for storing credentials. For Azure Data Explorer we are using the current user authentication method here, passing through the current user in Azure Managed Grafana to the Kusto database (see also my previous blog post here).
2. Panel Configuration
We configure two datasources for the business table
1. events: Querying the StormEvents and joining the labeling data from the SQL database:
let selectedEvents=toscalar(
cluster("help.kusto.windows.net").database("Samples").StormEvents
| where $__timeFilter(StartTime) and EventType in ($v_EventType) and State in ($v_State)
|summarize make_list (EventId));
let EventLabel=external_table('EventLabeling')
| where EventId in (selectedEvents);
cluster("help.kusto.windows.net").database("Samples").StormEvents
| where $__timeFilter(StartTime) and EventType in ($v_EventType) and State in ($v_State)
| lookup EventLabel on EventId
| order by StartTime, EndTime, State asc , EventType asc
In our example, the join to the external SQL table is performed using the EventId column. However, in real-world scenarios, events often lack a single identity column. Instead, a composite natural key—such as StartTime, EndTime, and EventType—may be used to uniquely identify records. To ensure efficient query performance, it's important to verify that filters are pushed down to the SQL database. You can confirm this by reviewing the audit logs. In our case, we explicitly filter on EventIds within the query to achieve pushdown. Note that performing a join (used here as a lookup) does not automatically push the EventIds as filters to the SQL database, which can lead to less efficient execution.
2. LabelType We are using this to populate the drop down list for the label types.
Queries used in the business table
3. Editable Columns Configuration
The Business Table plugin allows specific columns to be editable with different editor types.
With the editor type select, we are confguring the drop down list, based on the LableType-Query:
Example configuration with Editor Type Select
4.Update Query Configuration
Users can add or update labels through the interface. For this we have to configure the Update Request:
Update Request configuration
The upsert operation is executed via the sp_AddEventLabel stored procedure, which receives input from the business table using predefined variables. This is the SQL code:
EXEC sp_AddEventLabel @EventId = '${payload.EventId:int}', @LabelShortDesc = '${payload.LabelShortDesc}',
@EventLabelDesc = '${payload.EventLabelDesc:sqlsting}', ='${__user.login}';
Interactive Labeling Workflow
With everything configured, users can:
- Browse Events: View unlabeled events from the Kusto dataset with context information
- Apply Labels: Use dropdown menus to categorize events (e.g., "False Positive", "Investigation Needed")
- Add Context: Include detailed descriptions for complex cases
- Track Changes: Automatic timestamping and user tracking for full audit trail
- Monitor Progress: View labeling statistics and completion rates
- Collaborate: Multiple users can label different events simultaneously
Best Practices
- Start Simple: Begin with basic label categories and expand as needed
- User Training: Ensure labelers understand the categories and criteria
- Quality Control: Implement review processes for critical labels
- Performance: Index your tables appropriately for large datasets (as shown in our schema)
- Backup: Regular backups of your labeling data are essential
- Validation: Use database constraints to ensure data quality
- Audit Trail: Maintain full history of who labeled what and when
Conclusion
This integrated approach combining Azure SQL Database, Kusto, and Grafana provides a powerful, scalable solution for data labeling workflows. The interactive interface empowers domain experts to efficiently label large datasets, while the robust backend ensures data integrity and traceability.
Key benefits include:
- Unified Interface: Single pane of glass for data exploration and labeling
- Real-time Collaboration: Multiple users can work simultaneously
- Full Audit Trail: Complete history of labeling activities
- Flexible Schema: Easy to adapt for different labeling scenarios
- Enterprise Ready: Built on Azure services with proper authentication
Whether you're building the next generation of medical diagnostic tools, optimizing manufacturing processes, or improving weather prediction models, this labeling system provides the infrastructure needed to turn raw data into actionable insights.
The result is high-quality labeled data that forms a solid foundation for building reliable machine learning models—ultimately driving better business outcomes across a wide range of industries.