Forum Discussion
Kusto: summarize consecutive segments of the same value per batch
Problem statement (Azure Data Explorer / Kusto)
I have a table in Azure Data Explorer (Kusto) with the following columns:
- batch_number (string)
- team (string)
- datetime (datetime)
Each row represents work done by a team on a batch at a given time.
The data is chronological per batch, but teams can change over time and may come back later.
Example data (ordered by batch_number, then datetime):
batch_number | datetime | team
001 | 2024-01-01 08:00:00 | A
001 | 2024-01-01 08:01:00 | A
001 | 2024-01-01 08:02:00 | A
001 | 2024-01-01 08:03:00 | B
001 | 2024-01-01 08:04:00 | B
001 | 2024-01-01 08:05:00 | A
001 | 2024-01-01 08:06:00 | A
002 | 2024-01-01 08:00:00 | A
002 | 2024-01-01 08:01:00 | A
002 | 2024-01-01 08:02:00 | B
002 | 2024-01-01 08:03:00 | C
Goal
For each batch, I want to group consecutive rows where the team stays the same and compute:
- start_time = min(datetime)
- end_time = max(datetime)
The result must preserve team sequences, not just distinct teams.
Expected result:
batch_number | team | start_time | end_time
001 | A | 2024-01-01 08:00:00 | 2024-01-01 08:02:00
001 | B | 2024-01-01 08:03:00 | 2024-01-01 08:04:00
001 | A | 2024-01-01 08:05:00 | 2024-01-01 08:06:00
002 | A | 2024-01-01 08:00:00 | 2024-01-01 08:01:00
002 | B | 2024-01-01 08:02:00 | 2024-01-01 08:02:00
002 | C | 2024-01-01 08:03:00 | 2024-01-01 08:03:00
Notes
- Team A appears twice for batch 001 because it appears in two separate consecutive segments.
- The grouping must be done per batch, not across batches.
Important constraints
- A simple summarize by batch_number and team is not correct because it merges non-consecutive segments.
- I can easily implement this logic in pandas using a cumulative sequence identifier, but I have not found a reliable equivalent in Kusto.
Question
What is the correct and reliable Kusto query to compute these consecutive team segments per batch?