Forum Discussion
Kusto: summarize consecutive segments of the same value per batch
Problem statement (Azure Data Explorer / Kusto)
I have a table in Azure Data Explorer (Kusto) with the following columns:
- batch_number (string)
- team (string)
- datetime (datetime)
Each row represents work done by a team on a batch at a given time.
The data is chronological per batch, but teams can change over time and may come back later.
Example data (ordered by batch_number, then datetime):
batch_number | datetime | team
001 | 2024-01-01 08:00:00 | A
001 | 2024-01-01 08:01:00 | A
001 | 2024-01-01 08:02:00 | A
001 | 2024-01-01 08:03:00 | B
001 | 2024-01-01 08:04:00 | B
001 | 2024-01-01 08:05:00 | A
001 | 2024-01-01 08:06:00 | A
002 | 2024-01-01 08:00:00 | A
002 | 2024-01-01 08:01:00 | A
002 | 2024-01-01 08:02:00 | B
002 | 2024-01-01 08:03:00 | C
Goal
For each batch, I want to group consecutive rows where the team stays the same and compute:
- start_time = min(datetime)
- end_time = max(datetime)
The result must preserve team sequences, not just distinct teams.
Expected result:
batch_number | team | start_time | end_time
001 | A | 2024-01-01 08:00:00 | 2024-01-01 08:02:00
001 | B | 2024-01-01 08:03:00 | 2024-01-01 08:04:00
001 | A | 2024-01-01 08:05:00 | 2024-01-01 08:06:00
002 | A | 2024-01-01 08:00:00 | 2024-01-01 08:01:00
002 | B | 2024-01-01 08:02:00 | 2024-01-01 08:02:00
002 | C | 2024-01-01 08:03:00 | 2024-01-01 08:03:00
Notes
- Team A appears twice for batch 001 because it appears in two separate consecutive segments.
- The grouping must be done per batch, not across batches.
Important constraints
- A simple summarize by batch_number and team is not correct because it merges non-consecutive segments.
- I can easily implement this logic in pandas using a cumulative sequence identifier, but I have not found a reliable equivalent in Kusto.
Question
What is the correct and reliable Kusto query to compute these consecutive team segments per batch?
1 Reply
- lilleybCopper Contributor
Here’s a reliable and efficient Kusto query that solves the "gaps and islands" problem to group consecutive rows with the same team per batch, without merging non-consecutive segments:
kusto
YourTable | order by batch_number asc, datetime asc | serialize | extend prev_batch = prev(batch_number), prev_team = prev(team) | extend is_new_segment = isnull(prev_team) or (batch_number != prev_batch) or (team != prev_team) | extend segment_id = row_cumsum(iff(is_new_segment, 1, 0)) | summarize start_time = min(datetime), end_time = max(datetime), team = take_any(team) by batch_number, segment_id | project batch_number, team, start_time, end_time | order by batch_number asc, start_time ascExplanation:
- order by + serialize ensures the data is properly ordered and allows safe use of prev().
- We detect the start of a new segment whenever the batch changes or the team is different from the previous row.
- row_cumsum creates a unique segment ID for each consecutive group.
- summarize aggregates the min/max datetime per segment, keeping batch_number in the by clause to group correctly per batch.
- Final project and order by give the clean expected output.
This produces exactly the desired result for your sample data, preserving separate segments for the same team when they are not consecutive (e.g., team A appears twice in batch 001).
.............................................................................................
https://jjewellery.vn/blogs/news/jjewellery-cua-hang-nhan-cuoi-uy-tin-tai-quan-1-tp-hcm