About the author, Chango Valtchev, in his own words: I’ve been proudly serving Microsoft since the times Bill Gates was still seen regularly around the Redmond campus. I’ve been working on Compute management services in Azure Core for ten years, focusing on reliability and scalability. I indulge in crafting telemetry data and doing deep datamining on it, doing service-related analytics, and finding uncommon uses of Kusto, like generating operational scripts (even generating Kusto stored functions :smiling_face_with_smiling_eyes:), and doing interesting transforms of a service’s stored data based on a transaction log asynchronously streamed to Kusto.
I’ve been using Kusto for many years in our services environment, and it doesn’t cease to amaze with its rich and powerful capabilities. It feels like only your imagination limits what you can do with Kusto! Some super-nifty features take a while to discover, and others take a while to figure out or to fully appreciate. Such has been the story of the timeline visualization for me. It’s not hidden at all: there’s a shortcut right on the main ribbon in the desktop Kusto Explorer:
But if you just try it casually on a random query result, even if it has the required Timestamp field, you’ll likely not get a very interesting/useful picture. And thus, regrettably, I ignored the chart option for years. Turns out the required data model is very simple and very easy to produce from typical service logs, once you have the right idea.
This post will illustrate a great use case that came up in the context of my current service, and I’ll share a few tips for using Time Pivot. To save us covering the basics here, I recommend you read Olga Goldenberg’s Spotlight on the ADX Time Pivot Visualization, which describes the feature in detail.
There are probably various other creative uses, but so far the best one I’ve found is for visualization of parallel event timelines and browsing the details associated with these events. The source for these events can simply be timestamped log statements, with one or more dimensions to define the timeline and with optional sub-grouping.
This is the scenario: We have a job scheduler and a related job deployment manager, both implemented based on a state machines framework. One of the scheduler features is preemptable jobs: Jobs of that class can be suspended when a high-priority job needs to be scheduled and there is no available capacity. Effecting preemption requires some involved orchestration between the scheduler and the deployment manager, and we’ve had reliability issues in some cases - both due to incorrectly handled races and latency spikes in the cleanup of the suspended jobs from the cluster. Debugging such issues based on the raw logs has been very tedious - a typical log is 10-30K lines. This gets much worse with the number of dependencies. Given the concurrent processing of the suspensions, tracking the interactions with the new job’s deployment can be mentally taxing. The timeline visualization brought a breakthrough to our debugging ability and productivity. The following sample is a purposefully simplified case. In this scenario, things worked well. It shows the ‘Main’ job, at high priority, waiting on its dependencies to be suspended (while waiting, “Skipped schedule processing” is logged). Shortly after all the suspensions complete, the main job gets to Running state.
Here is what the underlying data looks like.
Roughly speaking, this is a union of selected interesting point-in-time events and formal state transitions per job, scraped from the raw logs. (We wish we had somewhat more structured, semantic logs, but, ironically, the power of Kusto detracts from the motivation to make such an investment.)
The ‘message’ field is not used for the rendering, but it’s available to get the details of the timeline events when browsing the log below the visualization.
When things don’t go well with the suspension of job dependencies, visualization becomes invaluable because it lets us easily spot deviations from the normal pattern and then focus the debugging effort on a specific interaction or dependency job. Here’s an example where the preempted job went back to the scheduler queue and managed to sneak back into the same cluster (due to a slight inaccuracy of the capacity bookkeeping around such a preemption scenario):
In such a case the interactive feature of the visualization comes in very handy. You can select rows or cells from the grid above, and only the corresponding subset of log entries are displayed. There you can find useful details from the non-dimension columns (‘message’ in our case). Very thoughtfully designed.
Such visualizations are likely to be reusable, and it’s very convenient to pack them in stored functions. (The `render` operator works in a stored function. I didn’t realize this until my re-discovery of the timeline chart. :smiling_face_with_smiling_eyes:) Thus, anybody on the team can get the above visualization simply like this:
TrackSchedulingWithDependencyPreemption(mainJobId="524f1369-2c57-428a-acc4-37e9692ce758", startTime=datetime(2022-11-23 22:38), period=4m)
The Kusto timeline visualization very much resembles Gantt charts. But it’s actually much easier to produce from the typical service semi-structured logs because you don’t have to explicitly frame the start and end of an activity (sub-timeline). Instead, Kusto just picks up the discrete events, time-buckets them based on their timestamps, and renders the timelines as defined by the dimension columns. Further, it has some interesting smarts:
Try it, spread the word! Popular features tend to get the most continued investment.
* The timeline visualization is not in the web UI yet. Please use http://aka.ms/ADX.ideas to vote for this ask (and others)
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.