Customers interacting with Time Series, IoT Analytics, and Infra/App Logs often have a challenge identifying the total downtime of a device or an application. In this blog we will look at some of the common scenarios and see how we can use the power of Kusto Query Language to solve this challenge. Please note that these sample queries do not cover all the edge cases, please use them as a reference and adjust the logic for your scenario.
Scenario 1: Compute downtime based on health status
In this scenario, the devices are reporting health status as ‘Up’ or ‘Down’ or a binary value 1 or 0 or anything else. Our task is to identify how long the device was in a specific state. In this case we are interested in how long the device was down. Check the below sample query to identify the change in status and compute the start and end time of a given state.
Execute: [Web] [Desktop] https://help.kusto.windows.net/Samples
let DeviceHealth = datatable (TimeStamp:datetime, DeviceId:string, State:int) [ '2020-04-24 10:40:00', '12987679', 0, '2020-04-24 10:40:00', '21998045', 0, '2020-04-24 11:00:00', '12987679', 1, '2020-04-24 11:05:00', '12987679', 1, '2020-04-24 11:49:00', '12987679', 0, '2020-04-24 11:30:00', '21998045', 1, '2020-04-24 12:30:00', '21998045', 0, '2020-04-24 14:30:00', '21998045', 1, '2020-04-24 12:45:00', '12987679', 0, '2020-04-24 12:48:00', '12987679', 0, '2020-04-24 13:00:00', '12987679', 1, '2020-04-24 16:14:00', '21998045', 0, '2020-04-24 16:30:00', '21998045', 1, '2020-04-24 16:31:00', '21998045', 0 ]; DeviceHealth | where TimeStamp between (datetime('2020-04-23') .. datetime(2020-04-25)) | summarize TimeStampList = make_list(TimeStamp), StateList=make_list(State) by DeviceId | mv-apply TimeStampList to typeof(datetime), StateList to typeof(string) on ( order by TimeStampList asc | extend PrvTimeStamp=prev(TimeStampList) | extend PrvState=prev(StateList) | extend OutageStartTime= iff((PrvState == 1 or isempty(PrvState)) and StateList==0, TimeStampList, todatetime('')) | extend OutageEndTime= iff(PrvState == 0 and StateList==1, TimeStampList, todatetime('')) | where isempty(OutageStartTime) <> isempty(OutageEndTime) | extend OutageStartTime = iif(isempty(OutageStartTime), prev(OutageStartTime),OutageStartTime) | extend OutageEndTime = iif(isempty(OutageEndTime ), next(OutageEndTime),OutageEndTime ) | summarize by DeviceId, OutageStartTime, OutageEndTime, TotalDurationInMin = datetime_diff('minute',OutageEndTime, OutageStartTime) )
Sample output:
Explanation of the above sample query:
Scenario 2: Compute downtime based on missing signals
In some cases, the devices don’t report specific health status, instead just stop sending signals. The below sample query identifies missing signals and compute the start and end time when the signal was missing.
Execute: [Web] [Desktop] https://help.kusto.windows.net/Samples
let DeviceTelemetry = datatable (TimeStamp:datetime, DeviceId:string, BatteryLevel:int, Temp:real, Humidity:real) [ '2020-04-23T23:11:51.903Z', '637086755205674255', 24, 64.8, 68, '2020-04-23T23:12:00.143Z', '637085868243706792', 34, 70.3, 61.8, '2020-04-23T23:13:07.308Z', '637086755205674255', 83, 61.3, 68.8, '2020-04-23T23:13:15.584Z', '637085868243706792', 69, 53.8, 64.6, '2020-04-23T23:14:22.714Z', '637086755205674255', 19, 67.9, 77.6, '2020-04-23T23:14:30.989Z', '637085868243706792', 6, 72.9, 67.7, '2020-04-23T23:15:37.516Z', '637086755205674255', 45, 53.5, 56.8, '2020-04-23T23:15:45.766Z', '637085868243706792', 66, 49.1, 72.3, '2020-04-23T23:16:52.951Z', '637086755205674255', 48, 70, 66.8, '2020-04-23T23:17:01.226Z', '637085868243706792', 50, 71.1, 70.3, '2020-04-23T23:59:52.951Z', '637086755205674255', 48, 70, 66.8, '2020-04-23T23:59:01.226Z', '637085868243706792', 50, 71.1, 70.3 ]; let StartDateTime = datetime('2020-04-23 20:00'); let EndDateTime = datetime('2020-04-24 05:00'); DeviceTelemetry | make-series count() default=0 on TimeStamp from StartDateTime to EndDateTime step 1m by DeviceId | mv-apply TimeStamp to typeof(datetime), count_ to typeof(int) on ( order by TimeStamp asc | extend PrvTimeStamp=iif(isempty(prev(TimeStamp)), StartDateTime,prev(TimeStamp)) | extend PrvCount=iif(isempty(prev(count_)), -1,prev(count_)) | extend MissingDataStartTime= iff(PrvCount > 0 and count_==0, TimeStamp, iif(PrvCount==-1, TimeStamp,todatetime(''))) | extend MissingDataEndTime= iff(PrvCount == 0 and count_>0, TimeStamp, iif(isempty(next(TimeStamp)), TimeStamp,todatetime(''))) | where isnotempty(MissingDataStartTime) or isnotempty(MissingDataEndTime) | extend MissingDataEndTime = iif(isempty(MissingDataEndTime), next(MissingDataEndTime),todatetime('')) | where PrvCount != 0 | project DeviceId, MissingDataStartTime, MissingDataEndTime, TotalDurationInMin = datetime_diff('minute',MissingDataEndTime, MissingDataStartTime) )
Sample output:
Explanation of the above sample query:
Scenario 3: Compute downtime based on threshold
Building on the scenario #2, we want to detect all the time slots where the device did not report any status in a 30 min window.
Execute: [Web] [Desktop] https://help.kusto.windows.net/Samples
let DeviceHealth = datatable (TimeStamp:datetime, DeviceId:string) [ '2020-04-24 10:40:00', '12987679', '2020-04-24 10:40:00', '21998045', '2020-04-24 11:00:00', '12987679', '2020-04-24 11:05:00', '12987679', '2020-04-24 11:49:00', '12987679', '2020-04-24 11:30:00', '21998045', '2020-04-24 12:30:00', '21998045', '2020-04-24 14:30:00', '21998045', '2020-04-24 12:45:00', '12987679', '2020-04-24 12:48:00', '12987679', '2020-04-24 13:00:00', '12987679', '2020-04-24 12:44:00', '21998045', '2020-04-24 16:14:00', '21998045', '2020-04-24 16:30:00', '21998045', ]; let outage_threshold=30m; DeviceHealth | order by DeviceId asc, TimeStamp asc | extend dt=iff(DeviceId != prev(DeviceId), 0s, TimeStamp-prev(TimeStamp)) | extend State=iff(DeviceId != prev(DeviceId), 1, iff(dt >= outage_threshold, 0, 1)) | extend PrvTimeStamp=prev(TimeStamp) | extend PrvState=prev(State) | extend OutageStartTime= iff(State == 0, PrvTimeStamp, todatetime('')) | extend OutageEndTime= iff(State==0, TimeStamp, todatetime('')) | where isnotempty(OutageEndTime) and isnotempty(OutageEndTime) | project DeviceId, OutageStartTime, OutageEndTime, TotalDurationInMin = datetime_diff('minute',OutageEndTime, OutageStartTime)
Sample output:
Explanation of the above sample query:
Scenario 4: Calculate uptime for each device for a given period
Finally, let’s build on Scenario #1 and calculate the total availability of the device.
Execute: [Web] [Desktop] https://help.kusto.windows.net/Samples
let StartTime = datetime('2020-04-23'); let EndTime = datetime('2020-04-25'); let DeviceHealth = datatable (TimeStamp:datetime, DeviceId:string, State:int) [ '2020-04-24 10:40:00', '12987679', 0, '2020-04-24 10:40:00', '21998045', 0, '2020-04-24 11:00:00', '12987679', 1, '2020-04-24 11:05:00', '12987679', 1, '2020-04-24 11:49:00', '12987679', 0, '2020-04-24 11:30:00', '21998045', 1, '2020-04-24 12:30:00', '21998045', 0, '2020-04-24 14:30:00', '21998045', 1, '2020-04-24 12:45:00', '12987679', 0, '2020-04-24 12:48:00', '12987679', 0, '2020-04-24 13:00:00', '12987679', 1, '2020-04-24 16:14:00', '21998045', 0, '2020-04-24 16:30:00', '21998045', 1, ]; DeviceHealth | where TimeStamp between (StartTime .. EndTime) | summarize TimeStampList = make_list(TimeStamp), StateList=make_list(State) by DeviceId | mv-apply TimeStampList to typeof(datetime), StateList to typeof(string) on ( order by TimeStampList asc | extend PrvTimeStamp=prev(TimeStampList) | extend PrvState=prev(StateList) | extend OutageStartTime= iff((PrvState == 1 or isempty(PrvState)) and StateList==0, TimeStampList, todatetime('')) | extend OutageEndTime= iff(PrvState == 0 and StateList==1, TimeStampList, todatetime('')) | where isempty(OutageStartTime) <> isempty(OutageEndTime) | extend OutageStartTime = iif(isempty(OutageStartTime), prev(OutageStartTime),OutageStartTime) | where isnotempty(OutageEndTime) | project DeviceId, OutageStartTime, OutageEndTime, TotalDurationInMin = datetime_diff('minute',OutageEndTime, OutageStartTime) ) | extend TotalMin = toreal(datetime_diff('minute', EndTime, StartTime)) | summarize Availability=((max(TotalMin)-sum(TotalDurationInMin))/max(TotalMin))*100 by DeviceId
Sample output:
Explanation of the above sample query:
Next steps
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.