Forum Discussion
Availability on OMS
- Feb 13, 2018
Sure. I tweaked it a bit to match what you asked for:
let start_time=startofday(datetime("2017-01-01"));
let end_time=endofday(datetime("2017-01-31"));
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets
The first 2 lines define variables, set to the start and end time you mentioned.
Next, we use these variables to limit the query to that time range:
| where TimeGenerated > start_time and TimeGenerated < end_time
Then we count the heartbeats reported from each computer, in buckets (bins) of 1 hour, starting at the start time you define:
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
Now we can see how many heartbeats were reported by each computer each hour. If the number is 0 we understand the computer was probably offline at that time.
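One detail worth checking here: summarize only returns rows for buckets that actually contain records, so an hour with no heartbeats is missing from the result rather than showing up with a count of 0. A minimal sketch with made-up data (the computer name vm-a and the timestamps are placeholders, not taken from a real workspace):

```kusto
// Hypothetical sample data standing in for the Heartbeat table.
let start_time = datetime(2017-01-01);
datatable(TimeGenerated: datetime, Computer: string) [
    datetime(2017-01-01 00:05:00), "vm-a",
    datetime(2017-01-01 00:35:00), "vm-a",
    datetime(2017-01-01 02:10:00), "vm-a"
]
| summarize heartbeat_per_hour = count() by bin_at(TimeGenerated, 1h, start_time), Computer
```

This should return two rows (the 00:00 bucket with 2 heartbeats and the 02:00 bucket with 1) and no row at all for 01:00. That is why the total number of hours has to come from the time range itself, not from counting rows.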
We use a new column to mark if a computer was available or not each hour:
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
and then count the number of hours each computer was indeed "alive":
| summarize total_available_hours=countif(available_per_hour==true) by Computer
Note that this way we give a little leeway for missing heartbeat reports each hour. Instead of expecting a report every 5 or 10 minutes, we only mark a computer as "unavailable" if we didn't get any report from it during a full hour.
At this point we get a number for each computer, something like this:
So we know each computer was alive for 11 hours in the selected time range. But what does that mean? How many hours were there altogether? Is this 11 out of 11 hours (100% availability) or 11 out of 110 hours (only 10% availability)?
Here's how we can calculate the total number of hours in the selected time range:
| extend total_number_of_buckets=round((end_time-start_time)/1h)
I admit it might not be the best calculation of buckets. There is probably a better way, but I can't think of it now.
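To sanity-check the bucket calculation, you can run the expression from the full query on its own with print. A small sketch using the same January dates:

```kusto
let start_time = startofday(datetime(2017-01-01));
let end_time = endofday(datetime(2017-01-31));
// Dividing a timespan by 1h gives the number of hours as a real number.
// endofday() is one tick short of midnight, so round() brings the result up to a whole number.
print total_number_of_buckets = round((end_time - start_time) / 1h)
```

For January this should come out as 744, i.e. 31 days x 24 hours.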
Finally, we calculate the ratio between available hours and total hours:
| extend availability_rate=total_available_hours*100/total_number_of_buckets
and get this:
HTH,
Noa
We are doing the monthly patching for our virtual machines. When I tried this query, it showed an availability rate of 100%.
But we reboot the servers after the patching activity, so the availability percentage should differ, yet it shows 100% for all the servers.
Can anyone help us with this? How can we get an exact report?
Hi! I've updated the above query to reflect your week days and hours (Mon-Fri, 07:00-17:59).
Also, the above query considers every hour in which there was even 1 heartbeat as "up time" (available), so this is probably the part you want to tweak. The resolution here depends on your agent. If it reports a heartbeat every 5 minutes, you can do this:
let start_time=startofday(datetime("2019-08-01 07:00:00"));
let end_time=endofday(datetime("2019-08-30 18:00:00"));
Heartbeat
| where TimeGenerated >= start_time and TimeGenerated <= end_time
| where dayofweek(TimeGenerated) >= 1d and dayofweek(TimeGenerated) <= 5d // Monday-Friday
| where hourofday(TimeGenerated) >= 7 and hourofday(TimeGenerated) <=17 // 7:00-17:59
| summarize heartbeat_per_5_minutes=count() by bin_at(TimeGenerated, 5m, start_time), Computer
| extend available_per_5_min=iff(heartbeat_per_5_minutes>0, true, false)
| summarize total_available_buckets=countif(available_per_5_min==true)
, total_unavailable_buckets=countif(available_per_5_min==false) by Computer
| extend total_number_of_buckets=round(total_available_buckets+total_unavailable_buckets)
| extend availability_rate=total_available_buckets*100/total_number_of_buckets
Note that, in any case, if the reboot was quick and the agent sends a heartbeat every 5 minutes, it might go unnoticed.
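One more thing to watch for with this version: summarize only produces rows for 5-minute buckets that contain at least one heartbeat, so total_unavailable_buckets will likely stay at 0 and availability_rate at 100. A possible workaround, sketched below and not tested against a real workspace, is to count the expected buckets from the time range itself. The expected_buckets name is made up for this example:

```kusto
let start_time = startofday(datetime(2019-08-01));
let end_time = endofday(datetime(2019-08-30));
// Assumption: build every 5-minute bucket in the range, keep only
// Monday-Friday 07:00-17:59, and count how many buckets we expect.
let expected_buckets = toscalar(
    range bucket from start_time to end_time step 5m
    | where dayofweek(bucket) >= 1d and dayofweek(bucket) <= 5d
    | where hourofday(bucket) >= 7 and hourofday(bucket) <= 17
    | count);
Heartbeat
| where TimeGenerated >= start_time and TimeGenerated <= end_time
| where dayofweek(TimeGenerated) >= 1d and dayofweek(TimeGenerated) <= 5d // Monday-Friday
| where hourofday(TimeGenerated) >= 7 and hourofday(TimeGenerated) <= 17 // 7:00-17:59
// One row per computer per 5-minute bucket that had at least one heartbeat.
| summarize by bin_at(TimeGenerated, 5m, start_time), Computer
| summarize total_available_buckets = count() by Computer
| extend total_number_of_buckets = expected_buckets
// 100.0 forces a real-valued division instead of an integer one.
| extend availability_rate = total_available_buckets * 100.0 / total_number_of_buckets
```

A computer that sent no heartbeats at all in the range will not appear in the output, so this sketch only covers machines that reported at least once.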
HTH,
Noa
- Prashant Sharma, Sep 04, 2019, Brass Contributor
Hello Noa,
Can we get the output in the form of a graph or chart on the Azure dashboard?
- CliveWatson, Sep 04, 2019, Former Employee
Just add a last line of
| render barchart kind=unstacked
or, if you want less data, pick the columns using project:
| project Computer, availability_rate | render barchart kind=unstacked title = "Availability Rate per Computer"
- GouravIN, Sep 28, 2019, Brass Contributor
This question is for CliveWatson and Noa Kuperberg.
I have used these queries in my workspace and got some discrepancies.
First, I used the one below.
let start_time=startofday(datetime("2019-09-01 00:00:00"));
let end_time=endofday(datetime("2019-09-27 00:00:00"));
Heartbeat
| where TimeGenerated >= start_time and TimeGenerated <= end_time
| where dayofweek(TimeGenerated) >= 1d and dayofweek(TimeGenerated) <= 5d // Monday-Friday
| where hourofday(TimeGenerated) >= 7 and hourofday(TimeGenerated) <=17 // 7:00-17:59
| summarize heartbeat_per_1_minutes=count() by bin_at(TimeGenerated, 1m, start_time), Computer
| extend available_per_1_min=iff(heartbeat_per_1_minutes>0, true, false)
| summarize total_available_buckets=countif(available_per_1_min==true)
, total_unavailable_buckets=countif(available_per_1_min==false) by Computer
| extend total_number_of_buckets=round(total_available_buckets+total_unavailable_buckets)
| extend availability_rate=total_available_buckets*100/total_number_of_buckets
| where Computer == "vm83560609d9"
And got a result like this:
Now my concern is why availability_rate for this VM is 100 for September 1st to 27th, when it was no longer available after a few days.
And what are the total number of buckets and the total unavailable buckets?
On the other hand, when I used the second query, below:
let start_time=startofday(datetime("2019-09-01"));
let end_time=endofday(datetime("2019-09-27"));
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets
| where Computer == "vm83560609d9"
It seems I am getting the right results:
And in this result, why is the total number of buckets so high (648)? Same question again: what is it?
And the total number of hours means "the availability of the system", if I am not wrong.
Thanks in advance for the help.