
Dante Nahuel Ciai (Brass Contributor)

Jan 15 2018 11:50 AM, last edited on Apr 07 2022 04:51 PM by TechCommunityAP


Hi everyone.

I'm trying to find a way of getting Availability of servers on OMS, but I can't find any...

By Availability I mean the % of uptime of a given server during a certain period of time.

So, if a server was up 98 of a total 100 hours, the availability for that period is 98%.

I'm looking to do that in OMS, but I'm not sure it's possible.

Thanks in advance.


36 Replies


Jan 15 2018 11:07 PM

Hi,

There is no out-of-the-box solution for this. One way to calculate it is probably to use Heartbeat events. For Windows machines they are logged every 1 minute and for Linux every 5 minutes. You could count the heartbeats actually received from each machine and work out the maximum number expected for the same time frame. From those two values you can calculate the percentage. The query language is pretty rich, so this should be possible.
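A minimal sketch of that idea, assuming one heartbeat per minute is expected (the Windows interval mentioned above) and a one-day window. The names `lookback`, `expected_interval`, `received`, `expected`, and `heartbeat_ratio` are illustrative and not part of the original post:

```kusto
// Assumption: one heartbeat per minute is expected; use 5m for the Linux interval quoted above.
let lookback = 1d;
let expected_interval = 1m;
Heartbeat
| where TimeGenerated > ago(lookback)
// Heartbeats actually received from each machine in the window.
| summarize received = count() by Computer
// Maximum number of heartbeats the window could contain.
| extend expected = lookback / expected_interval
| extend heartbeat_ratio = round(received * 100.0 / expected, 2)
```

Because this counts individual heartbeats, any delayed or dropped report lowers the ratio, so it is stricter than a per-hour check.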


Jan 16 2018 04:44 AM

Stanislav is right, it's possible :)

Here's an example that calculates the availability rate of each computer, starting at midnight.

let midnight=startofday(now());
Heartbeat
| where TimeGenerated>midnight
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, midnight), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend number_of_buckets=hourofday(now())+1
| extend availability_rate=total_available_hours*100/number_of_buckets

Run it on our playground and tweak it as makes sense to you.


Jan 16 2018 06:54 AM

Thank you for your help. I'm going to investigate that query a bit.

However, I'm not sure about that approach, because the heartbeat often stops working even when the VM is perfectly fine.

But I understand the approach...


Jan 16 2018 06:58 AM

Heartbeat should be running without issue. However, there are scenarios in which you might not get data:
- The Log Analytics service is down
- Your machine has lost its Internet connection
- The MMA agent service is stopped
- The MMA agent is not functioning properly


Jan 16 2018 08:35 AM

The last situation (the MMA agent not working properly, or having stopped working) is exactly what worries me about building the availability report on heartbeats.


Feb 12 2018 10:42 AM

Noa, your script is amazing, but I'm struggling to understand it and tweak it to my needs (a fixed period of about 30 days, for example from the 1st to the 31st of January).

Could you give me a hand understanding it?

let midnight=startofday(now());
Heartbeat
| where TimeGenerated>midnight
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, midnight), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend number_of_buckets=hourofday(now())+1
| extend availability_rate=total_available_hours*100/number_of_buckets

First part: I need to change this to between((2018-01-01) .. (2017-01-31)); am I correct?

I'm not sure I understand why you use bin_at instead of just bin.

Best response confirmed by Dante Nahuel Ciai (Brass Contributor)


Feb 13 2018 03:22 AM - edited Feb 13 2018 03:28 AM

Solution

Sure. I tweaked it a bit to match what you ask for:

let start_time=startofday(datetime("2017-01-01"));
let end_time=endofday(datetime("2017-01-31"));
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets

The first 2 lines define variables, set to the start and end time you mentioned.

Next, we use these variables to limit the query to that time range:

| where TimeGenerated > start_time and TimeGenerated < end_time

Then we count the heartbeats reported from each computer, in buckets (bins) of 1 hour, starting at the start time you define:

| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer

Now we can see how many heartbeats were reported by each computer each hour. If the number is 0 we understand the computer was probably offline at that time.

We use a new column to mark if a computer was available or not each hour:

| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)

and then count the number of hours each computer was indeed "alive":

| summarize total_available_hours=countif(available_per_hour==true) by Computer

Note that this way we give a little leeway for missing heartbeat reports each hour. Instead of expecting a report every 5 or 10 minutes, we only mark a computer as "unavailable" if we didn't get any report from it during a full hour.

At this point we get a number for each computer, something like this:

So we know each computer was alive for 11 hours in the selected time range. But what does that mean? How many hours were there altogether? Is this 11 out of 11 hours (100% availability) or 11 out of 110 hours (only 10% availability)?

Here's how we can calculate the total number of hours in the selected time range:

| extend total_number_of_buckets=round((end_time-start_time)/1h)

I admit it might not be the best calculation of buckets. There is probably a better way, but I can't think of it now.
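One possible cross-check for the bucket count, offered as a sketch rather than the author's method: generate the hourly buckets with the `range` operator, count them, and compare the result with the rounded division. The column names `rounded_buckets` and `generated_buckets` are illustrative. For the January range used here, both values should come out to 744 (31 days of 24 hours each).

```kusto
let start_time = startofday(datetime(2017-01-01));
let end_time = endofday(datetime(2017-01-31));
print
    // The expression used in the full query above.
    rounded_buckets = round((end_time - start_time) / 1h),
    // One row per hourly bucket start between start_time and end_time, then counted.
    generated_buckets = toscalar(range bucket from start_time to end_time step 1h | count)
```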

Finally, we calculate the ratio between available hours and total hours:

| extend availability_rate=total_available_hours*100/total_number_of_buckets

and get this:

HTH,

Noa


Feb 23 2018 12:28 PM

Excellent query. How can "let midnight=startofday(now())" be altered to use my local time zone? If I run this as is, it seems to use my time +7, and the number of hours doesn't match up.


Feb 25 2018 01:23 AM

Hi,

Thanks for an excellent code sample.

I would like to extend the query so that it also supports specified time intervals and smaller uptime checks (heartbeat).

# Service levels

Ex: Service agreements are based on 3 categories

S1 = 07:00 - 17:00 Weekdays

S2 = 07:00- 22:00 Weekdays

365/7 = Always (already supported by your query)

Uptime should be calculated based on the service agreement hours/days.

Time should also be converted to UTC+1.

Will this do the trick?

Heartbeat

| extend Timegenerated = TimeGenerated + 1h

I checked the samples for endofday/endofweek, but I am unable to get them to work in your sample.

# Intervals

extend available_per_hour=iff(heartbeat_per_hour>0, true, false)

Can this be adjusted to heartbeats per 30 min / 15 min?

Any ideas?

Br

erik


Apr 08 2018 06:06 AM

Thanks George.

To adjust for the local time zone you can do this:

let midnight=startofday(now())-7h
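Note that subtracting 7h from UTC midnight lands on 17:00 UTC of the previous day. If the goal is the start of the local day, one variant is to shift the current time into local time before truncating, then shift back. The sketch below assumes local time is UTC-7 (one reading of the "+7" mentioned in the question) and uses a fixed, hypothetical `sample_now` so that the result is reproducible; `utc_offset` and the output column names are illustrative.

```kusto
// Assumption: local time is UTC-7.
let utc_offset = -7h;
// Hypothetical "now" in UTC; replace with now() in a real query.
let sample_now = datetime(2018-04-08 13:06:00);
print
    utc_midnight = startofday(sample_now),
    // Truncate in local time, then convert that local midnight back to UTC.
    local_midnight_in_utc = startofday(sample_now + utc_offset) - utc_offset,
    // Hourly buckets elapsed since local midnight, including the current one.
    local_hours_elapsed = hourofday(sample_now + utc_offset) + 1
```

With this variant, `number_of_buckets` in the original query would use the local hour (`local_hours_elapsed`) rather than `hourofday(now())`, so the bucket count matches the shifted start time.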


Apr 08 2018 06:31 AM

Hi Eric,

To adjust for the service agreement, you can calculate the start time and end time like this:

let raw_date = datetime("2017-01-01");
// dayofweek() returns a timespan: 0d is Sunday, 6d is Saturday
let start_date = case("SLA" in ("S1", "S2"),
    case(dayofweek(raw_date)==0d, startofday(raw_date+1d)+7h,
         dayofweek(raw_date)==6d, startofday(raw_date+2d)+7h,
         startofday(raw_date)+7h),
    raw_date);

On the intervals: they can be adjusted any way you need. Just use `bin(fieldname, 30m)` instead of `bin(fieldname, 1h)`.
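As a sketch of what that change could look like in the full query, assuming the same January range as the accepted answer: the bucket size is pulled into a variable (`bucket_size`, an illustrative name) so that the binning and the total bucket count stay in step.

```kusto
let start_time = startofday(datetime(2017-01-01));
let end_time = endofday(datetime(2017-01-31));
// Assumed bucket size; 15m works the same way.
let bucket_size = 30m;
Heartbeat
| where TimeGenerated between (start_time .. end_time)
// One row per computer per bucket that contains at least one heartbeat.
| summarize heartbeats = count() by bin_at(TimeGenerated, bucket_size, start_time), Computer
| summarize available_buckets = count() by Computer
// The total must be divided by the same bucket size used for binning.
| extend total_number_of_buckets = round((end_time - start_time) / bucket_size)
| extend availability_rate = available_buckets * 100 / total_number_of_buckets
```

Buckets close to the heartbeat interval quoted earlier in the thread (1 to 5 minutes) would start to count ordinary reporting gaps as downtime, so there is a trade-off between resolution and tolerance.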


Apr 26 2018 08:09 AM


Aug 07 2018 08:48 AM

I am struggling to generate the report for Mon-Friday only and in my time zone. I just get errors.

let start_time=startofday(datetime("2018-07-1 07:30:00"));
let end_time=endofday(datetime("2018-07-31 18:00:00"));
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets


Aug 07 2018 08:50 AM

I am struggling to generate the report for Mon-Friday only and in my time zone. I just get errors. The script below works for me. Thanks

let start_time=startofday(datetime("2018-07-1 07:30:00"));
let end_time=endofday(datetime("2018-07-31 18:00:00"));
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets
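Note that startofday() and endofday() discard the time of day, so the 07:30 and 18:00 in the query above have no effect. One way to restrict the report to weekdays and a daily service window is sketched below, under several assumptions: times are in UTC, the window is 07:00 to 17:00, and `SampleHeartbeat` is a hypothetical stand-in for the Heartbeat table so the query is self-contained. All names other than `TimeGenerated` and `Computer` are illustrative.

```kusto
// Hypothetical sample rows standing in for the Heartbeat table.
let SampleHeartbeat = datatable(TimeGenerated: datetime, Computer: string) [
    datetime(2018-07-02 07:15:00), "srv01",
    datetime(2018-07-02 08:40:00), "srv01",
    datetime(2018-07-02 19:05:00), "srv01",
    datetime(2018-07-07 09:00:00), "srv01"
];
let start_time = datetime(2018-07-02);
// Exclusive upper bound: one full week, Monday to Sunday.
let end_time = datetime(2018-07-09);
// Assumed service window, in UTC: 07:00 up to (not including) 17:00.
let sla_start_hour = 7;
let sla_end_hour = 17;
// Count the hourly buckets that fall on Monday..Friday inside the window.
let total_buckets = toscalar(
    range bucket from start_time to end_time - 1h step 1h
    | where dayofweek(bucket) between (1d .. 5d)
    | where hourofday(bucket) >= sla_start_hour and hourofday(bucket) < sla_end_hour
    | count);
SampleHeartbeat
| where TimeGenerated >= start_time and TimeGenerated < end_time
// dayofweek() returns a timespan: 1d is Monday, 5d is Friday.
| where dayofweek(TimeGenerated) between (1d .. 5d)
| where hourofday(TimeGenerated) >= sla_start_hour and hourofday(TimeGenerated) < sla_end_hour
// One row per computer per hour with at least one heartbeat.
| summarize heartbeat_per_hour = count() by bin_at(TimeGenerated, 1h, start_time), Computer
| summarize total_available_hours = count() by Computer
| extend total_number_of_buckets = total_buckets
| extend availability_rate = total_available_hours * 100.0 / total_number_of_buckets
```

In the sample data, the 19:05 Monday heartbeat and the Saturday heartbeat fall outside the window and are ignored. To run this against real data, replace `SampleHeartbeat` with `Heartbeat` and apply any time-zone offset to `TimeGenerated` before the weekday and hour filters.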


Aug 07 2018 11:52 AM

Hey Noa,

Can we get one year of details with this script?


Sep 11 2018 06:29 PM

Can we get availability for the past 10 days instead of adding a start date and end date?


Sep 11 2018 07:11 PM

Thanks but I got the answer

let month = startofday(ago(3d));
Heartbeat
| where TimeGenerated>ago(3d)
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, (ago(3d))), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets= round((now()-month)/1h)-2
| extend availability_rate=total_available_hours*100/total_number_of_buckets
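A variant sketch for a rolling window, assuming whole days are acceptable: using the same `start_time` and `end_time` for the filter, the bin alignment, and the bucket count avoids the manual "-2" correction. `lookback_days` and the other names here are illustrative.

```kusto
// Assumed window length; the question above asks about 10 days.
let lookback_days = 10;
let start_time = startofday(ago(lookback_days * 1d));
// Stop at the most recent UTC midnight so every bucket is a complete hour.
let end_time = startofday(now());
Heartbeat
| where TimeGenerated >= start_time and TimeGenerated < end_time
// One row per computer per hour with at least one heartbeat.
| summarize heartbeats = count() by bin_at(TimeGenerated, 1h, start_time), Computer
| summarize total_available_hours = count() by Computer
// Whole days between two midnights, so the division is exact.
| extend total_number_of_buckets = (end_time - start_time) / 1h
| extend availability_rate = total_available_hours * 100 / total_number_of_buckets
```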
