SOLVED

Availability on OMS

Hi everyone.

I'm trying to find a way of getting Availability of servers on OMS, but I can't find any...

By Availability I mean the % of uptime of a given server during a certain period of time.

So, if a server was up 98 of a total 100 hours, the availability for that period is 98%.

I'm looking to do that in OMS, but I'm not sure it's possible.

33 Replies

Re: Availability on OMS

Hi. There is no out-of-the-box solution for this. One way to calculate it is with Heartbeat events: for Windows machines they are logged every 1 minute and for Linux every 5 minutes. You could count the heartbeats each machine actually sent in a given time frame and compare that with the maximum number it could have sent in the same time frame. From those two values you can calculate the percentage. The query language is pretty rich, so this should be possible.
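A minimal sketch of that counting idea, assuming one heartbeat per minute (adjust `heartbeat_interval` for agents that report less often). The variable names and the one-day window are illustration choices, not part of the original reply, and the result is capped at 100 in case an agent reports slightly more often than assumed:

```kusto
let start_time = startofday(ago(1d));
let end_time = startofday(now());
let heartbeat_interval = 1m;   // assumption: one heartbeat per minute
let expected_heartbeats = (end_time - start_time) / heartbeat_interval;
Heartbeat
| where TimeGenerated >= start_time and TimeGenerated < end_time
| summarize actual_heartbeats = count() by Computer
// ratio of received to expected heartbeats, as a percentage capped at 100
| extend availability_rate = round(min_of(100.0, actual_heartbeats * 100.0 / expected_heartbeats), 2)
```

Compared with per-hour buckets, this version is stricter: every missed heartbeat lowers the rate, including those lost to agent hiccups.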

Re: Availability on OMS

Stanislav is right, it's possible :)

Here's an example that calculates the availability rate of each computer, starting at midnight.

```
let midnight=startofday(now());
Heartbeat
| where TimeGenerated>midnight
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, midnight), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend number_of_buckets=hourofday(now())+1
| extend availability_rate=total_available_hours*100/number_of_buckets
```

Run it on our playground and tweak it as makes sense to you.

Re: Availability on OMS

Thank you for your help, I'm going to look into that query a bit.

However, I'm not sure about that approach, because the heartbeat often stops working even when the VM is perfectly fine.

But I understand the approach...

Re: Availability on OMS

Heartbeat should be running without issue. However, there are scenarios in which you might not get data:

- The Log Analytics service is down
- Your machine has lost its Internet connection
- The MMA agent service is stopped
- The MMA agent is not functioning properly
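To see which machines are currently silent for any of these reasons, a simple check on the latest heartbeat per computer may help. This is only a sketch: the 15-minute threshold is an assumed value, and the query cannot tell a stopped agent from a machine that is really down.

```kusto
Heartbeat
| summarize last_heartbeat = max(TimeGenerated) by Computer
| extend minutes_since_last = datetime_diff('minute', now(), last_heartbeat)
| where minutes_since_last > 15   // assumed threshold, tune to your agents
| order by minutes_since_last desc
```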

Re: Availability on OMS

The last situation, the MMA agent not working properly or having stopped, is exactly what worries me about building the availability report on heartbeats.

Re: Availability on OMS

Noa, your script is amazing, but I'm struggling to understand it and tweak it to my needs (a fixed range of about 30 days, for example from the 1st to the 31st of January).

Could you give me a hand understanding it?

```
let midnight=startofday(now()); // First part: I need to change this to between((2017-01-01) .. (2017-01-31)), am I correct?
Heartbeat
| where TimeGenerated>midnight
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, midnight), Computer // I'm not sure I understand why you use bin_at instead of just bin
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend number_of_buckets=hourofday(now())+1
| extend availability_rate=total_available_hours*100/number_of_buckets
```

Solution

Re: Availability on OMS

Sure. I tweaked it a bit to match what you ask for:

```
let start_time=startofday(datetime("2017-01-01"));
let end_time=endofday(datetime("2017-01-31"));
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets
```

The first 2 lines define variables, set to the start and end time you mentioned.

Next, we use these variables to limit the query to that time range:

`| where TimeGenerated > start_time and TimeGenerated < end_time`

Then we count the heartbeats reported from each computer, in buckets (bins) of 1 hour, starting at the start time you define:

`| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer`

Now we can see how many heartbeats were reported by each computer each hour. If the number is 0, we understand the computer was probably offline at that time.
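On the earlier question of why `bin_at` is used rather than plain `bin`: as far as I understand it, `bin` always aligns buckets to round multiples of the bin size, while `bin_at` aligns them to a fixed point you choose. With an hourly bin and a start time at midnight the two agree, so the difference only shows when the start time is not on the hour. A small self-contained sketch (the 07:30 start and the sample timestamp are made-up values):

```kusto
let start_time = datetime(2017-01-01 07:30:00);   // hypothetical start, not on the hour
print sample_time = datetime(2017-01-01 08:45:00) // hypothetical heartbeat time
| extend plain_bin = bin(sample_time, 1h)                     // 2017-01-01 08:00:00
| extend anchored_bin = bin_at(sample_time, 1h, start_time)   // 2017-01-01 08:30:00
```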

We use a new column to mark if a computer was available or not each hour:

`| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)`

and then count the number of hours each computer was indeed "alive":

`| summarize total_available_hours=countif(available_per_hour==true) by Computer`

Note that this way we give a little leeway for missing heartbeat reports each hour. Instead of expecting a report every 5 or 10 minutes, we only mark a computer as "unavailable" if we didn't get any report from it during a full hour.

At this point we get a number for each computer, for example 11 available hours.

So we know each computer was alive for 11 hours in the selected time range. But what does that mean? How many hours were there altogether? Is this 11 out of 11 hours (100% availability) or 11 out of 110 hours (only 10% availability)?

Here's how we can calculate the total number of hours in the selected time range:

`| extend total_number_of_buckets=round((end_time-start_time)/1h)`

I admit it might not be the best calculation of buckets. There is probably a better way, but I can't think of it now.
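To sanity-check the bucket count, the expression can be run on its own with `print`. For January 2017 it should come out at 744, i.e. 31 days × 24 hours. The second column is one possible alternative, not from the original answer: using the start of the following day as an exclusive end avoids relying on `round` to absorb the last fraction of a second that `endofday` leaves out.

```kusto
let start_time = startofday(datetime(2017-01-01));
let end_time = endofday(datetime(2017-01-31));
print buckets_with_round = round((end_time - start_time) / 1h),           // 744
      buckets_exclusive_end = (datetime(2017-02-01) - start_time) / 1h    // 744
```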

Finally, we calculate the ratio between available hours and total hours:

`| extend availability_rate=total_available_hours*100/total_number_of_buckets`

and get the availability rate of each computer as a percentage.

HTH,

Noa

Re: Availability on OMS

Noa, amazing, thank you so much.

Re: Availability on OMS

Excellent query. How can "let midnight=startofday(now())" be altered to use my local time zone? If I run this as is, it seems to be my time +7, and the number of hours doesn't match up.

Re: Availability on OMS

Hi,

Thanks for an excellent code sample.

I would like to extend the query to also support specified time intervals and finer-grained uptime checks (heartbeat).

# Service levels

Example: service agreements are based on 3 categories:

S1 = 07:00 - 17:00 weekdays

S2 = 07:00 - 22:00 weekdays

365/7 = always (already supported by your query)

Uptime should be calculated based on the service agreement hours/days.

Time should also be converted to UTC+1.

Will this do the trick?

```
Heartbeat
| extend Timegenerated = TimeGenerated + 1h
```

I checked the samples for endofday/week, but I'm unable to get it to work in your sample.

# Intervals
`extend available_per_hour=iff(heartbeat_per_hour>0, true, false)`

Can this be adjusted to check heartbeats per 30 min / 15 min?

Any ideas?

Br
erik

Re: Availability on OMS

Thanks George.

To adjust for the local time zone you can do this:

`let midnight=startofday(now())-7h`
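A variation on this, as a sketch only: if the goal is the most recent local midnight, it may be safer to shift to local time first, truncate to the day, and then shift back to UTC, and to count the elapsed hours from the local clock as well. The fixed timestamp and the 7-hour offset are illustration values, and `hours_behind_utc` and `utc_now` are names invented here (in a real query `utc_now` would be `now()`):

```kusto
let hours_behind_utc = 7h;                      // assumption: local time is UTC-7
let utc_now = datetime(2018-03-10 03:30:00);    // stand-in for now()
print local_now = utc_now - hours_behind_utc                                  // 2018-03-09 20:30:00
| extend local_midnight_in_utc = startofday(local_now) + hours_behind_utc     // 2018-03-09 07:00:00
| extend local_hours_so_far = hourofday(local_now) + 1                        // 21
```

`local_midnight_in_utc` can then replace `midnight` in the query, and `local_hours_so_far` can replace `hourofday(now())+1` as the number of buckets.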

Re: Availability on OMS

Hi Eric,

To adjust for the service agreement, you can calculate the start time and end time like this:

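```
let raw_date = datetime("2017-01-01");
// "SLA" stands in for the service level you are calculating for (e.g. "S1")
let start_date = case("SLA" in ("S1", "S2"),
    case(dayofweek(raw_date)==0d, startofday(raw_date+1d)+7h,  // Sunday: start Monday 07:00
         dayofweek(raw_date)==6d, startofday(raw_date+2d)+7h,  // Saturday: start Monday 07:00
         startofday(raw_date)+7h),                             // weekday: start 07:00 the same day
    raw_date);
```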

On the intervals: it can be adjusted any way you need, just use `bin(fieldname, 30m)` instead of `bin(fieldname, 1h)`.
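One detail worth keeping in mind when changing the interval: the total number of buckets has to be divided by the same bin size, otherwise the rate is computed against hourly buckets. A sketch of the fixed-range query with the bin size pulled into a single variable (`bucket` is a name introduced here, not part of the original query):

```kusto
let start_time = startofday(datetime(2017-01-01));
let end_time = endofday(datetime(2017-01-31));
let bucket = 30m;   // change to 15m, 1h, ... as needed
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeats_per_bucket = count() by bin_at(TimeGenerated, bucket, start_time), Computer
| summarize total_available_buckets = countif(heartbeats_per_bucket > 0) by Computer
// same bin size in the denominator as in bin_at above
| extend total_number_of_buckets = round((end_time - start_time) / bucket)
| extend availability_rate = total_available_buckets * 100 / total_number_of_buckets
```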

Re: Availability on OMS

Is it a PowerShell script?

Re: Availability on OMS

Hey,

That's not PowerShell but our query language, which can be used here and through our API.

Re: Availability on OMS

I am struggling to generate the report for Mon-Friday only and in my time zone. I just get errors.

```
let start_time=startofday(datetime("2018-07-1 07:30:00"));
let end_time=endofday(datetime("2018-07-31 18:00:00"));
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets
```

Re: Availability on OMS

I am struggling to generate the report for Mon-Friday only and in my time zone. I just get errors. The script below works for me. Thanks

```
let start_time=startofday(datetime("2018-07-1 07:30:00"));
let end_time=endofday(datetime("2018-07-31 18:00:00"));
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets
```

Re: Availability on OMS

Hey Noa,

Can we get one year of data with this script?

Re: Availability on OMS

Can we get availability for the past 10 days instead of adding a start date and end date?

Re: Availability on OMS

Thanks but I got the answer

```
let start_time = startofday(ago(3d));
Heartbeat
| where TimeGenerated > start_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=ceiling((now()-start_time)/1h) // hourly buckets since start_time, including the current partial hour
| extend availability_rate=total_available_hours*100/total_number_of_buckets
```

Re: Availability on OMS

Hi Prashant,

We can get the data by changing the dates. But make sure your data retention policy covers the last year, so that the data is still available.

You can check here:

Hi Gaurav,

I got it,

Thank You

Re: Availability on OMS

Love this query. I'm having trouble modifying it to meet my needs.

In addition to what this query provides, I'd also like to show the last TimeGenerated for each Computer. I can't seem to get the logic to work correctly. Any help is appreciated.

Re: Availability on OMS

```
let start_time=startofday(ago(30d));
let end_time=startofday(now());
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets
| order by availability_rate desc
```
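For the request to also show the last TimeGenerated per computer, one possible approach is to carry the latest timestamp through both `summarize` steps. This is a sketch; `last_in_hour` and `last_heartbeat` are column names made up here, and `last_heartbeat` only reflects heartbeats inside the queried time range:

```kusto
let start_time=startofday(ago(30d));
let end_time=startofday(now());
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
// keep the latest heartbeat of each hourly bucket alongside the count
| summarize heartbeat_per_hour=count(), last_in_hour=max(TimeGenerated) by bin_at(TimeGenerated, 1h, start_time), Computer
// then take the latest of those per computer
| summarize total_available_hours=countif(heartbeat_per_hour>0), last_heartbeat=max(last_in_hour) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets
| order by availability_rate desc
```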

Re: Availability on OMS

Awesome Script Thanks

Re: Availability on OMS

Does anyone have a way to restrict the script to pull heartbeats Mon-Fri and 7am-6pm? I keep getting errors. This is what I have:

```
let start_time=startofday(datetime("2018-06-1 07:30:00"));
let end_time=endofday(datetime("2018-06-30 18:00:00"));
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets
```

Re: Availability on OMS

I have used the query, but the results do not reflect what we expect to see in Log Analytics.

We apply monthly patches to our virtual machines, and we reboot each server after patching. So the availability percentage should differ between servers, but the query shows 100% for all of them.

Can anyone help us with this? How can we get an exact report?

Re: Availability on OMS

Hi! I've updated the above query to reflect your weekdays and hours (Mon-Fri, 07:00-17:59).

Also, the above query considers every hour in which there was even 1 heartbeat as "up time" (available), so this is probably the part you want to tweak. The resolution here depends on your agent. If it reports a heartbeat every 5 minutes, you can do this:

```
let start_time=startofday(datetime("2019-08-01 07:00:00"));
let end_time=endofday(datetime("2019-08-30 18:00:00"));
Heartbeat
| where TimeGenerated >= start_time and TimeGenerated <= end_time
| where dayofweek(TimeGenerated) >= 1d and dayofweek(TimeGenerated) <= 5d  // Monday-Friday
| where hourofday(TimeGenerated) >= 7 and hourofday(TimeGenerated) <=17   // 7:00-17:59
| summarize heartbeat_per_5_minutes=count() by bin_at(TimeGenerated, 5m, start_time), Computer
| extend available_per_5_min=iff(heartbeat_per_5_minutes>0, true, false)
| summarize total_available_buckets=countif(available_per_5_min==true)
, total_unavailable_buckets=countif(available_per_5_min==false) by Computer
| extend total_number_of_buckets=round(total_available_buckets+total_unavailable_buckets)
| extend availability_rate=total_available_buckets*100/total_number_of_buckets
```

Note that in any case, if the reboot was quick and the agent sends a heartbeat every 5 minutes, it might go unnoticed.

HTH,

Noa

Re: Availability on OMS

Hello Noa,
Can we get the output in the form of graph and chart on the Azure dashboard?

Re: Availability on OMS

Just add a last line of

`| render barchart kind=unstacked`

or if you want less data, pick the columns, using project:

```
| project Computer, availability_rate
| render barchart kind=unstacked title = "Availability Rate per Computer"
```

Re: Availability on OMS

This question is for @Clive Watson and @Noa Kuperberg.

I have used these queries in my workspace and found some discrepancies.

First, I used the one below.

```
let start_time=startofday(datetime("2019-09-01 00:00:00"));
let end_time=endofday(datetime("2019-09-27 00:00:00"));
Heartbeat
| where TimeGenerated >= start_time and TimeGenerated <= end_time
| where dayofweek(TimeGenerated) >= 1d and dayofweek(TimeGenerated) <= 5d // Monday-Friday
| where hourofday(TimeGenerated) >= 7 and hourofday(TimeGenerated) <=17 // 7:00-17:59
| summarize heartbeat_per_1_minutes=count() by bin_at(TimeGenerated, 1m, start_time), Computer
| extend available_per_1_min=iff(heartbeat_per_1_minutes>0, true, false)
| summarize total_available_buckets=countif(available_per_1_min==true), total_unavailable_buckets=countif(available_per_1_min==false) by Computer
| extend total_number_of_buckets=round(total_available_buckets+total_unavailable_buckets)
| extend availability_rate=total_available_buckets*100/total_number_of_buckets
| where Computer == "vm83560609d9"
```

And got a result like this:

Now my concern is why the availability_rate for this VM is 100 for September 1st to 27th, when it was no longer available after a few days.

Also, what are the total number of buckets and the total unavailable buckets?

On the other hand, when I used the second query below:

```
let start_time=startofday(datetime("2019-09-01"));
let end_time=endofday(datetime("2019-09-27"));
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets
| where Computer == "vm83560609d9"
```

It seems I am getting the right results:

In this case, why is the total number of buckets so high (648)? Same question again: what is it?

And the total number of hours means "the availability of the system", if I am not wrong.

Thanks in advance for the help.
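For reference, the 648 in the second query can be reproduced on its own. As far as the query shows, `total_number_of_buckets` is simply the number of hourly buckets between `start_time` and `end_time`, whether or not any heartbeat arrived in them, and `total_available_hours` is how many of those hours had at least one heartbeat. A small sketch of the arithmetic, using the same dates:

```kusto
let start_time = startofday(datetime(2019-09-01));
let end_time = endofday(datetime(2019-09-27));
print days_in_range = round((end_time - start_time) / 1d),   // 27
      hours_in_range = round((end_time - start_time) / 1h)   // 648 = 27 days x 24 hours
```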

Re: Availability on OMS

You're right, there is a bug in this query.

I think this new query is the most precise one: it takes off-hours into account as needed (e.g. counting only workday hours), and it is easy to change from an hourly to a minute-by-minute calculation (change the step in the make-series command from 1h to 1m). In the query below, total_buckets is the total number of buckets in the time range (with 1h it's the number of hours, with 1m the number of minutes, and so on) and available_in_buckets is the number of buckets during which the VM sent at least 1 heartbeat.

```
let start_time=startofday(datetime("2019-10-01"));  // UTC
let end_time=now();
Heartbeat
| make-series heartbeats_per_bucket=count() default=0 on TimeGenerated from start_time to end_time step 1h by Computer
| mv-expand heartbeats_per_bucket, TimeGenerated
| project BucketTimeGenerated=todatetime(TimeGenerated), Computer, heartbeats_per_bucket
| where BucketTimeGenerated >= start_time and BucketTimeGenerated <= end_time
| where dayofweek(BucketTimeGenerated) >= 1d and dayofweek(BucketTimeGenerated) <= 5d // Monday-Friday
| where hourofday(BucketTimeGenerated) >= 7 and hourofday(BucketTimeGenerated) <=17 // 7:00-17:59
| summarize total_buckets=count(), available_in_buckets=countif(heartbeats_per_bucket>0) by Computer
| project Computer, availability_rate=available_in_buckets*100/total_buckets
```
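To see why `make-series` matters here, it may help to try it on a tiny made-up table. A plain `summarize ... by bin_at(...)` only returns buckets that contain at least one heartbeat, so every bucket it counts looks "available" and the rate comes out at 100. `make-series` with `default=0` also emits the empty buckets. In this sketch the table `SampleHeartbeat`, the computer name, and the six-hour range are all invented for illustration: the machine reports in 3 of 6 hourly buckets.

```kusto
let start_time = datetime(2019-10-01 00:00:00);
let end_time = datetime(2019-10-01 06:00:00);
// hypothetical data: heartbeats in hours 00, 01 and 02, then silence
let SampleHeartbeat = datatable(TimeGenerated: datetime, Computer: string)
[
    datetime(2019-10-01 00:10:00), "vm-demo",
    datetime(2019-10-01 01:10:00), "vm-demo",
    datetime(2019-10-01 02:10:00), "vm-demo"
];
SampleHeartbeat
| make-series heartbeats_per_bucket = count() default = 0 on TimeGenerated from start_time to end_time step 1h by Computer
| mv-expand heartbeats_per_bucket to typeof(long), TimeGenerated to typeof(datetime)
| summarize total_buckets = count(), available_in_buckets = countif(heartbeats_per_bucket > 0) by Computer
| project Computer, availability_rate = available_in_buckets * 100 / total_buckets   // 50 for vm-demo
```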

Re: Availability on OMS

@Noa Kuperberg Just curious if there is a way to find/calculate any windows service availability for a specific period using OMS query?

Re: Availability on OMS

@SatyaParida  Windows events are logged in the Event table, as far as I know, but I am not familiar with the data reported by each service and application to create the query you ask for. If you have more information on the logged data I can help with the query syntax.
