SOLVED

Availability on OMS

Brass Contributor

Hi everyone.

I'm trying to find a way of getting Availability of servers on OMS, but I can't find any...

By Availability I mean the % of uptime of a given server during a certain period of time.

So, if a server was up 98 of a total 100 hours, the availability for that period is 98%.

I'm looking to do that in OMS, but I'm not sure it's possible.

Thanks in advance.

36 Replies

Hi Prashant,

 

We can get data by doing amendment in dates. But make sure you have data retention policy for last year to save data.

 

You can check here:-

OMS data retention.PNG

Hi Gaurav,

 

I got it, 

Thank You

Love this query. I'm having trouble modifying it to meet my needs.

 

In addition to what this query provides, I'd also like to show the last TimeGenerated for each Computer. I can't seem to get the logic to work correctly. Any help is appreciated.

let start_time=startofday(ago(30d));
let end_time=startofday(now());
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets
| order by availability_rate desc

Awesome Script Thanks 

Does anyone have a way to restrict the script to pull heartbeats Mon-Fri and 7am-6pm? I keep getting errors. This is what I have:

let start_time=startofday(datetime("2018-06-1 07:30:00"));
let end_time=endofday(datetime("2018-06-30 18:00:00"));
Heartbeat
| where TimeGenerated > start_time and TimeGenerated < end_time
| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer
| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)
| summarize total_available_hours=countif(available_per_hour==true) by Computer
| extend total_number_of_buckets=round((end_time-start_time)/1h)
| extend availability_rate=total_available_hours*100/total_number_of_buckets
I have used the query, there are some differences which is not reflecting for us in the log analytics.

We are doing the Monthly patches for the Virtual Machines.So when i tried this query it is showing the availability rate as 100% percentage.
But we are rebooting the server after the patching activity, the Availability report percentage should differ but it is showing 100% percent for all the servers.

Can anyone help us on this ? how can we get the exact report ?

@MaheshG 

Hi! I've update the above query to reflect your week days and hours (Mon-Fri, 07:00-17:59).

Also, the above query considers every hour in which there was even 1 heartbeat as "up time" (available), so this is probably the part you want to tweak. The resolution here depends on your agent. If it reports a heartbeat every 5 minutes, you can do this:

 

 

let start_time=startofday(datetime("2019-08-01 07:00:00"));
let end_time=endofday(datetime("2019-08-30 18:00:00"));
Heartbeat
| where TimeGenerated >= start_time and TimeGenerated <= end_time
| where dayofweek(TimeGenerated) >= 1d and dayofweek(TimeGenerated) <= 5d  // Monday-Friday
| where hourofday(TimeGenerated) >= 7 and hourofday(TimeGenerated) <=17   // 7:00-17:59
| summarize heartbeat_per_5_minutes=count() by bin_at(TimeGenerated, 5m, start_time), Computer
| extend available_per_5_min=iff(heartbeat_per_5_minutes>0, true, false)
| summarize total_available_buckets=countif(available_per_5_min==true)
, total_unavailable_buckets=countif(available_per_5_min==false) by Computer
| extend total_number_of_buckets=round(total_available_buckets+total_unavailable_buckets)
| extend availability_rate=total_available_buckets*100/total_number_of_buckets

 

 

Note than anyway, if the reboot was quick and the agent sends a heartbeat every 5 minutes, it might go unnoticed.

 

HTH,

Noa

Hello Noa,
Can we get the output in the form of graph and chart on the Azure dashboard?

@Prashant Sharma 

 

Just add a last line of 

 

| render barchart kind=unstacked 

 

 

or if you want less data, pick the columns, using project:

 

| project Computer, availability_rate  
| render barchart kind=unstacked title = "Availability Rate per Computer"

 

 

Go to Log Analytics and Run Query

clipboard_image_0.png

This question for @CliveWatson@Noa Kuperberg 

 

I have used queries and in my workspace and got some discrepancies.

 

First, I used below one.

let start_time=startofday(datetime("2019-09-01 00:00:00"));

let end_time=endofday(datetime("2019-09-27 00:00:00"));

Heartbeat

| where TimeGenerated >= start_time and TimeGenerated <= end_time

| where dayofweek(TimeGenerated) >= 1d and dayofweek(TimeGenerated) <= 5d // Monday-Friday

| where hourofday(TimeGenerated) >= 7 and hourofday(TimeGenerated) <=17 // 7:00-17:59

| summarize heartbeat_per_1_minutes=count() by bin_at(TimeGenerated, 1m, start_time), Computer

| extend available_per_1_min=iff(heartbeat_per_1_minutes>0, true, false)

| summarize total_available_buckets=countif(available_per_1_min==true)

, total_unavailable_buckets=countif(available_per_1_min==false) by Computer

| extend total_number_of_buckets=round(total_available_buckets+total_unavailable_buckets)

| extend availability_rate=total_available_buckets*100/total_number_of_buckets

| where Computer == "vm83560609d9"

 

And got result like this: -

 

KQL_AVAIL_1.JPG

 

Now concern is why availability_rate for this VM is 100 for September 1st to 27th , when it is no longer available after few days.

And what is total number of bucket and total unavailable bucket?

 

And on the other hand when i used second query, below one: -

 

let start_time=startofday(datetime("2019-09-01"));

let end_time=endofday(datetime("2019-09-27"));

Heartbeat

| where TimeGenerated > start_time and TimeGenerated < end_time

| summarize heartbeat_per_hour=count() by bin_at(TimeGenerated, 1h, start_time), Computer

| extend available_per_hour=iff(heartbeat_per_hour>0, true, false)

| summarize total_available_hours=countif(available_per_hour==true) by Computer

| extend total_number_of_buckets=round((end_time-start_time)/1h)

| extend availability_rate=total_available_hours*100/total_number_of_buckets

| where Computer == "vm83560609d9"

 

Seems, I am getting right results: -

KQL_AVAIL_2.JPG

 

And in this section, why total number of bucket is too high (648). Same question again what is it?

And total number hours means "the availability of system", if I am not wrong.

 

Thanks in advance for the help.

@GouravIN 

You're right, there is a bug in this query. 

I think this new query is the most precise one, taking into account off-hours as needed (e.g. using only work day hours) and easy to change from an hourly-based to a minute-by-minute calculation (change the grain in the make-series command from 1h to 1d). In the below query, total_buckets is the total number of buckets in the time range (if you use 1h, it's the number of hours, if you use 1m, it's the number of minutes etc.) and available_in_buckets is the number of buckets during which the vm sent at least 1 heartbeat.

 

let start_time=startofday(datetime("2019-10-01"));  // UTC
let end_time=now();
Heartbeat
| make-series heartbeats_per_bucket=count() default=0 on TimeGenerated from start_time to end_time step 1h by Computer
| mv-expand heartbeats_per_bucket, TimeGenerated
| project BucketTimeGenerated=todatetime(TimeGenerated), Computer, heartbeats_per_bucket
| where BucketTimeGenerated >= start_time and BucketTimeGenerated <= end_time
| where dayofweek(BucketTimeGenerated) >= 1d and dayofweek(BucketTimeGenerated) <= 5d // Monday-Friday
| where hourofday(BucketTimeGenerated) >= 7 and hourofday(BucketTimeGenerated) <=17 // 7:00-17:59
| summarize total_buckets=count(), available_in_buckets=countif(heartbeats_per_bucket>0) by Computer
| project Computer, availability_rate=available_in_buckets*100/total_buckets

 

@Noa Kuperberg Just curious if there is a way to find/calculate any windows service availability for a specific period using OMS query? 

@SatyaParida  Windows events are logged in the Event table, as far as I know, but I am not familiar with the data reported by each service and application to create the query you ask for. If you have more information on the logged data I can help with the query syntax.

@Noa Kuperberg @GouravIN 

 

Hi Guys,

 

I wanted to take a report for a list of Azure VMs the 24/7 availability report from the date of creation,

Can you please tweak the script and help me to get the report?

@snataraj1984 

 

Do you mean the last query above?  Do you mean creation time of the computer (we'd only see that if you keep the logs from that time and its less than two years - max retention value in Log Analytics)

 

let start_time=startofday(ago(360d));  // UTC
let end_time=now();
Heartbeat
| make-series heartbeats_per_bucket=count() default=0 on TimeGenerated from start_time to end_time step 1h by Computer
| mv-expand heartbeats_per_bucket, TimeGenerated
| project BucketTimeGenerated=todatetime(TimeGenerated), Computer, heartbeats_per_bucket
| where BucketTimeGenerated >= start_time and BucketTimeGenerated <= end_time
| where dayofweek(BucketTimeGenerated) >= 1d and dayofweek(BucketTimeGenerated) <= 5d // Monday-Friday
| where hourofday(BucketTimeGenerated) >= 7  and hourofday(BucketTimeGenerated) <=17  // 7:00-17:59
| summarize total_buckets=count(), available_in_buckets=countif(heartbeats_per_bucket>0),  min(BucketTimeGenerated), max(BucketTimeGenerated) by Computer
| project Computer, availability_rate=available_in_buckets*100/total_buckets

You can adjust line #1 to go back a number of days (above I show 1 year (365days)) 

@CliveWatson 

 

Thanks Clive, 

 

I will explain what i am trying to find

 

1. Finding creation Time/Date for a list of Azure VMs

2. Find the availability of those VMs from the date of creation if the log is overwritten we will consider the available logs

3. Finding a way to get these results in a report resource group wise/ Windows machines / Linux machines

 

Do we already have a query to get this info by any chance?