SOLVED

Need Heartbeat Query


Hi Team,

I am trying to write a KQL query that catches any single missed heartbeat.

As you can see in the screenshot below, this server sends a heartbeat every minute.

There is now a gap in the heartbeats from when I stopped the scx service. I want to track this so that if even a single heartbeat is missed, I get an alert notification.

[Screenshot: OMS_Question_21112019.JPG]

7 Replies
Solution

@Gourav Kumar 


Personally, I prefer this example query:

// Availability rate
// Calculate the availability rate of each connected computer
Heartbeat
// bin_at is used to set the time grain to 1 hour, starting exactly 24 hours ago
| summarize heartbeatPerHour = count() by bin_at(TimeGenerated, 1h, ago(24h)), Computer
| extend availablePerHour = iff(heartbeatPerHour > 0, true, false)
| summarize totalAvailableHours = countif(availablePerHour == true) by Computer 
| extend availabilityRate = totalAvailableHours*100.0/24

 

Heartbeats are expected to be missed occasionally (pauses, glitches, load, etc.) and the data will catch up later, so alerting on a single missed heartbeat may give you false positives.

You can use datetime_diff to compare each heartbeat with the one next to it:

Heartbeat
| where TimeGenerated >= ago(1h)
| where Computer == "hardening-demo"
| project Computer, TimeGenerated
| order by TimeGenerated desc
| project n = TimeGenerated, nminus = prev(TimeGenerated), TimeGenerated, Computer
| where isnotempty(nminus)
// seconds between this row (n) and the row before it in the sort order (nminus, the next newer heartbeat)
| extend second = datetime_diff('second',nminus, n)
| where second >= 60



For the demo data the gaps are below 60 seconds (mainly 9 and 51). To see these rows, remove the last line of the query above:

n                         nminus                    TimeGenerated             Computer        second
2019-11-22T17:42:37.88Z   2019-11-22T17:42:46.523Z  2019-11-22T17:42:37.88Z   hardening-demo  9
2019-11-22T17:41:46.52Z   2019-11-22T17:42:37.88Z   2019-11-22T17:41:46.52Z   hardening-demo  51
2019-11-22T17:41:37.877Z  2019-11-22T17:41:46.52Z   2019-11-22T17:41:37.877Z  hardening-demo  9
2019-11-22T17:40:46.52Z   2019-11-22T17:41:37.877Z  2019-11-22T17:40:46.52Z   hardening-demo  51
2019-11-22T17:40:37.873Z  2019-11-22T17:40:46.52Z   2019-11-22T17:40:37.873Z  hardening-demo  9
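
The query above looks at one computer. Here is a minimal sketch of the same gap check across every computer at once, assuming the standard Heartbeat columns used in this thread. The 120-second threshold and the names gapThresholdSeconds, PreviousComputer, PreviousHeartbeat and GapSeconds are illustrative and not from the original post. A threshold above the one-minute interval leaves some room for the delayed heartbeats mentioned earlier.

```kql
// Assumed threshold: roughly two missed one-minute heartbeats; tune for your environment.
let gapThresholdSeconds = 120;
Heartbeat
| where TimeGenerated >= ago(1h)
| project Computer, TimeGenerated
// Sort so each computer's heartbeats are adjacent and in time order.
| order by Computer asc, TimeGenerated asc
| extend PreviousComputer = prev(Computer), PreviousHeartbeat = prev(TimeGenerated)
// Only compare rows that belong to the same computer.
| where Computer == PreviousComputer
| extend GapSeconds = datetime_diff('second', TimeGenerated, PreviousHeartbeat)
| where GapSeconds >= gapThresholdSeconds
| project Computer, PreviousHeartbeat, TimeGenerated, GapSeconds
```

Note that this only reports a gap once a later heartbeat has arrived. A computer that is still silent needs a "time since last heartbeat" check instead.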

 

 

@Clive Watson Just to add to this conversation, I've come up with a slightly different way of doing this--would love feedback:

let current = now();
let ostype = 'Windows';
let computername = '';
let environment = 'Non-Azure';
let threshold = 600;
Heartbeat
| where TimeGenerated >= ago(1h)
// --for a specific computer:
| where Computer contains computername
// --for a specific computer group:
//| where Computer in (group)
// --for a specific OS type:
| where OSType contains ostype
// --for on-prem or Azure VMs:
| where ComputerEnvironment contains environment
| project Computer, TimeGenerated, current
| order by TimeGenerated desc
| project nminus = prev(TimeGenerated), current, Computer
| where isnotempty(nminus)
| extend ['LastHeartbeat (in seconds)'] = datetime_diff('second', current, nminus)
| summarize arg_max(nminus, *) by Computer
| where ['LastHeartbeat (in seconds)'] >= threshold
| project Computer, QueryTime = current, LastTimeStamp = nminus, ['LastHeartbeat (in seconds)']
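
One thing to check in the query above: prev() runs over rows sorted by time across all computers, so nminus can be the timestamp of a different computer's heartbeat. A shorter sketch that takes the latest heartbeat per computer directly is shown here. The column names LastHeartbeat and SecondsSinceLastHeartbeat are illustrative, and the OS and environment filters are left out for brevity.

```kql
let threshold = 600; // seconds, same value as in the query above
Heartbeat
| where TimeGenerated >= ago(1h)
// Latest heartbeat seen for each computer within the window.
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| extend SecondsSinceLastHeartbeat = datetime_diff('second', now(), LastHeartbeat)
| where SecondsSinceLastHeartbeat >= threshold
| project Computer, QueryTime = now(), LastHeartbeat, SecondsSinceLastHeartbeat
```

A computer with no heartbeat at all in the last hour does not appear in either version. Widening the ago() window would cover that case.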

 

@Scott Allison 

 

Looks good, @Scott Allison. I would just swap contains for has, as per best practice: https://docs.microsoft.com/en-us/azure/kusto/query/best-practices

@Clive Watson Thanks! I've seen weirdness with has versus contains. I haven't noted what that weirdness is, but if I run across it again, I'll be sure to share.

@Clive Watson - here's a perfect example of why the HAS operator isn't useful for many operations:

This query returns the expected results every time:

Heartbeat
| where Computer contains 'abc'
| distinct Computer

For example, this would return:
SERVERABC1
SERVERABC2
COMPUTERABC24

When I replace CONTAINS with HAS, I get 0 results. So in 99% of my use cases, HAS doesn't work at all. 

@Scott Allison 

 

That is the behavior I'd expect 

 

From the docs: 
Prefer has operator over contains when looking for full tokens. has is more performant as it doesn't have to look-up for substrings.

 

What does that mean in practice:

 

1. This example returns no results, because "pool" is only part of a token, not a full token. The computers are named: aks-nodepool1.nnnnnnnnn

Heartbeat | where Computer has 'pool' | distinct Computer

Note: if you used "nodepool1" instead, it would work.

Whereas searching for "aks" works, because it is a full token:

Computer
aks-nodepool1-25494468-4
aks-nodepool1-25494468-1
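
The same comparison can be tried without any workspace data. Here is a small sketch using an inline datatable: the first name is taken from the results above, the second from the earlier contains example, and the output column names are illustrative.

```kql
datatable(Computer: string)
[
    "aks-nodepool1-25494468-4",
    "SERVERABC1"
]
| extend ContainsPool = Computer contains "pool",   // substring match
         HasPool      = Computer has "pool",        // full-token match
         HasNodepool1 = Computer has "nodepool1",
         HasAks       = Computer has "aks",
         ContainsAbc  = Computer contains "abc",
         HasAbc       = Computer has "abc"
```

Going by the explanations in this thread, the has columns should be true only where the search text is a complete token ("nodepool1", "aks"), while the contains columns should be true wherever the text appears at all.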
 
So on a small dataset it won't matter whether you use contains or has. On a large one, however, has could improve performance.
When I create a query I often start with contains, then (if I remember! Sorry if you find a query from me that isn't optimised) check whether has works and swap to it. You need to evaluate this case by case.
 
Essentially, with has KQL uses an index to scan only the relevant data, rather than having to read ALL the data (imagine if the string was really long, rather than a simple computer name).
It's a bit like a book and its index: to find which chapters mention "Scott", would you rather check the index, or read every line, every word, and the words within each word?
 
Make sense?
 
 
 
@Clive Watson Definitely makes sense. Today, I don't have but a few use cases to use HAS (querying Event Logs or Syslog comes to mind). Your explanation clears things up for me. 

Appreciate it!