Forum Discussion

GouravIN's avatar
GouravIN
Brass Contributor
Nov 21, 2019
Solved

Need Heartbeat Query

Hi Team,

 

I am trying to write a KQL query to catch if any single heartbeat missed.

Like we could see in my below screenshot, this server is sending heartbeat after every minute interval.

And now there is gap in heartbeat when i stopped the scx service, so now i want to track if any single heartbeat will miss then i should have an alert notification.

 

  • GouravIN 


    personally I prefer the example query of 

    // Availability rate
    // Calculate the availability rate of each connected computer
    Heartbeat
    // bin_at is used to set the time grain to 1 hour, starting exactly 24 hours ago
    | summarize heartbeatPerHour = count() by bin_at(TimeGenerated, 1h, ago(24h)), Computer
    | extend availablePerHour = iff(heartbeatPerHour > 0, true, false)
    | summarize totalAvailableHours = countif(availablePerHour == true) by Computer 
    | extend availabilityRate = totalAvailableHours*100.0/24

     

    Heartbeats are expected to be missed (pauses, glitches, load etc...) and the data will catch-up - so you may get false positives.

    You can use a date_diff to compare 
    Go to Log Analytics and Run Query

    Heartbeat
    | where TimeGenerated >= ago(1h)
    | where Computer == "hardening-demo"
    | project Computer, TimeGenerated
    | order by TimeGenerated desc
    | project n = TimeGenerated, nminus = prev(TimeGenerated), TimeGenerated, Computer
    | where isnotempty(nminus)
    // show time NOW vs time  n -1 row
    | extend second = datetime_diff('second',nminus, n)
    | where second >= 60



    Results for seconds below 60 (mainly 9 and 51 for the demo data) - just remove the last line of the above query to see this

    n nminus TimeGenerated Computer second
    2019-11-22T17:42:37.88Z 2019-11-22T17:42:46.523Z 2019-11-22T17:42:37.88Z hardening-demo 9
    2019-11-22T17:41:46.52Z 2019-11-22T17:42:37.88Z 2019-11-22T17:41:46.52Z hardening-demo 51
    2019-11-22T17:41:37.877Z 2019-11-22T17:41:46.52Z 2019-11-22T17:41:37.877Z hardening-demo 9
    2019-11-22T17:40:46.52Z 2019-11-22T17:41:37.877Z 2019-11-22T17:40:46.52Z hardening-demo 51
    2019-11-22T17:40:37.873Z 2019-11-22T17:40:46.52Z 2019-11-22T17:40:37.873Z hardening-demo 9

     

     

10 Replies

  • GouravIN 


    personally I prefer the example query of 

    // Availability rate
    // Calculate the availability rate of each connected computer
    Heartbeat
    // bin_at is used to set the time grain to 1 hour, starting exactly 24 hours ago
    | summarize heartbeatPerHour = count() by bin_at(TimeGenerated, 1h, ago(24h)), Computer
    | extend availablePerHour = iff(heartbeatPerHour > 0, true, false)
    | summarize totalAvailableHours = countif(availablePerHour == true) by Computer 
    | extend availabilityRate = totalAvailableHours*100.0/24

     

    Heartbeats are expected to be missed (pauses, glitches, load etc...) and the data will catch-up - so you may get false positives.

    You can use a date_diff to compare 
    Go to Log Analytics and Run Query

    Heartbeat
    | where TimeGenerated >= ago(1h)
    | where Computer == "hardening-demo"
    | project Computer, TimeGenerated
    | order by TimeGenerated desc
    | project n = TimeGenerated, nminus = prev(TimeGenerated), TimeGenerated, Computer
    | where isnotempty(nminus)
    // show time NOW vs time  n -1 row
    | extend second = datetime_diff('second',nminus, n)
    | where second >= 60



    Results for seconds below 60 (mainly 9 and 51 for the demo data) - just remove the last line of the above query to see this

    n nminus TimeGenerated Computer second
    2019-11-22T17:42:37.88Z 2019-11-22T17:42:46.523Z 2019-11-22T17:42:37.88Z hardening-demo 9
    2019-11-22T17:41:46.52Z 2019-11-22T17:42:37.88Z 2019-11-22T17:41:46.52Z hardening-demo 51
    2019-11-22T17:41:37.877Z 2019-11-22T17:41:46.52Z 2019-11-22T17:41:37.877Z hardening-demo 9
    2019-11-22T17:40:46.52Z 2019-11-22T17:41:37.877Z 2019-11-22T17:40:46.52Z hardening-demo 51
    2019-11-22T17:40:37.873Z 2019-11-22T17:40:46.52Z 2019-11-22T17:40:37.873Z hardening-demo 9

     

     

    • KevinNiko's avatar
      KevinNiko
      Copper Contributor

      CliveWatson 

       

      Data source: Azure

      Below query is based on Events which are registered and cleared after a while.

       

      Heartbeat
      // list records for last 30 days:
      | where TimeGenerated > ago(30d)
      | summarize LastCall = arg_max(TimeGenerated,*) by Computer
      // retrieve machines that have not sent a heartbeat in the last 4 hours:
      | where LastCall < ago(4h)

      | project Computer, LastCall, timestamp = format_datetime(LastCall, 'MM-dd-yyyy hh:mm'),
      startofday = format_datetime(startofday(now()), 'MM-dd-yyyy')
      ,VMUUID, SourceComputerId, ComputerIP, ResourceId
      | sort by LastCall

       

      The above KustoQL query runs every day, so today it may return 10 Computers (10 events), tomorrow 7 computers (7 events), etc.
      Want to keep records of all events (today, tomorrow, etc.) for 31 days.

       

      How to use Power BI or Power BI Dataflow to achieve above , 

      otherwise how do I get data from (Log Analytics workspaces) into Azure SQL Database and apply incremental refresh (so todays records don't overwrite yesterdays records) ?

      • Clive_Watson's avatar
        Clive_Watson
        Bronze Contributor

        KevinNiko

         

        You can store the data for longer (at a cost) by increasing retention.  You can also see the others days from within the query 

        Heartbeat
        | where TimeGenerated > startofday(ago(30d))
        | summarize count(), LastCall = max(TimeGenerated) by Computer, bin(TimeGenerated,1d)
        | where LastCall < ago(1m)
        | render columnchart 


         

    • ScottAllison's avatar
      ScottAllison
      Iron Contributor

      CliveWatson Just to add to this conversation, I've come up with a slightly different way of doing this--would love feedback:

      let current = now();
      let ostype = 'Windows';
      let computername = '';
      let environment = 'Non-Azure';
      let threshold = 600;
      Heartbeat
      | where TimeGenerated >= ago(1h)
      // --for a specific computer:
      | where Computer contains computername
      // --for a specific computer group:
      //| where Computer in (group)
      // --for a specific OS type:
      | where OSType contains ostype
      // --for on-prem or Azure VMs:
      | where ComputerEnvironment contains environment
      | project Computer, TimeGenerated, current
      | order by TimeGenerated desc
      | project nminus = prev(TimeGenerated), current, Computer
      | where isnotempty(nminus)
      | extend ['LastHeartbeat (in seconds)'] = datetime_diff('second', current, nminus)
      | summarize arg_max(nminus, *) by Computer
      | where ['LastHeartbeat (in seconds)'] >= threshold
      | project Computer, QueryTime = current, LastTimeStamp = nminus, ['LastHeartbeat (in seconds)']

       

Resources