Jun 28 2021 05:06 AM - last edited on Apr 08 2022 10:50 AM by TechCommunityAP
Hi,
I need to trigger an alert when a Windows service stops running across a pair of nodes.
There are 2 nodes, and the service runs either on both nodes or on only one of them.
The alert should be triggered only if the service is not running on either node.
I'm using the query below, but it isn't right: because it fetches the latest record per node, the alert is triggered as soon as the service is stopped on just one of the nodes.
let status =
Event
| where TimeGenerated > ago (1d)
| where EventLog == 'System' and EventID == 7036 and Source == 'Service Control Manager' and RenderedDescription has "Apache tomcat"
| parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
| summarize (TimeGenerated, winstatus) = arg_max(TimeGenerated, Windows_Service_State) by Windows_Service_Name, Computer;
status
| where winstatus != 'running'
| project winstatus, Windows_Service_Name, Computer, TimeGenerated
The above query works well if there's only one VM, but it won't work for multiple VMs.
I tried counting the results so that the alert triggers when the service is stopped on both VMs (count value of 2). However, the Event log sometimes contains only one result, because a node whose service state did not change within the query's time frame logs no event. So this method will not work either.
Sample result for:
Event
| where TimeGenerated > ago (1d)
| where EventLog == 'System' and EventID == 7036 and Source == 'Service Control Manager' and RenderedDescription has "Apache tomcat"
| parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
| summarize (TimeGenerated, winstatus) = arg_max(TimeGenerated, Windows_Service_State) by Windows_Service_Name, Computer;
6/28/2021, 2:01:55.930 AM | Apache Tomcat 8.5.58 | apacheNode1 | running
6/28/2021, 1:02:54.257 AM | Apache Tomcat 8.5.58 | apacheNode2 | running
How can I loop over, or otherwise check, whether all the returned rows have winstatus != 'running'?
Regards,
Racheal
Jun 28 2021 05:55 AM
Jun 28 2021 07:08 AM
@CliveWatson Thanks
This command is not clear to me, because I used:
| summarize anyif(winstatus != "stopped", true) // returns false
As I read the query, it should return true if the status is not "stopped" on any of the VMs, and false otherwise. It returns false because the service is stopped on one of the VMs.
I also checked:
| summarize anyif(winstatus != "running", true) // returns true
As I read the query, it should return true if the status is not "running" on any of the VMs, and false otherwise. It returns true even though the service is running on one of the VMs.
Here's the VM service status:
6/28/2021, 10:00:08.173 AM stopped apacheNode1
6/28/2021, 10:07:53.470 AM running apacheNode2
Modified query:
let status =
Event
| where TimeGenerated > ago (1d)
| where EventLog == 'System' and EventID == 7036 and Source == 'Service Control Manager' and RenderedDescription has "Apache"
| parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
| summarize (TimeGenerated, winstatus) = arg_max(TimeGenerated, Windows_Service_State) by Windows_Service_Name, Computer
| summarize status= anyif(winstatus != "stopped", true);
status
| where status == 'false'
| project status
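A note on the behavior described above, offered as a hedged reading rather than a confirmed diagnosis: `anyif` is an older alias of `take_anyif(expr, predicate)`, where the first argument is the expression to return and the second is the filter. With `true` as the predicate every row qualifies, so the result is the expression evaluated on one arbitrarily chosen row, not an "any row matches" test. Both observed results are consistent with the apacheNode1 row being picked. A minimal sketch of a count-based check follows; the `datatable` rows are hypothetical sample data mirroring the two-node status shown above, and the column names `RunningNodes` and `TotalNodes` are illustrative.

```kusto
// Hypothetical sample rows mirroring the two-node status listed above
let status = datatable(TimeGenerated:datetime, winstatus:string, Computer:string)
[
    datetime(2021-06-28 10:00:08.173), "stopped", "apacheNode1",
    datetime(2021-06-28 10:07:53.470), "running", "apacheNode2"
];
status
// Count the nodes whose latest state is 'running' instead of sampling one row
| summarize RunningNodes = countif(winstatus == "running"), TotalNodes = count()
// A row survives only when no node reports 'running'
| where RunningNodes == 0
```

With these sample rows one node is still running, so the final `where` filters the summary row out and nothing would fire.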
Jun 29 2021 12:41 AM
Jun 29 2021 01:46 AM
Solution
@Racheal2k I think you tried this before?
let status =
Event
| where TimeGenerated > ago (1d)
| where EventLog == 'System' and EventID == 7036 and Source == 'Service Control Manager' and RenderedDescription has 'WMI Performance Adapter' //"Apache tomcat"
| parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
| summarize count(), (TimeGenerated, winstatus) = arg_max(TimeGenerated, Windows_Service_State) by Windows_Service_Name, Computer;
status
| extend winstatus = iif(winstatus == 'running',1,0)
| summarize sumif(winstatus, winstatus > 0), ComputersOK = make_set_if(Computer, winstatus > 0), ComputerNotOk = make_set_if(Computer, winstatus == 0)
| extend ServiceStatus = iif(sumif_winstatus > 0, "The service is running", "The service is not running")
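The final `summarize` in the query above has no `by` clause, so if the `RenderedDescription` filter ever matches more than one service, all of them are folded into a single verdict. The following is a minimal sketch of a per-service variant, assuming that is a concern. The `datatable` rows, the name "Hypothetical Service B", and the `RunningNodes` column are illustrative and not taken from the thread.

```kusto
// Hypothetical latest-state rows, one per service and computer
let status = datatable(Windows_Service_Name:string, Computer:string, winstatus:string)
[
    "Apache Tomcat 8.5.58",   "apacheNode1", "stopped",
    "Apache Tomcat 8.5.58",   "apacheNode2", "running",
    "Hypothetical Service B", "apacheNode1", "stopped",
    "Hypothetical Service B", "apacheNode2", "stopped"
];
status
// Evaluate each service separately so one healthy service cannot mask another
| summarize RunningNodes  = countif(winstatus == "running"),
            ComputersOK   = make_set_if(Computer, winstatus == "running"),
            ComputerNotOk = make_set_if(Computer, winstatus != "running")
          by Windows_Service_Name
| extend ServiceStatus = iif(RunningNodes > 0, "The service is running", "The service is not running")
```

Appending `| where RunningNodes == 0` would keep only the services that are down on every node, which is the shape an alert condition needs.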
Jun 29 2021 04:34 AM
@CliveWatson, thanks, that worked.
I had got as far as
status
| extend winstatus = iif(winstatus == 'running',1,0)
but hadn't tried the sumif command :)
Great work! Thanks again.
Regards,
Racheal
Jun 30 2021 08:37 AM
I'm using the query below to trigger an alert.
let status =
Event
| where TimeGenerated > ago(30d)
| where EventLog == 'System' and EventID == 7036 and Source == 'Service Control Manager' and RenderedDescription has "PowerCurve - Job Server"
| parse kind=relaxed EventData with * '<Data Name="param1">' Windows_Service_Name '</Data><Data Name="param2">' Windows_Service_State '</Data>' *
| summarize (TimeGenerated, winstatus) = arg_max(TimeGenerated, Windows_Service_State) by Windows_Service_Name, Computer;
status
| extend winstatus = iif(winstatus == 'running', 1, 0)
| summarize sumif(winstatus, winstatus > 0), ComputersOK = make_set_if(Computer, winstatus > 0), ComputerNotOk = make_set_if(Computer, winstatus == 0)
| extend ServiceStatus = iif(sumif_winstatus > 0, "The service is running"," The Service is not running")
| where sumif_winstatus == 0
| project sumif_winstatus, ComputerNotOk, ComputersOK
An alert is triggered if the number of results is greater than 0.
I'm facing a strange issue here: when the service is running on one of the VMs, this query returns no rows in the Log Analytics Logs window, which is the expected behavior.
However, I still receive an alert saying that the service is stopped. When I click "View 1 results" in the alert email,
the result shows a status of 0, which means the service is stopped.
But if I run the same query again by selecting it, it returns no rows.
I don't understand this behavior in Azure: the same query gives one result through the alert and a different one when run from the Log Analytics Logs page.
Could you help explain this?
Regards,
Racheal
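One possible cause, stated as an assumption because the thread does not confirm it: a log alert rule evaluates the query only over the time range configured on the rule, which can be much shorter than the `ago(30d)` written in the query. If the node that is still running last changed state before that window, its "running" event is invisible to the alert and the sum drops to 0. The sketch below reproduces the effect with hypothetical data; `evalTime`, `lookback`, the node names, and the event times are all illustrative.

```kusto
let evalTime = datetime(2021-06-30 08:00:00);   // hypothetical moment the rule runs
let lookback = 1d;                               // hypothetical effective window; change to 30d to compare
// Hypothetical events: node2 started days ago, node1 stopped minutes ago
let events = datatable(TimeGenerated:datetime, Computer:string, winstatus:string)
[
    datetime(2021-06-27 09:00:00), "node2", "running",
    datetime(2021-06-30 07:50:00), "node1", "stopped"
];
events
// Only events inside the effective window are visible to the evaluation
| where TimeGenerated between ((evalTime - lookback) .. evalTime)
| summarize arg_max(TimeGenerated, winstatus) by Computer
| summarize RunningNodes  = countif(winstatus == "running"),
            ComputerNotOk = make_set_if(Computer, winstatus != "running")
| where RunningNodes == 0
```

With `lookback = 1d` only the node1 "stopped" event is in range, so a row is returned and an alert would fire. With `lookback = 30d` both events are in range and no row is returned. Comparing the time range shown in the alert's "View results" link against the one used in the Logs window is a quick way to check whether this applies.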