Alerting when data are missing

T150732D · ‎Aug 10 2022

Hello,

I would like to raise an incident if there is a gap in ingested data, of let's say 1 hour or more, for key built-in Sentinel data connectors or custom integrations.

For CommonSecurityLog, which holds our CEF data, I was thinking of something like the query below, which shows when data was last received.

Would something similar be applicable to data connectors? How do you monitor data ingestion? Our management expects us to follow up with the data source owners if there is a delay in logs. Thank you

// Vendors excluded from the check
let Sources = dynamic(["Incapsula", "Cyber-Ark", "ArcSight"]);
CommonSecurityLog
| where isnotempty(DeviceVendor) and DeviceVendor !in (Sources)
// {selectedDeviceVendor} and {selectedDeviceProduct} are workbook parameters
| where (DeviceVendor == '{selectedDeviceVendor}' or '{selectedDeviceVendor}' == "All") and (DeviceProduct == '{selectedDeviceProduct}' or '{selectedDeviceProduct}' == "All")
// Only the latest timestamp is needed, so max() replaces arg_max(TimeGenerated, *)
| summarize LastLogReceived = max(TimeGenerated) by DeviceVendor, DeviceProduct
// Seconds since the last log was received
| extend Heartbeat = datetime_diff('second', now(), LastLogReceived)
// Report whole hours so the text matches the "hours ago" label
| extend HeartBeatMessage = iff(Heartbeat > 3600, strcat("Not active since ", Heartbeat / 3600, " hours ago"), "Active Logs Received")
| project DeviceVendor, DeviceProduct, Heartbeat, HeartBeatMessage

Clive_Watson · ‎Aug 10 2022

Answered here: https://techcommunity.microsoft.com/t5/microsoft-sentinel/sentinel-connector-notification/m-p/359580...

T150732D · ‎Aug 10 2022

Thanks, I saw the post for data connectors, but is there a recommended way to monitor custom ingestion? We had an incident and missed some data from a CEF source. I was hoping Microsoft would have some formal advice, as even a 15-minute loss of CEF data could be a serious problem from a compliance perspective. We have already been unable to investigate several incidents.

T150732D · ‎Aug 11 2022

This gets better with ASIM (although not all network vendors are covered yet): https://docs.microsoft.com/en-us/azure/sentinel/network-normalization-schema

1. You can use the methods above to check whether the whole CommonSecurityLog table received anything within a time period such as 15 minutes. This tends to work if you only have one or two sending systems, because it only detects a full failure, not the case where one of two sending hosts has failed. It can still be a good rule to have (maybe also run the same test for Syslog as well as CEF, if you have that). A one-hour threshold might be a good safe value to reduce false positives, but it does mean the whole solution could have been down for up to 59 minutes before the rule fires.

2. What you will also need, and this is preferred (I think), is to monitor each sending device. This is where ASIM helps identify what those devices are. For the product you are using, you need to find something like the sending computer name or IP. This is often in the AdditionalExtensions column, and you need to parse it out to get the device name or IP.
It can take some work to find the sending source rather than the CEF server receiving the data (which is often the Computer column), and this often varies per CEF vendor.

You then need to check each of these devices or IPs for ingestion delays longer than 15 minutes.

3. Another check could be to reference the Heartbeat table for the agents and check when each agent last sent a heartbeat. You might union the results with CEF, in case an agent is reported down but is actually still working (unlikely, but it could happen).
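
As a rough sketch of points 2 and 3 combined, something like the query below could be a starting point. It is not an official or tested rule. The `sourceHost` key inside AdditionalExtensions is a hypothetical example (the real key varies per CEF vendor), and the 7-day lookback and the `SendingDevice`, `Source`, `Check` and `SilentFor` names are assumptions made for illustration. A source that has been silent for longer than the lookback drops out of the results entirely, so the lookback needs to be comfortably longer than the threshold.

```kusto
let lookback = 7d;    // assumed baseline: anything seen in this window is expected to keep sending
let threshold = 15m;  // silence threshold from the thread; 1h gives fewer false positives
// Point 2: last event per sending device
let CefSources = CommonSecurityLog
    | where TimeGenerated > ago(lookback)
    // "sourceHost" is a hypothetical key; check what your vendor actually writes to AdditionalExtensions
    | extend SendingDevice = extract(@"sourceHost=([^;]+)", 1, AdditionalExtensions)
    // Fall back to the CEF collector name when the key is not present
    | extend SendingDevice = iff(isempty(SendingDevice), Computer, SendingDevice)
    | summarize LastSeen = max(TimeGenerated) by Source = strcat(DeviceVendor, "/", DeviceProduct, "/", SendingDevice)
    | extend Check = "CEF source";
// Point 3: last heartbeat per agent (this covers every agent, not only CEF collectors)
let Agents = Heartbeat
    | where TimeGenerated > ago(lookback)
    | summarize LastSeen = max(TimeGenerated) by Source = Computer
    | extend Check = "Agent heartbeat";
union CefSources, Agents
| where LastSeen < ago(threshold)
| extend SilentFor = now() - LastSeen
| project Check, Source, LastSeen, SilentFor
| order by SilentFor desc
```

Each row returned is a source or agent that has gone quiet, so a rule built on this would fire when the result count is greater than zero. If every CEF source shows up at once, that is effectively the full-table failure from point 1.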

Summary:
Unfortunately this is complex, and without full normalization (ASIM) or SentinelHealth support for all tables, there are gaps.

