SOLVED

Reliably trigger alerts for Log Analytics log entries

Copper Contributor

MSDN documentation at https://docs.microsoft.com/en-us/azure/azure-monitor/platform/alert-log-troubleshoot states: "To mitigate data ingestion delay, the system waits and retries the alert query multiple times if it finds the needed data is not yet ingested".

We have an issue with triggering alerts, and it suggests that the described behavior is not very reliable, as many of our alerts aren't fired. To be more precise: we ingest logs from Data Factory V2 into Log Analytics and watch for log entries with Level == "Error", alerting when the number of results is greater than 0 (Period = Frequency = 30 minutes). We expect to receive an alert whenever a log entry with Level == "Error" is generated by Data Factory and ingested into Log Analytics, but very often we don't. We tried increasing Period to 30 minutes while leaving Frequency at 15, but then there is a big chance of receiving duplicate alerts, which is also not good. Is there a recommended, reliable Period/Frequency/query configuration strategy that guarantees no alerts are missed and also does not produce duplicate alerts?
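To make the Period/Frequency trade-off concrete, here is a minimal Python sketch under a deliberately simplified model. This is an assumption, not how the service is documented to work: each evaluation at time t looks at records whose TimeGenerated falls in [t - Period, t), and a record is only visible once it has been ingested. The model ignores the retry-on-late-data behavior quoted from the documentation above. The names LogRecord and evaluations_that_fire are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    time_generated: int   # minutes since an arbitrary origin
    ingestion_delay: int  # minutes between generation and being queryable

    @property
    def ingested_at(self) -> int:
        return self.time_generated + self.ingestion_delay


def evaluations_that_fire(record: LogRecord, period: int, frequency: int, horizon: int) -> list:
    """Return the evaluation times (multiples of `frequency`, up to `horizon`)
    whose window [t - period, t) contains the record AND at which the record
    is already ingested. Simplified model: no retries for late data."""
    hits = []
    for t in range(frequency, horizon + 1, frequency):
        in_window = t - period <= record.time_generated < t
        visible = record.ingested_at <= t
        if in_window and visible:
            hits.append(t)
    return hits


if __name__ == "__main__":
    # Period = Frequency = 30, on-time record: caught exactly once.
    print(evaluations_that_fire(LogRecord(10, 0), period=30, frequency=30, horizon=120))
    # Same rule, 25-minute ingestion delay: never caught in this model.
    print(evaluations_that_fire(LogRecord(10, 25), period=30, frequency=30, horizon=120))
    # Period = 30, Frequency = 15: overlap absorbs a 10-minute delay...
    print(evaluations_that_fire(LogRecord(10, 10), period=30, frequency=15, horizon=120))
    # ...but an on-time record is caught twice (duplicate alert).
    print(evaluations_that_fire(LogRecord(10, 0), period=30, frequency=15, horizon=120))
```

In this model, overlapping windows (Period > Frequency) trade missed alerts for duplicates, and non-overlapping windows trade duplicates for missed alerts whenever the ingestion delay pushes a record past the evaluation that owns its window.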

10 Replies

Hi,

My testing shows that when data ingestion is delayed, the alert is still fired. Of course the alert inherits that delay, but I haven't found any missing alerts so far. Maybe you can share more about your experience: what kind of data source do you use? When alerts are missing, have you compared the ingestion time with TimeGenerated for those events? What is your exact query?

The exact query is:

search *
| where ResourceProvider == "MICROSOFT.DATAFACTORY" and (Level == "Error" or status_s == "Failed")
| order by TimeGenerated

 

The query runs against the Log Analytics workspace to which Data Factory V2 writes its logs (with a delay of several minutes, though it is hard to tell the exact numbers).

When I set Period = Frequency = 5 minutes, more than 50% of the alert emails are missing. With Period = Frequency = 15, almost all logs result in an alert email, but not 100% of them.

Apart from the issue described above, there is a more severe one, which may be related. When I navigate to Monitor -> Alerts, I always see the "All is good! You have no alerts." message, which is really strange. I expect to see statistics about triggered alerts.

Because of this "You have no alerts." message, it is hard to be sure whether the issue is with the alerts rather than with the emails (configured via an Action Group). Our assumption was that there might be an issue with email delivery, e.g. because of spam filters, but we dismissed it after configuring an Azure Function action type: the Azure Functions are not invoked when emails are missing and are invoked when emails are delivered, so at least the email and Azure Function actions behave consistently.

What could be the reason the "All is good! You have no alerts." message is always shown?

Hi,

The first thing you should do is never use the search operator in alerts or any kind of saved query. The search operator is meant for discovering data initially, but once you know where your data is, you should stop using it. This is also described here:

https://azure.microsoft.com/en-us/blog/best-practices-for-queries-used-in-log-alerts-rules/

I am assuming that for Data Factory you use diagnostic logs, which are sent to Log Analytics. In that case your data is in the AzureDiagnostics table, so your query should look like this:

AzureDiagnostics
| where ResourceProvider == "MICROSOFT.DATAFACTORY" and (Level == "Error" or status_s == "Failed")
| order by TimeGenerated

You can also drop | order by TimeGenerated when you use the query in an alert, as it has no effect there.

Here is some information on how ingestion time can be checked, although I do not think that is the problem for you:

https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-ingestion-time

Keep in mind that only 10 results are included in the e-mail, but if you follow the link to the alert query results, you will see all of them.

The only reason I can think of for not seeing the alerts in Azure Monitor is that you haven't selected the subscription where the Log Analytics workspace is located. Azure Monitor can display alerts from a maximum of 5 subscriptions.

I usually use metric measurement alerts rather than number-of-results alerts, because that way I get an alert per instance. I have a short blog post describing that scenario here:

https://cloudadministrator.net/2018/03/16/using-custom-log-search-alerts-based-on-metric-measurement...
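The difference between the two alert types can be sketched in Python under a simplified assumption: a number-of-results rule fires once when the total row count exceeds the threshold, while a metric measurement rule aggregates per instance (in KQL, typically something like summarize AggregatedValue = count() by Resource) and fires once for each instance that breaches. The function names and sample resource names are hypothetical, and time bins and breach counts are ignored.

```python
from collections import Counter


def number_of_results_alert(rows, threshold=0):
    """Number-of-results style: a single alert for the whole rule
    when the total row count exceeds the threshold."""
    return len(rows) > threshold


def metric_measurement_alerts(rows, threshold=0):
    """Metric-measurement style (simplified): count rows per instance
    and return every instance whose count exceeds the threshold,
    i.e. one alert per instance."""
    counts = Counter(row["Resource"] for row in rows)
    return sorted(name for name, value in counts.items() if value > threshold)


if __name__ == "__main__":
    # Hypothetical error rows from two data factories.
    rows = [
        {"Resource": "FACTORY-A"},
        {"Resource": "FACTORY-A"},
        {"Resource": "FACTORY-B"},
    ]
    print(number_of_results_alert(rows))    # one alert covering everything
    print(metric_measurement_alerts(rows))  # one alert per factory
```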

Thanks for the reply. I have one more question regarding "The only reason I can think of for not seeing the alerts in Azure Monitor is that you haven't selected the subscription where the Log Analytics workspace is located". I have only one subscription to choose from, so there is no chance I selected the wrong one. Further investigation revealed that my colleague can see alerts, but he is an Owner of the subscription, while I am a Contributor on the resource group where the alerts are created. I can create and change alerts, but I cannot see which alerts were triggered. Unfortunately I cannot be granted any rights at the subscription level. Is there any way to configure per-resource-group access so I can see the alerts?

I think the alert instances themselves are subscription-level resources, so you will need some access at the subscription level, not just the resource group. You don't necessarily need to be a Contributor at the subscription level; for example, you can be a Monitoring Contributor. You can also create your own custom role with access to resources like Microsoft.AlertsManagement/alerts/ and Microsoft.AlertsManagement/alertsSummary/ at the subscription level.
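As a sketch of the custom-role option, the following Python builds a role definition in the JSON shape accepted by az role definition create --role-definition. The role name and description are hypothetical. The two read actions correspond to the resource types named above; whether they are enough for the Monitor -> Alerts view is an assumption worth verifying before rollout.

```python
import json


def alert_viewer_role(subscription_id: str) -> dict:
    """Build a hypothetical custom role that can read fired alert
    instances and alert summaries, scoped to one subscription."""
    return {
        "Name": "Alert Instance Reader (custom)",  # hypothetical name
        "IsCustom": True,
        "Description": "Read fired alert instances and alert summaries.",
        "Actions": [
            "Microsoft.AlertsManagement/alerts/read",
            "Microsoft.AlertsManagement/alertsSummary/read",
        ],
        "NotActions": [],
        # Alert instances are subscription-level, so the role must be
        # assignable (and assigned) at the subscription scope.
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


if __name__ == "__main__":
    # Write the output to a file and pass it as --role-definition @file.json.
    print(json.dumps(alert_viewer_role("00000000-0000-0000-0000-000000000000"), indent=2))
```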

Stanislav, thank you very much for your replies. For completeness, I'd like to post an update on my issue. After getting access to alerts at the subscription level, we realized that all alerts are actually triggered, so the documentation is correct and everything works fine, even with a 5-minute interval.

The actual bug turned out to be that triggered alerts have an empty Action Group property, which is why emails are not sent. The Action Group is not empty when viewing alert rules from the Monitor -> Alerts -> "Manage alert rules" page, but it is empty when navigating to the alert rule via the link inside a triggered alert instance. This definitely looks strange, and we're already working on it with MS support. The initial investigation showed that the problem might be with deploying Log Analytics alerts using an ARM template. When we manually create an alert rule from the Portal, all is fine, but when rules are created with ARM, the Action Group on triggered alerts is empty for some reason. The workarounds found so far:

1. After ARM deployment go to the Portal and manually re-save alert rules.

2. After ARM deployment use REST API to get and set alert actions (this is also just "re-save" with no modification).

The ARM template is based on the examples from https://docs.microsoft.com/en-us/azure/azure-monitor/insights/solutions-resources-searches-alerts.

If the action group is not attached to the alert via ARM, the action group was not referenced correctly. You have to make sure that the resource ID of the action group is correct. I also reported a bug some time ago in the old Log Analytics alerts API: if the action group name contains white space, the API cannot find the Action Group resource. The API also does not verify that the action group exists, so if it does not exist, the alert is created anyway. The workaround for that bug was to use a name without white space, so the resource ID is correct, or to encode the action group name when constructing the resource ID.
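Here is a minimal sketch of the encoding workaround. It assumes the action group lives under the standard Microsoft.Insights/actionGroups resource type. The helper name and sample values are hypothetical. urllib.parse.quote with safe="" percent-encodes each space as %20, so the name no longer breaks the resource ID.

```python
from urllib.parse import quote


def action_group_resource_id(subscription_id: str, resource_group: str, action_group_name: str) -> str:
    """Build an action group resource ID with the name percent-encoded,
    so names containing white space still yield a usable ID."""
    encoded_name = quote(action_group_name, safe="")
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Insights/actionGroups/{encoded_name}"
    )


if __name__ == "__main__":
    print(action_group_resource_id("{subscription id}", "my-rg", "Data Factory Ops"))
```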

 

I see that you linked to the old API, so that bug probably still exists.

The weird thing is that the action group does seem to be attached when I create the alert via ARM. At least I can see it on the Monitor -> Alerts -> "Manage alert rules" page (image 1 in the attached screenshot). The action group is only missing when I look at the alert rule via the link from a triggered alert instance ("Alert rule" in the "Essentials" section, image 2 in the attached screenshot).

I tried to get the alert action JSON using the REST API, and the reference to the action group is there. After re-saving an alert from the Portal or via GET/PUT REST API calls, nothing changes in the action JSON (except the etag), but somehow the re-save fixes the issue, so something internal definitely changes.

 

Here is the sample request I used to get the alert action:

# GET the alert action resource (legacy Log Analytics alert API, api-version 2015-03-20)
$actionUrl = "/subscriptions/{subscription id}/resourceGroups/{res group name}/providers/Microsoft.OperationalInsights/workspaces/cdm1drepomsf01/savedSearches/saved_search60eee2d2dc0b42dd87cd0a06b1c3f335/schedules/schedule_60eee2d2dc0b42dd87cd0a06b1c3f335/actions/action_60eee2d2dc0b42dd87cd0a06b1c3f335?api-version=2015-03-20"
$jsonStr = armclient get $actionUrl

 

And here is what I used to re-save the alert action via the REST API:

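# Parse the GET response and keep only etag and properties
$json = $jsonStr | ConvertFrom-Json
$json2 = @{
  etag=$json.etag
  properties=$json.properties
}
# Serialize with a generous -Depth so nested objects in properties are not truncated
$json2 = $json2 | ConvertTo-Json -Depth 10
# PUT the unchanged body back; this "re-save" is what restored the action group
$json2 | armclient put $actionUrl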
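The same re-save steps can be sketched in Python without the HTTP call. This assumes you send the body with your own authenticated GET/PUT client. The URL follows the savedSearches/schedules/actions hierarchy of the legacy 2015-03-20 API shown above. The helper names and IDs are hypothetical.

```python
import json

API_VERSION = "2015-03-20"  # legacy Log Analytics alert API used above


def action_url(subscription_id, resource_group, workspace, search_id, schedule_id, action_id):
    """Compose the alert action path: workspace -> saved search -> schedule -> action."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.OperationalInsights/workspaces/{workspace}"
        f"/savedSearches/{search_id}/schedules/{schedule_id}"
        f"/actions/{action_id}?api-version={API_VERSION}"
    )


def resave_body(get_response_json: str) -> str:
    """Mirror the PowerShell above: keep only etag and properties from the
    GET response and return them as the PUT body, otherwise unchanged."""
    resource = json.loads(get_response_json)
    return json.dumps({"etag": resource["etag"], "properties": resource["properties"]})


if __name__ == "__main__":
    print(action_url("{subscription id}", "{res group name}", "myworkspace", "search1", "schedule1", "action1"))
```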

 

API samples may be found here: https://docs.microsoft.com/en-us/azure/azure-monitor/platform/api-alerts.

 

You mentioned that the link points to the old API (it has "(Preview)" in the title). Do you have a link to the new ARM API for creating Log Analytics alerts?

best response confirmed by Roman_Turovskyy (Copper Contributor)
Solution

Hi,

The new API is discussed here:

https://techcommunity.microsoft.com/t5/Azure-Log-Analytics/Log-Analytics-ARM-REST-API-specification-...

 

I haven't published examples on my blog, as I try to avoid publishing things before they are officially announced, but I have been using the API for several weeks now. It had some bugs that I hope are, or will be, fixed before the official release.

Stanislav, thank you very much! I just tried the new API and it works: emails are sent properly and the action group does not disappear. Finally!
