As you saw in the previous post (AZURE SQL DB AND LOG ANALYTICS BETTER TOGETHER – PART #1), using Azure SQL DB + Log Analytics is quite easy.
Let's see how we can improve this even further with ALERTS.
Let's imagine I want to receive, every day, a list of all failed logins to review.
I will reuse the query from the last post to get failed logins, changing only the time parameter so it filters for failed logins in the last day.
//Failed Logins
AzureDiagnostics
| where TimeGenerated >= ago(1d) //Events in the last day
| where action_name_s == "DATABASE AUTHENTICATION FAILED"
| extend additional_information_xml = parse_xml(additional_information_s)
| extend Error_Code = additional_information_xml.login_information.error_code
| extend Error_State = additional_information_xml.login_information.error_state
| project
    TimeGenerated, event_time_t,
    ResourceGroup, LogicalServerName_s, database_name_s,
    session_id_d, client_ip_s, application_name_s, session_server_principal_name_s,
    Error_Code, Error_State
| order by TimeGenerated desc
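If you would rather alert on a count than review individual rows, a minimal variation of the same query (assuming the same AzureDiagnostics schema as above) aggregates failed logins per server and database; a count like this is exactly the kind of value an alert threshold can evaluate:

//Failed Logins - aggregated per server and database (a sketch for alerting)
AzureDiagnostics
| where TimeGenerated >= ago(1d)
| where action_name_s == "DATABASE AUTHENTICATION FAILED"
| summarize FailedLogins = count() by LogicalServerName_s, database_name_s
| order by FailedLogins desc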
You can see that at this moment I don't have any errors (0 records).
To create an alert based on a query, just click on New alert rule.
The Resource field is already filled in with the Log Analytics workspace.
Now we need to define under which condition it will fire. Just click on CONDITION.
You can see the search query was prepopulated; you just need to set the threshold value, period, and frequency.
Time Period. Specifies the time range for the query. The query returns only records that were created within this range of the current time. The time period restricts the data fetched for the log query to prevent abuse, and it overrides any time command (like ago) used in the log query.
For example, if the time period is set to 60 minutes and the query is run at 1:15 PM, only records created between 12:15 PM and 1:15 PM are returned for the log query. If the log query uses a time command like ago(7d), it is still run only against data between 12:15 PM and 1:15 PM, as if data existed for only the past 60 minutes, not the seven days specified in the query.
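To make the clipping concrete, here is a hypothetical sketch: the query asks for seven days, but with a 60-minute time period the alert only ever evaluates the last hour of data.

//Hypothetical sketch: with a 60-minute alert time period,
//ago(7d) is effectively clipped to the last 60 minutes
AzureDiagnostics
| where TimeGenerated >= ago(7d) //overridden by the alert time period
| where action_name_s == "DATABASE AUTHENTICATION FAILED"
| count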
Frequency. Specifies how often the query should be run. It can be any value between 5 minutes and 24 hours, and it should be equal to or less than the time period; if it is greater than the time period, you risk records being missed.
For example, consider a time period of 30 minutes and a frequency of 60 minutes. If the query is run at 1:00 PM, it returns records between 12:30 PM and 1:00 PM. The next run is at 2:00 PM, when it returns records between 1:30 PM and 2:00 PM. Any records created between 1:00 PM and 1:30 PM are never evaluated, as sketched below.
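A hypothetical sketch of that gap (the dates are made up for illustration): the 1:00 PM run evaluates only the window below, and the 2:00 PM run evaluates 1:30 to 2:00 PM, so 1:00 to 1:30 PM is never checked.

//Window evaluated by the run at 1:00 PM (period 30 min, frequency 60 min)
AzureDiagnostics
| where TimeGenerated between (datetime(2019-03-31 12:30:00) .. datetime(2019-03-31 13:00:00))
| where action_name_s == "DATABASE AUTHENTICATION FAILED"
//the next run at 2:00 PM evaluates 1:30-2:00 PM; 1:00-1:30 PM is skipped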
Threshold. The results of the log search are evaluated to determine whether an alert should be created. The threshold is evaluated differently for the two types of log search alert rules: number of results and metric measurement.
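A number of results rule fires on the row count the query returns, while a metric measurement rule expects an aggregated value per time bin. A minimal sketch of the latter, assuming the conventional AggregatedValue column name:

//Sketch for a metric measurement rule: failures per 5-minute bin
AzureDiagnostics
| where action_name_s == "DATABASE AUTHENTICATION FAILED"
| summarize AggregatedValue = count() by bin(TimeGenerated, 5m)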
An action group defines what you are going to do if the condition above is true: for example, send an email or SMS, trigger a webhook, or run an Azure Function, a Logic App, or an Automation runbook.
In this sample I will send an email.
You can also customize the subject line and, if you use a webhook, customize its JSON payload.
At the end you can set the alert severity and the suppress alerts configuration.
For this test I set these parameters: period 30 min, frequency 5 min, and suppress alerts 30 min. Below is the email that I received.
In this test we can see 2 errors.
If you go to the Portal > Monitor > Alerts, you will see the alerts that fired.
You can see that, with the way I configured it, I was alerted for 30 minutes, every 5 minutes = 6 alerts. However, I received just 1 email because I enabled the suppress option (it only sends me another alert email 30 minutes after the first one).
No other failed connection errors happened after that.
You can review the alert details and check the data.
After reviewing these alerts, you can clean them up.
Post originally from https://fonsecasergio.wordpress.com/2019/03/31/azure-sql-db-and-log-analytics-better-together-part-2...