Over the past few years of performing Azure security research, I have seen many new attack primitives & techniques discovered that an adversary could abuse within Azure & Azure Active Directory (AAD). When explaining a technique to a client, the challenge wasn’t explaining how something could be abused; the challenge was explaining how to detect it. Last year, I released the Azure Threat Research Matrix (ATRM), which highlighted the potential techniques an adversary could abuse within Azure & AAD. The immediate goal was to give clients an idea of what potential abuse scenarios exist when they decide to use a certain resource or feature. However, it heavily lacked defensive content. I’ve always been a firm believer that the red team exists only to help the blue team, so I’m now releasing my newest project: AzDetectSuite.
AzDetectSuite is a project created to allow Azure users to establish a basic defense within Azure by providing pre-built KQL queries for each technique within ATRM that can be deployed as alerts to Azure Monitor. Now, in ATRM, most (85%+) techniques have a KQL query and a button that will deploy the query to the user’s Azure subscription.
The queries live within a publicly available GitHub repository and can openly be reviewed, Pull Requested, and critiqued. These queries are not “one-size-fits-all” and are mostly geared towards smaller environments since they alert off of more basic telemetry, so use at your own discretion. Within the repository is also a PowerShell script, Invoke-AzDetectSuite.ps1, which can import the detections for every technique within a given tactic, or import all available detections at once.
AzDetectSuite vs Microsoft Defender for Cloud
AzDetectSuite (ADS) is not meant to compete with Microsoft Defender for Cloud (MDC). MDC provides advanced detections based on your subscription plan and will give more granular control based on the telemetry in a tenant. ADS is meant to be an open source suite of basic detections for techniques found within ATRM, as MDC is not comprehensive in its coverage of those techniques. MDC’s capabilities far exceed those of ADS, as it is a subscription-based service with more insight into a resource’s telemetry than what is provided to users. In comparison, ADS is open source and is targeted towards smaller environments that want to ensure their resources are secure from potential abuse. In addition, ADS has some detections that utilize agents. For example, ADS has a detection that, when combined with PowerShell script block logging, will tell you what command was run when someone utilizes RunCommand on an Azure VM. For larger environments, it is recommended to go through ADS and determine which detections are suitable for your environment and may complement MDC.
What Goes into a Detection?
Several years ago, detection and detection engineering in Azure was challenging (to say the least). Today, Azure is much more competent in what it gathers and how it’s delivered. Detection engineering in Azure is different from detection engineering in an on-premises Windows environment.
In Azure, logs are centralized to Azure Monitor. Azure Monitor will ingest data from hundreds of log sources. These sources range from the general Azure Log (AzureActivity) to more detailed logs, such as Service Principal Sign-Ins (AADServicePrincipalSignInLogs). Writing a basic detection for Azure is very easy, but writing a useful one takes more thought, so it is necessary to ask a few questions before developing a detection:
1. How broad should this detection be?
- General alert on a single action
- Specific alert when an action meets a certain condition
2. What are you trying to alert on?
- An action in a Resource?
- Whenever a user or service principal logs in?
- Whenever a new resource is created?
3. Does the resource action ever occur legitimately?
- Part of sysadmin’s routines
- Can you minimize false positives through more granular data?
4. What steps should be taken once the alert fires?
- Enable a runbook?
- Email/Text appropriate parties
Using Kusto Query Language (KQL), a basic detection for something such as RunCommand on a Virtual Machine looks like this:
AzureActivity | where OperationNameValue == 'MICROSOFT.COMPUTE/VIRTUALMACHINES/RUNCOMMAND/ACTION'
Here, ‘AzureActivity’ is the log table being queried, and its rows are then filtered to those where the OperationNameValue property matches ‘MICROSOFT.COMPUTE/VIRTUALMACHINES/RUNCOMMAND/ACTION’.
Specifically, for RunCommand, it will return data such as the originating IP address, caller, and timestamp. It does not, however, include data such as the command that was run, or if it was even successful. To gather that data, we need additional telemetry from the virtual machine itself.
It’s heavily recommended to use the Azure Monitor Agent on all VMs hosted in Azure, as it will forward operating system logs to Azure Monitor. This allows for greater telemetry such as Event Log/syslog data, process information, disk usage, memory allocations, etc. When combined with ScriptBlock logging, the actual command that was run with RunCommand can then be found.
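// Time at which a RunCommand action succeeded (toscalar keeps a single value)
let timeframe = toscalar(AzureActivity
| where OperationNameValue == 'MICROSOFT.COMPUTE/VIRTUALMACHINES/RUNCOMMAND/ACTION' and ActivityStatusValue == 'Success'
| distinct TimeGenerated);
// PowerShell script block events (ID 4104) logged in the minute before that time
Event
| where EventID == 4104 and RenderedDescription has 'RunCommandWindows'
| where TimeGenerated between ((timeframe - 1m) .. timeframe)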
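This query gathers the time at which RunCommand succeeded, then looks for any PowerShell script block event (Event ID 4104) that mentions ‘RunCommandWindows’ and was logged within a minute of that timestamp, returning the command that was executed. Because toscalar() returns a single value, only one RunCommand timestamp is compared each time the query runs.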
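To make the time-window correlation concrete, here is a minimal Python sketch of the same idea applied to rows that have already been exported from Azure Monitor. It is not part of AzDetectSuite; the sample rows, the `correlate` helper, and the one-minute window are assumptions for illustration. Unlike the query, it checks every RunCommand timestamp instead of a single one.

```python
from datetime import datetime, timedelta

# Hypothetical, hand-written rows standing in for the AzureActivity and
# Event tables. Field names mirror the columns used in the query above;
# the values are made up for illustration.
run_command_times = [
    datetime(2023, 1, 10, 12, 0, 30),
    datetime(2023, 1, 10, 15, 45, 0),
]
script_block_events = [
    {"TimeGenerated": datetime(2023, 1, 10, 12, 0, 5),
     "RenderedDescription": "RunCommandWindows: whoami"},
    {"TimeGenerated": datetime(2023, 1, 10, 13, 0, 0),
     "RenderedDescription": "RunCommandWindows: hostname"},
    {"TimeGenerated": datetime(2023, 1, 10, 15, 44, 40),
     "RenderedDescription": "Get-Process"},
]


def correlate(run_times, events, window=timedelta(minutes=1)):
    """Return events that mention RunCommandWindows and were logged
    at most `window` before any RunCommand timestamp."""
    matches = []
    for event in events:
        if "RunCommandWindows" not in event["RenderedDescription"]:
            continue
        # Keep the event only if it falls inside the window that ends
        # at one of the RunCommand timestamps.
        if any(timedelta(0) <= t - event["TimeGenerated"] <= window
               for t in run_times):
            matches.append(event)
    return matches


matched = correlate(run_command_times, script_block_events)
for event in matched:
    print(event["TimeGenerated"].isoformat(), event["RenderedDescription"])
```

Only the first sample event is returned: the second mentions RunCommandWindows but is not close to any RunCommand timestamp, and the third is close in time but is not a RunCommand script block.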
Adding More Telemetry
Azure Diagnostics is a setting on the majority of Azure resources that captures additional telemetry from within that resource. For example, with Automation Accounts, Azure Diagnostics will capture the output of a command. Take, for example, AZT602.1 - Steal Service Principal Certificate: Automation Account RunAs Account. To detect that this technique was abused, AzureDiagnostics can be queried for when the certificate was exported.
This can then be cross-referenced with the AADServicePrincipalSignInLogs provider to determine if that certificate was then used to log in as the service principal.
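// Automation Account whose job output exposed a certificate thumbprint in the last hour
let SPN = toscalar(AzureDiagnostics
| where TimeGenerated > ago(1h)
| where ResourceProvider == 'MICROSOFT.AUTOMATION' and ResultDescription has "Thumbprint"
| distinct Resource);
// Sign-ins by a service principal whose name contains that resource name
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(1h)
| where ServicePrincipalName has SPN
| project TimeGenerated, ServicePrincipalName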
It is heavily recommended to enable AzureDiagnostics for all resources.
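The cross-reference between the two tables can also be sketched outside of KQL. The Python below is a rough illustration only, assuming the rows have already been exported: the sample data, resource names, and the `suspicious_sign_ins` helper are hypothetical, and a case-insensitive substring check stands in for the KQL `has` operator.

```python
# Hypothetical rows standing in for AzureDiagnostics and
# AADServicePrincipalSignInLogs; names and values are illustrative only.
diagnostics = [
    {"ResourceProvider": "MICROSOFT.AUTOMATION",
     "Resource": "CONTOSO-AUTOMATION",
     "ResultDescription": "Thumbprint: <redacted>"},
    {"ResourceProvider": "MICROSOFT.AUTOMATION",
     "Resource": "BUILD-AUTOMATION",
     "ResultDescription": "Job completed"},
]
sign_ins = [
    {"ServicePrincipalName": "contoso-automation_abc123",
     "TimeGenerated": "2023-01-10T12:05:00Z"},
    {"ServicePrincipalName": "build-automation_def456",
     "TimeGenerated": "2023-01-10T12:06:00Z"},
]


def suspicious_sign_ins(diag_rows, sign_in_rows):
    """Return sign-ins by service principals whose name contains the name
    of an Automation Account that logged a certificate thumbprint."""
    exported = {
        row["Resource"].lower()
        for row in diag_rows
        if row["ResourceProvider"] == "MICROSOFT.AUTOMATION"
        and "thumbprint" in row["ResultDescription"].lower()
    }
    return [
        s for s in sign_in_rows
        if any(name in s["ServicePrincipalName"].lower() for name in exported)
    ]


hits = suspicious_sign_ins(diagnostics, sign_ins)
for hit in hits:
    print(hit["TimeGenerated"], hit["ServicePrincipalName"])
```

Because the sketch collects every matching Automation Account into a set, it covers the case where more than one account exported a certificate in the same window.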
Building Anomalous Detections
With some telemetry, it can be difficult to determine if an action is malicious or just an engineer doing their job. For example, alerting whenever a key vault item is retrieved could be a poor detection because it’s typical for an engineer to retrieve that key vault item every day in that environment. However, if that engineer then retrieves 50 other key vault items, that is anomalous and should be investigated. Building a detection based on anomalous behavior can be straightforward for actions that occur rarely, but building an anomalous detection for something that occurs often can be challenging. By monitoring activity via KQL, a baseline can be established and then referenced later in the same query to return any outliers, suggesting anomalous activity.
Take an example of an engineer who typically retrieves a key vault secret 5 times a day.
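let EventCountThreshold = 5;
let OperationList = dynamic(["SecretGet", "KeyGet", "VaultGet"]);
// Assumption: AllowedAppId is referenced below but never defined; list any application IDs to exclude here
let AllowedAppId = dynamic([]);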
let KeyVaultGetEvents48 = AzureDiagnostics
| where TimeGenerated > ago(48h)
| where TimeGenerated < ago(24h)
| where ResourceProvider == "MICROSOFT.KEYVAULT"
| where OperationName in (OperationList)
| summarize Count48 = count() by OperationName, identity_claim_appid_g
| where Count48 > EventCountThreshold;
First, this query sets a threshold of 5, meaning only identities that performed one of the listed Key Vault operations more than 5 times within the window are kept. Then, it computes a column called ‘Count48’, which counts how many times a specific identity retrieved a Key Vault item during that window. For the example engineer, that count is 5. Note that the comparison is a strict ‘>’, so the threshold needs to sit below an identity’s usual count for that identity to appear in the baseline.
This query establishes a baseline of activity from 48 hours ago to 24 hours ago. In other words, it represents the activity from two days ago up to one day ago. This baseline can then be compared against activity over the past 24 hours. Since the engineer typically retrieves 5 items a day, the expected count for the past 24 hours should also be ‘5’. If the count for the past 24 hours is more than 19% higher than the baseline (roughly a 20% increase), the query returns the ID of the engineer and the percentage difference.
let KeyVaultGetEvents24 = AzureDiagnostics
| where TimeGenerated > ago(24h)
| where ResourceProvider == "MICROSOFT.KEYVAULT"
| where OperationName in (OperationList)
| where not(identity_claim_appid_g in (AllowedAppId))
| summarize Count24 = count() by OperationName, identity_claim_appid_g
| where Count24 > EventCountThreshold;
KeyVaultGetEvents48
| join kind=inner KeyVaultGetEvents24 on identity_claim_appid_g
| extend PercentageDifference = round(((todouble(Count24) - todouble(Count48)) / todouble(Count48)) * 100, 2)
| where PercentageDifference > 19
| project OperationName, identity_claim_appid_g, Count48, Count24, PercentageDifference
Baselining within KQL itself is difficult in larger environments, so using a percentage difference instead of a hardcoded number is recommended to avoid false positives.
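As a worked example of the arithmetic, the Python sketch below reproduces the threshold and percentage-difference steps of the queries above on made-up counts. The identities, the counts, and the `find_outliers` helper are hypothetical, and the sketch groups by identity only, ignoring OperationName for brevity. The engineer’s baseline is set to 6 here so that it clears the strict `>` threshold of 5.

```python
EVENT_COUNT_THRESHOLD = 5   # mirrors EventCountThreshold in the query
PERCENT_THRESHOLD = 19      # mirrors "where PercentageDifference > 19"

# Hypothetical per-identity counts of Key Vault "get" operations.
baseline = {"engineer-app-id": 6, "pipeline-app-id": 40}   # 48h to 24h ago
recent = {"engineer-app-id": 55, "pipeline-app-id": 44}    # last 24h


def percentage_difference(count48, count24):
    """Percentage change from the baseline count to the recent count."""
    return round((count24 - count48) / count48 * 100, 2)


def find_outliers(baseline_counts, recent_counts):
    """Return identities whose recent count exceeds the baseline by more
    than PERCENT_THRESHOLD percent."""
    outliers = {}
    for identity, count48 in baseline_counts.items():
        count24 = recent_counts.get(identity)
        if count24 is None:
            continue  # inner join: identity must appear in both windows
        if count48 <= EVENT_COUNT_THRESHOLD or count24 <= EVENT_COUNT_THRESHOLD:
            continue  # both windows must exceed the event count threshold
        diff = percentage_difference(count48, count24)
        if diff > PERCENT_THRESHOLD:
            outliers[identity] = diff
    return outliers


outliers = find_outliers(baseline, recent)
print(outliers)
```

With these numbers the engineer’s jump from 6 to 55 retrievals is flagged, while the pipeline’s move from 40 to 44 (a 10% increase) stays under the threshold, which is the behavior a fixed count threshold alone would not give.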
Closing Thoughts
A detection’s complexity varies depending on the granularity of the actual detection, but a solid detection does not have to be complex. The detection should fit the environment that is being monitored instead of following a ‘one detection fits all’ method. AzDetectSuite is meant to introduce basic detections at a very low cost, but those queries are free to be modified to fit an environment and can be stripped down to be more basic or built upon to be more complex.