Microsoft Sentinel Blog

Investigating Suspicious Azure Activity with Microsoft Sentinel

TomMcElroy

Microsoft

Nov 22, 2021

Tom McElroy (Microsoft Threat Intelligence Center) & Andrey Karpovsky (Microsoft Defender for Cloud)

With thanks to: Ram Pliskin (Microsoft Defender for Cloud) & Ilay Grossman (Microsoft Defender for Cloud)

This introductory blog post is the first in a series taking a closer look at how to explore potentially suspicious operations within the Azure environment. To begin the series, this post explores how to enable and parse data stored within the Azure Activity log. After enabling the connector, Azure Activity logs will be used to explore a given user's interaction with Azure following suspected account compromise.

The Azure Activity log provides insight into operations that have been performed within an Azure subscription. Any time a user interacts with the Azure management API directly, through the Azure Portal or through the Azure CLI, an entry is made. From a security perspective, the Azure Activity log allows analysts to understand what actions a user may have performed within the subscription.

This blog will also cover the first deep dive into an Azure operation, Azure Run Command. It provides several hunting approaches to generically detect suspicious Run Command usage, along with Microsoft Sentinel queries that connect Azure Activity logging with Microsoft Defender for Endpoint. Connecting data from additional log sources is often required to understand how a potential attacker has used the Azure operation in their campaign.

To learn more about techniques attackers may use to gain access to an Azure Subscription, read the recent DART blog covering cloud trust chains.

Enabling and Parsing Azure Activity Logs

To enable access to this data in Microsoft Sentinel, the Azure Activity data connector should be enabled; instructions on how to enable the connector can be found here. The connector recently moved to the diagnostic settings pipeline, which provides better latency and access to several additional columns. To quickly check whether the new connector is enabled, the following query can be executed:

AzureActivity
| where TimeGenerated > ago(30d)
| project Claims, SubscriptionId, _SubscriptionId
| extend NewConnector = iff(isempty(Claims), 0, 1)
| extend _SubscriptionId = iff(isempty(_SubscriptionId), SubscriptionId, _SubscriptionId)
| summarize max(NewConnector), count() by _SubscriptionId
| project _SubscriptionId, NewConnector=max_NewConnector, Events=count_

Example output is shown below; the first two subscriptions are using the new connector, whilst the final subscription is not and should be updated.

Legacy connections should be updated by following these steps which, after disabling the legacy connector, will apply a policy to update diagnostic settings. The diagnostic setting for Azure Activity logs can also be applied manually without policy.

First search for the Activity log service in the Azure Portal search bar:

Step 1: Open Activity Log

Next, click the “Diagnostic settings” icon:

Step 2: Click Diagnostic settings

Once loaded, select the correct subscription, and then click “Add diagnostic setting”:

Step 3: Add a new diagnostic setting

Finally, select all required logs and which log analytics workspace they should be sent to:

Step 4: Select required data and configure

Azure Activity logs contain a wealth of information for analysing potentially suspicious activity in the cloud environment. They contain information from a range of Azure services, with each providing different levels of insight. To begin analysing data within Azure Activity, it is important to determine which service has produced the log entry. This can be done using the OperationNameValue field.

The OperationNameValue field provides a path structure which defines which service the operation was executed against, and what operation was used. Taking the below as an example:

Microsoft.RecoveryServices/vaults/delete

This indicates that a Microsoft Recovery Services vault was deleted. With this information, a query can be constructed to retrieve logs related to all vault deletion events in the last 3 days:

AzureActivity
| where TimeGenerated > ago(3d)
| where OperationNameValue =~ "Microsoft.RecoveryServices/vaults/delete"
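Because the first path segment of OperationNameValue names the service, the same field can also give a quick overview of which services are generating activity. The query below is a minimal sketch rather than part of the original research queries; the column names OpParts, Provider, Events and DistinctOperations are illustrative choices.

```kusto
// Sketch: split OperationNameValue into its path segments and count
// activity per service (resource provider) over the last 3 days.
AzureActivity
| where TimeGenerated > ago(3d)
| where isnotempty(OperationNameValue)
// Normalise casing, as the same operation can be logged with different casing
| extend OperationNameValue = tolower(OperationNameValue)
// e.g. "microsoft.recoveryservices/vaults/delete" -> ["microsoft.recoveryservices", "vaults", "delete"]
| extend OpParts = split(OperationNameValue, "/")
// The first segment identifies the service the operation was executed against
| extend Provider = tostring(OpParts[0])
| summarize Events = count(), DistinctOperations = dcount(OperationNameValue) by Provider
| order by Events desc
```

Providers with few events but unusual operations are often a reasonable starting point for closer inspection.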

Analysing User Interactions

One way of using Azure activity data is to understand a user’s interaction with the service. Below is an example of using Azure Activity to understand what a user did during a time that suspicious account activity may have taken place.

In this example the objective is to understand if a suspected user account compromise led to actions being performed that are of concern. Whilst by no means an exhaustive list, here are some examples of potentially suspicious operations and how they can be used maliciously:

Microsoft.Storage/storageAccounts/listKeys/action: Generated when viewing a storage account's private keys. Seeing this operation performed during a period of compromise may indicate the actor has copied private keys.
Microsoft.Web/sites/functions/listsecrets/action: Reading secrets from an Azure hosted website.
Microsoft.Compute/virtualMachines/delete: Removal of a virtual machine. When seen in sufficient quantity, this may indicate intentional malicious damage.
Microsoft.RecoveryServices/vaults/backupPolicies/write: Manipulation of backup policies by a user may indicate activity such as retention period tampering, a technique that reduces data retention or weakens backup policies to undermine an organisation's resilience to destructive attacks.
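The operations listed above can be checked directly for a single account. The following is a minimal sketch, assuming the placeholder account user@contoso.com and a 3 day look-back; the suspiciousOps variable name is illustrative, and the list should be adapted to the environment being investigated.

```kusto
// Sketch: surface the example operations listed above for one account.
let suspiciousOps = dynamic([
    "Microsoft.Storage/storageAccounts/listKeys/action",
    "Microsoft.Web/sites/functions/listsecrets/action",
    "Microsoft.Compute/virtualMachines/delete",
    "Microsoft.RecoveryServices/vaults/backupPolicies/write"
]);
AzureActivity
| where TimeGenerated > ago(3d)
// Placeholder account; replace with the account under investigation
| where Caller =~ "user@contoso.com"
// in~ performs a case-insensitive comparison against the list
| where OperationNameValue in~ (suspiciousOps)
| project TimeGenerated, Caller, CallerIpAddress, OperationNameValue, ActivityStatusValue
| order by TimeGenerated asc
```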

The following summarisation query can be used to provide a quick overview of that account's interaction with Azure and the operations they have performed.

AzureActivity
| where TimeGenerated between (datetime(2021-11-22) .. datetime(2021-11-23))
| where Caller =~ "user@contoso.com"
| extend OperationNameValue = tolower(OperationNameValue)
| summarize Operations=count(), IPs=dcount(CallerIpAddress), FirstExecution=min(TimeGenerated), LastExecution=max(TimeGenerated), IPUsed=make_set(CallerIpAddress), max(Category) by OperationNameValue
| extend DaysDelta = datetime_diff('day', LastExecution, FirstExecution)
| extend DaysDelta = iff(DaysDelta == 0, 1, DaysDelta)

Execution of the above query will produce results like the following.

In this example, the user mainly performs actions related to security insights and interacts with incidents. This user is likely a member of the SOC, incident response or blue team. At the bottom of the table is an Azure Run Command action; based on the user’s normal profile of activity, this is unusual and warrants further investigation. This query also provides the IP addresses that have been used. While they are masked in the above image, the Run Command operation originated from an IP address not previously seen associated with the user.
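Whether an IP address is new for a user can also be checked with a query instead of by eye. The following is a minimal sketch, assuming the same placeholder account and investigation window as above and an arbitrary 30 day baseline; the names user, windowStart, windowEnd and knownIps are illustrative.

```kusto
// Sketch: list IP addresses used by the account during the investigation
// window that were not seen for that account in the preceding 30 days.
let user = "user@contoso.com";
let windowStart = datetime(2021-11-22);
let windowEnd = datetime(2021-11-23);
// Baseline: IP addresses the account used before the investigation window
let knownIps = AzureActivity
    | where TimeGenerated between ((windowStart - 30d) .. windowStart)
    | where Caller =~ user
    | where isnotempty(CallerIpAddress)
    | distinct CallerIpAddress;
AzureActivity
| where TimeGenerated between (windowStart .. windowEnd)
| where Caller =~ user
| where isnotempty(CallerIpAddress)
// Keep only addresses that are absent from the baseline
| where CallerIpAddress !in (knownIps)
| summarize Events = count(), Operations = make_set(OperationNameValue) by CallerIpAddress
```

An empty result means every address used in the window was already present in the baseline, which does not by itself rule out compromise.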

The next step is to investigate Run Command in more detail, which will be covered in a later section of the blog post. While Azure Activity shows which operation has been executed, gaining further insight is only possible using additional logging.

Proactively Hunting for Suspicious Operations

Using Azure Activity logs to investigate what actions a potentially compromised account may have executed is useful when responding to a potential compromise. However, this requires a signal to indicate that an account compromise may have taken place.

As part of research conducted by MSTIC and MDC into the Azure Activity log, the following query has been developed to highlight potentially suspicious interactions with Azure services. This hunting query can be used to explore anomalous actions surrounding any Azure operation or resource type, allowing threat hunters to identify attacks utilising operations not already well documented.

This query can be used to explore several different anomalies focussed on operations that may be abused by threat actors. Upon execution the query will produce a Boolean result for each of these potential anomalies:

New Caller: Indicates that the caller has not been seen before within the Azure Activity training data, regardless of operation or resource type.
New Caller on the Monitored Resource: Indicates that the caller has not been seen before performing operations on the given resource.
New IP Mask: The IP address subnet is new for the subscription.
New IP Mask on the Monitored Resource: The IP address subnet is new for this resource type but may not be new for the subscription.
New Monitored Operation on Subscription: Indicates that the monitored operation was not performed before on the subscription.
Anomalous Caller on Monitored Resource: Indicates that the operation was performed on the monitored resource type by a user who doesn't usually work on that resource type. This result looks for low Jaccard Index or Lift values calculated over the training period. Jaccard and Lift computation is performed by a function developed as part of this research called pair_probabilities_fl(). More information can be found here.
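To make the Jaccard and Lift metrics concrete, the sketch below works through the same arithmetic that pair_probabilities_fl() performs, using made-up counts: 1,000 training events in a subscription, 50 of them by one user, 200 of them on the monitored resource type, and 2 that are both. These numbers are purely illustrative and do not come from real data.

```kusto
// Sketch: worked example of the pair probability metrics with invented counts.
print countAll = 1000.0, countA = 50.0, countB = 200.0, countAB = 2.0
// P_A: probability of the user, P_B: probability of the monitored resource type,
// P_AB: probability of seeing both together
| extend P_A = countA / countAll, P_B = countB / countAll, P_AB = countAB / countAll
// Union probability and conditional probability of the user given the resource type
| extend P_AUB = P_A + P_B - P_AB, P_AIB = P_AB / P_B
// Jaccard index: intersection over union. Lift: joint probability relative to independence
| extend Jaccard_AB = P_AB / P_AUB, Lift_AB = P_AB / (P_A * P_B)
| project P_A, P_B, P_AB, P_AUB, P_AIB, Jaccard_AB, Lift_AB
```

With these counts the Jaccard index is roughly 0.008, the Lift roughly 0.2 and P_AIB roughly 0.01. All three are low, so this user working on the monitored resource type would be treated as unusual.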

By default, the hunting query is configured to detect suspicious Run Command operations and operations that target Azure VM resources. This can be changed by updating the “monitoredOps” and “monitoredResource” parameters at the top of the query. A fully commented version of the query can be found below, providing information about each step of the query. A version with only configurable parameter comments is available on the Microsoft Sentinel GitHub here.
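As an illustration of retargeting the query, the sketch below sets the two parameters to monitor storage account key listing instead of Run Command, then counts how many events in the last 30 days would match each parameter. The storage values are an assumed example taken from the operations discussed earlier, not a recommended configuration, and the count is only a sanity check to run before the full hunting query.

```kusto
// Sketch: alternative parameter values (assumed example) plus a quick check
// of how many events each parameter would match.
let monitoredOps = dynamic(['microsoft.storage/storageaccounts/listkeys/action']);
let monitoredResource = pack_array('microsoft.storage/storageaccounts');
AzureActivity
| where TimeGenerated > ago(30d)
| summarize MonitoredOpEvents = countif(OperationNameValue has_any (monitoredOps)),
            MonitoredResourceEvents = countif(OperationNameValue has_any (monitoredResource))
```

If both counts are zero, the chosen operation and resource type do not appear in the workspace's Azure Activity data and the hunting query will have nothing to model.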

  // When the detection window will start (3 days prior to now)
  let startDetectDate = 3d;
  // When the detection window will end(now)
  let endDetectDate = 0d;
  // When to start collecting data for detection
  let startDate = startDetectDate + 30d;
  // Operation to monitor, in this case Run Command
  let monitoredOps = dynamic(['microsoft.compute/virtualmachines/runcommand/action']);
  // The resource type to monitor, in this case virtual machines
  let monitoredResource = pack_array('microsoft.compute/virtualmachines');
  // This function describes relationships between two values by calculating probabilities and related metrics.
  // For example, taking a user and resource type in Azure Activity logs. Intuitively, if probability (or related metric) of seeing specific user 
  // working on a specific resource type is low, yet they are doing it, this is anomalous.
  // One useful metric is Jaccard Similarity Index, defined as ratio between intersection and union probabilities. 
  // It is scaled between [0, 1]. When close to 0 - this means that the two values tend to stay apart. When close to 1 - they tend to be seen together.
  // An additional useful metric is Lift, defined as ratio between conditional probability and regular probability. 
  // When the Lift of two values is close to 1, it means they appear together as though they are independent. When it is much larger than 1, 
  // it means they appear together more than expected when assuming independence. When it is close to 0, it means the values appear much less 
  // than expected. Thus, when Jaccard index or Lift of specific {user, resourceType} pair are below threshold, it indicates anomalous 
  // (and thus potentially suspicious) behavior.
  let pair_probabilities_fl = (tbl:(*), A_col:string, B_col:string, scope_col:string)
  {
  let T = (tbl | extend _A = column_ifexists(A_col, ''), _B = column_ifexists(B_col, ''), _scope = column_ifexists(scope_col, ''));
  let countOnScope = T | summarize countAllOnScope = count() by _scope;
  let probAB = T | summarize countAB = count() by _A, _B, _scope | join kind = leftouter (countOnScope) on _scope | extend P_AB = todouble(countAB)/countAllOnScope;
  let probA  = probAB | summarize countA = sum(countAB), countAllOnScope = max(countAllOnScope) by _A, _scope | extend P_A = todouble(countA)/countAllOnScope;
  let probB  = probAB | summarize countB = sum(countAB), countAllOnScope = max(countAllOnScope) by _B, _scope | extend P_B = todouble(countB)/countAllOnScope;
      probAB
      // probability for each value of A
      | join kind = leftouter (probA) on _A, _scope
      // probability for each value of B
      | join kind = leftouter (probB) on _B, _scope
      // union probability
      | extend P_AUB = P_A + P_B - P_AB
             // conditional probability of A on B
             , P_AIB = P_AB/P_B
             // conditional probability of B on A
             , P_BIA = P_AB/P_A
      // lift metric
      | extend Lift_AB = P_AB/(P_A * P_B)
             // Jaccard similarity index
             , Jaccard_AB = P_AB/P_AUB
      | project _A, _B, _scope, floor(P_A, 0.00001), floor(P_B, 0.00001), floor(P_AB, 0.00001), floor(P_AUB, 0.00001), floor(P_AIB, 0.00001)
      , floor(P_BIA, 0.00001), floor(Lift_AB, 0.00001), floor(Jaccard_AB, 0.00001)
      | sort by _scope, _A, _B
  };
  // Prepare Azure Activity data for processing 
  let eventsTable = materialize (
  AzureActivity
  // Time window to collect activity
  | where TimeGenerated between (ago(startDate) .. ago(endDetectDate))
  | where isnotempty(CallerIpAddress)
  // Only collect instances where the request Succeeded
  | where ActivityStatusValue has_any ('Success', 'Succeeded')
  // Process subscription ID and resource ID to support new and old connector schema
  | extend SubscriptionId = iff(isempty(_SubscriptionId), SubscriptionId, _SubscriptionId)
  | extend ResourceId = iff(isempty(_ResourceId), ResourceId, _ResourceId)
  // Process operations name and resource ID
  | extend splitOp = split(OperationNameValue, '/')
  | extend splitRes = split(ResourceId, '/')
  // Keep only the columns needed for modelling and flag monitored operations and resources
  | project TimeGenerated , subscriptionId=SubscriptionId
              , ResourceProvider
              , ResourceName = tolower(tostring(splitRes[-1]))
              , OperationNameValue = tolower(OperationNameValue)
              , timeSlice = floor(TimeGenerated, 1d)
              , clientIp = tostring(CallerIpAddress)
              , Caller
              , isMonitoredOp = iff(OperationNameValue has_any (monitoredOps), 1, 0)
              , isMonitoredResource = iff(OperationNameValue has_any (monitoredResource), 1, 0)
              , CorrelationId
  | extend clientIpMask = format_ipv4_mask(clientIp, 16)
  );
  // Generate model data by aggregating over training window by subscription, caller and IP. We calculate the dates 
  // that the entity was first seen on subscription ('firstSeen'), performing monitored operation ('firstSeenOnMonOp') 
  // and working on the monitored resource ('firstSeenOnMonRes').
  let modelData =  (
  eventsTable
  | where TimeGenerated < ago(startDetectDate) and isnotempty(Caller) and isnotempty(subscriptionId)
  | summarize countEvents = count(), countMonRes = countif(isMonitoredResource == 1), counMonOp = countif(isMonitoredOp == 1)
      , firstSeen = min(timeSlice), firstSeenOnMonRes = minif(timeSlice, isMonitoredResource == 1), firstSeenOnMonOp = minif(timeSlice, isMonitoredOp == 1)
      by subscriptionId, Caller, clientIpMask
  );
  // Invoke pair_probabilities_fl (described above) over the processed Azure Activity data for {Caller, isMonitoredResource} pairs for each 
  // subscription. Caller identifies the user, and isMonitoredResource is a flag for monitored resource type. We calculate the metrics for 
  // each user to do any work on relevant resource type at training window. We filter the results by relevant resource 
  // (isMonitoredResource == 1) for later usage.
  let monOpProbs = materialize (
  eventsTable
  | where TimeGenerated < ago(startDetectDate) and isnotempty(Caller) and isnotempty(subscriptionId)
  | invoke pair_probabilities_fl('Caller', 'isMonitoredResource','subscriptionId')
  | where _B == 1
  | sort by P_AIB desc
  | extend rankOnMonRes = row_rank(P_AIB), sumBiggerCondProbs = row_cumsum(P_AIB) - P_AIB
  | extend avgBiggerCondProbs = floor(iff(rankOnMonRes > 1, sumBiggerCondProbs/(rankOnMonRes-1), max_of(0.0, prev(sumBiggerCondProbs))), 0.00001)
  | project-away sumBiggerCondProbs
  );
  // Now join the original data, the model data and pair_probabilities_fl data
  eventsTable
  | where TimeGenerated between (ago(startDetectDate) .. ago(endDetectDate))
  // Join with modelData on subscriptionId and Caller, to get firstSeen dates (general, monitored resource and monitored operation) for this Caller on subscription
  | join kind = leftouter (modelData | summarize countEventsPrincOnSub = sum(countEvents), countEventsMonResPrincOnSub = sum(countMonRes),  countEventsMonOpPrincOnSub = sum(counMonOp)
      , firstSeenPrincOnSubs = min(firstSeen), firstSeenMonResPrincOnSubs = min(firstSeenOnMonRes), firstSeenMonOpPrincOnSubs = min(firstSeenOnMonOp) by subscriptionId, Caller) 
          on subscriptionId, Caller
  // Join with modelData on subscriptionId and IpMask, to get firstSeen dates (general, monitored resource and monitored operation) for this IpMask on subscription
  | join kind = leftouter (modelData | summarize countEventsIpMaskOnSub = sum(countEvents), countEventsMonResIpMaskOnSub = sum(countMonRes),  countEventsMonOpIpMaskOnSub = sum(counMonOp)
      , firstSeenIpMaskOnSubs = min(firstSeen), firstSeenMonResIpMaskOnSubs = min(firstSeenOnMonRes), firstSeenMonOpIpMaskOnSubs = min(firstSeenOnMonOp) by subscriptionId, clientIpMask) 
          on subscriptionId, clientIpMask
  // Join with modelData on subscriptionId, to get firstSeen dates (general, monitored resource and monitored operation) for this subscription in general (all users)
  | join kind = leftouter (modelData | summarize countEventsOnSub = sum(countEvents), countEventsMonResOnSub = sum(countMonRes),  countEventsMonOpOnSub = sum(counMonOp)
      , firstSeenOnSubs = min(firstSeen), firstSeenMonResOnSubs = min(firstSeenOnMonRes), firstSeenMonOpOnSubs = min(firstSeenOnMonOp)
      , countCallersOnSubs = dcount(Caller), countIpMasksOnSubs = dcount(clientIpMask) by subscriptionId)
          on subscriptionId        
  | project-away subscriptionId1, Caller1, subscriptionId2
  // Calculate the number of days the user has been active in the subscription
  | extend daysOnSubs = datetime_diff('day', timeSlice, firstSeenOnSubs)
  // Calculate the average number of monitored operations performed on the subscription, representing the expected baseline of operations
  | extend avgMonOpOnSubs = floor(1.0*countEventsMonOpOnSub/daysOnSubs, 0.01), avgMonResOnSubs = floor(1.0*countEventsMonResOnSub/daysOnSubs, 0.01)
  // Join with monOpProbs on subscriptionId and Caller to get probabilities for this user to work on monitored resourceType
  | join kind = leftouter(monOpProbs) on $left.subscriptionId == $right._scope, $left.Caller == $right._A
  | project-away _A, _B, _scope
  | sort by subscriptionId asc, TimeGenerated asc
  | extend rnOnSubs = row_number(1, subscriptionId != prev(subscriptionId))
  | sort by subscriptionId asc, Caller asc, TimeGenerated asc
  | extend rnOnCallerSubs = row_number(1, (subscriptionId != prev(subscriptionId) and (Caller != prev(Caller))))
  //// // Anomaly scenarios
  // Indicates that Caller wasn't seen on subscription during training, since joining with model data on that caller brought no results.
  // Seeing a previously unseen Caller performing a high-risk operation is suspicious.
  | extend newCaller = iff(isempty(firstSeenPrincOnSubs), 1, 0)
  // Indicates that Caller didn't perform operations on monitored resource type during training.
  // Seeing a Caller who previously didn't work on this resource performing a high-risk operation is suspicious.
      , newCallerOnMonRes = iff(isempty(firstSeenMonResPrincOnSubs), 1, 0)
  // Indicates that IpMask wasn't seen on subscription during training.
  // Seeing a previously unseen IpMask performing a high-risk operation is suspicious.
      , newIpMask = iff(isempty(firstSeenIpMaskOnSubs), 1, 0)
  // Indicates that IpMask didn't perform operations on monitored resource type during training.
  // Seeing an IpMask who previously didn't work on this resource performing a high-risk operation is suspicious. 
      , newIpMaskOnMonRes = iff(isempty(firstSeenMonResIpMaskOnSubs), 1, 0)
  // Indicates that monitored operation wasn't performed at all on subscription during training.
  // Seeing a high-risk operation performed for the first time on subscription is suspicious.
      , newMonOpOnSubs = iff(isempty(firstSeenMonOpOnSubs), 1, 0)
  // Indicates that operation was performed on monitored resource type by user who usually doesn't work on monitored resource type.
  // This is suspicious, since it might indicate that this user that doesn't have a legitimate reason to do any operations
  // on this resource type, in particular a high-value operation as the monitored one.
  // We look for a low Jaccard index or a low conditional probability (P_AIB) calculated over the training period, which indicates that any activity on the monitored resource type is anomalous for this user.
  // Thus, actual execution of high-risk operation on monitored resource in detection window is unexpected and thus suspicious.
  // The threshold of 0.1 is hard-coded in the expression below; lowering it restricts results to more significant anomalies.
      , anomCallerMonRes = iff(((Jaccard_AB <= 0.1) or (P_AIB <= 0.1)), 1, 0)
  | project TimeGenerated, subscriptionId,  ResourceProvider, ResourceName, OperationNameValue, Caller, CorrelationId, ClientIP=clientIp, ActiveDaysOnSub=daysOnSubs, avgMonOpOnSubs, newCaller, newCallerOnMonRes, newIpMask, newIpMaskOnMonRes, newMonOpOnSubs, anomCallerMonRes, isMonitoredOp, isMonitoredResource
  | order by TimeGenerated
  // Optional - focus only on monitored operations or monitored resource in detection window
  | where isMonitoredOp == 1
  //| where isMonitoredResource == 1

// When the detection window will start (3 days prior to now) let startDetectDate = 3d; // When the detection window will end(now) let endDetectDate = 0d; // When to start collecting data for detection let startDate = startDetectDate + 30d; // Operation to monitor, in this case Run Command let monitoredOps = dynamic(['microsoft.compute/virtualmachines/runcommand/action']); // The resource type to monitor, in this case virtual machines let monitoredResource = pack_array('microsoft.compute/virtualmachines'); // This function describes relationships between two values by calculating probabilities and related metrics. // For example, taking a user and resource type in Azure Activity logs. Intuitively, if probability (or related metric) of seeing specific user // working on a specific resource type is low, yet they are doing it, this is anomalous. // One useful metric is Jaccard Similarity Index, defined as ratio between intersection and union probabilities. // It is scaled between [0, 1]. When close to 0 - this means that the two values tend to stay apart. When close to 1 - they tend to be seen together. // An additional useful metric is Lift, defined as ratio between conditional probability and regular probability. // When the Lift of two values is close to 1, it means they appear together as though they are independent. When it is much larger than 1, // it means they appear together more than expected when assuming independence. When it is close to 0, it means the values appear much less // than expected. Thus, when Jaccard index or Lift of specific {user, resourceType} pair are below threshold, it indicates anomalous // (and thus potentially suspicious) behavior. 
let pair_probabilities_fl = (tbl:(*), A_col:string, B_col:string, scope_col:string) { let T = (tbl | extend _A = column_ifexists(A_col, ''), _B = column_ifexists(B_col, ''), _scope = column_ifexists(scope_col, '')); let countOnScope = T | summarize countAllOnScope = count() by _scope; let probAB = T | summarize countAB = count() by _A, _B, _scope | join kind = leftouter (countOnScope) on _scope | extend P_AB = todouble(countAB)/countAllOnScope; let probA = probAB | summarize countA = sum(countAB), countAllOnScope = max(countAllOnScope) by _A, _scope | extend P_A = todouble(countA)/countAllOnScope; let probB = probAB | summarize countB = sum(countAB), countAllOnScope = max(countAllOnScope) by _B, _scope | extend P_B = todouble(countB)/countAllOnScope; probAB // probability for each value of A | join kind = leftouter (probA) on _A, _scope // probability for each value of B | join kind = leftouter (probB) on _B, _scope // union probability | extend P_AUB = P_A + P_B - P_AB // conditional probability of A on B , P_AIB = P_AB/P_B // conditional probability of B on A , P_BIA = P_AB/P_A // lift metric | extend Lift_AB = P_AB/(P_A * P_B) // Jaccard similarity index , Jaccard_AB = P_AB/P_AUB | project _A, _B, _scope, floor(P_A, 0.00001), floor(P_B, 0.00001), floor(P_AB, 0.00001), floor(P_AUB, 0.00001), floor(P_AIB, 0.00001) , floor(P_BIA, 0.00001), floor(Lift_AB, 0.00001), floor(Jaccard_AB, 0.00001) | sort by _scope, _A, _B }; // Prepare Azure Activity data for processing let eventsTable = materialize ( AzureActivity // Time window to collect activity | where TimeGenerated between (ago(startDate) .. 
ago(endDetectDate)) | where isnotempty(CallerIpAddress) // Only collect instances where the request Succeeded | where ActivityStatusValue has_any ('Success', 'Succeeded') // Process subscription ID and resource ID to support new and old connector schema | extend SubscriptionId = iff(isempty(_SubscriptionId), SubscriptionId, _SubscriptionId) | extend ResourceId = iff(isempty(_ResourceId), ResourceId, _ResourceId) // Process operations name and resource ID | extend splitOp = split(OperationNameValue, '/') | extend splitRes = split(ResourceId, '/') // Limit to actions with a caller IP | project TimeGenerated , subscriptionId=SubscriptionId , ResourceProvider , ResourceName = tolower(tostring(splitRes[-1])) , OperationNameValue = tolower(OperationNameValue) , timeSlice = floor(TimeGenerated, 1d) , clientIp = tostring(CallerIpAddress) , Caller , isMonitoredOp = iff(OperationNameValue has_any (monitoredOps), 1, 0) , isMonitoredResource = iff(OperationNameValue has_any (monitoredResource), 1, 0) , CorrelationId | extend clientIpMask = format_ipv4_mask(clientIp, 16) ); // Generate model data by aggregating over training window by subscription, caller and IP. We calculate the dates // that the entity was first seen on subscription ('firstSeen'), performing monitored operation ('firstSeenOnMonOp') // and working on the monitored resource ('firstSeenOnMonRes'). let modelData = ( eventsTable | where TimeGenerated < ago(startDetectDate) and isnotempty(Caller) and isnotempty(subscriptionId) | summarize countEvents = count(), countMonRes = countif(isMonitoredResource == 1), counMonOp = countif(isMonitoredOp == 1) , firstSeen = min(timeSlice), firstSeenOnMonRes = minif(timeSlice, isMonitoredResource == 1), firstSeenOnMonOp = minif(timeSlice, isMonitoredOp == 1) by subscriptionId, Caller, clientIpMask ); // Invoke pair_probabilities_fl (described above) over the processed Azure Activity data for {Caller, isMonitoredResource} pairs for each // subscription. 
Caller identifies the user, and isMonitoredResource is a flag for monitored resource type. We calculate the metrics for // each user to do any work on relevant resource type at training window. We filter the results by relevant resource // (isMonitoredResource == 1) for later usage. let monOpProbs = materialize ( eventsTable | where TimeGenerated < ago(startDetectDate) and isnotempty(Caller) and isnotempty(subscriptionId) | invoke pair_probabilities_fl('Caller', 'isMonitoredResource','subscriptionId') | where _B == 1 | sort by P_AIB desc | extend rankOnMonRes = row_rank(P_AIB), sumBiggerCondProbs = row_cumsum(P_AIB) - P_AIB | extend avgBiggerCondProbs = floor(iff(rankOnMonRes > 1, sumBiggerCondProbs/(rankOnMonRes-1), max_of(0.0, prev(sumBiggerCondProbs))), 0.00001) | project-away sumBiggerCondProbs ); // Now join the original data, the model data and pair_probabilities_fl data eventsTable | where TimeGenerated between (ago(startDetectDate) .. ago(endDetectDate)) // Join with modelData on subscriptionId and Caller, to get firstSeen dates (general, monitored resource and monitored operation) for this Caller on subscription | join kind = leftouter (modelData | summarize countEventsPrincOnSub = sum(countEvents), countEventsMonResPrincOnSub = sum(countMonRes), countEventsMonOpPrincOnSub = sum(counMonOp) , firstSeenPrincOnSubs = min(firstSeen), firstSeenMonResPrincOnSubs = min(firstSeenOnMonRes), firstSeenMonOpPrincOnSubs = min(firstSeenOnMonOp) by subscriptionId, Caller) on subscriptionId, Caller // Join with modelData on subscriptionId and IpMask, to get firstSeen dates (general, monitored resource and monitored operation) for this IpMask on subscription | join kind = leftouter (modelData | summarize countEventsIpMaskOnSub = sum(countEvents), countEventsMonResIpMaskOnSub = sum(countMonRes), countEventsMonOpIpMaskOnSub = sum(counMonOp) , firstSeenIpMaskOnSubs = min(firstSeen), firstSeenMonResIpMaskOnSubs = min(firstSeenOnMonRes), firstSeenMonOpIpMaskOnSubs = 
min(firstSeenOnMonOp) by subscriptionId, clientIpMask) on subscriptionId, clientIpMask
// Join with modelData on subscriptionId, to get firstSeen dates (general, monitored resource and monitored operation) for this subscription in general (all users)
| join kind = leftouter (modelData
    | summarize countEventsOnSub = sum(countEvents), countEventsMonResOnSub = sum(countMonRes), countEventsMonOpOnSub = sum(counMonOp)
        , firstSeenOnSubs = min(firstSeen), firstSeenMonResOnSubs = min(firstSeenOnMonRes), firstSeenMonOpOnSubs = min(firstSeenOnMonOp)
        , countCallersOnSubs = dcount(Caller), countIpMasksOnSubs = dcount(clientIpMask)
        by subscriptionId) on subscriptionId
| project-away subscriptionId1, Caller1, subscriptionId2
// Calculate the number of days the user has been active in the subscription
| extend daysOnSubs = datetime_diff('day', timeSlice, firstSeenOnSubs)
// Calculate the average number of monitored operations performed on the subscription, representing the expected baseline of operations
| extend avgMonOpOnSubs = floor(1.0*countEventsMonOpOnSub/daysOnSubs, 0.01), avgMonResOnSubs = floor(1.0*countEventsMonResOnSub/daysOnSubs, 0.01)
// Join with monOpProbs on subscriptionId and Caller to get probabilities for this user to work on monitored resourceType
| join kind = leftouter(monOpProbs) on $left.subscriptionId == $right._scope, $left.Caller == $right._A
| project-away _A, _B, _scope
| sort by subscriptionId asc, TimeGenerated asc
| extend rnOnSubs = row_number(1, subscriptionId != prev(subscriptionId))
| sort by subscriptionId asc, Caller asc, TimeGenerated asc
| extend rnOnCallerSubs = row_number(1, (subscriptionId != prev(subscriptionId) and (Caller != prev(Caller))))
////
// Anomaly scenarios
// Indicates that Caller wasn't seen on subscription during training, since joining with model data on that caller brought no results.
// Seeing a previously unseen Caller performing a high-risk operation is suspicious.
| extend newCaller = iff(isempty(firstSeenPrincOnSubs), 1, 0)
    // Indicates that Caller didn't perform operations on monitored resource type during training.
    // Seeing a Caller who previously didn't work on this resource performing a high-risk operation is suspicious.
    , newCallerOnMonRes = iff(isempty(firstSeenMonResPrincOnSubs), 1, 0)
    // Indicates that IpMask wasn't seen on subscription during training.
    // Seeing a previously unseen IpMask performing a high-risk operation is suspicious.
    , newIpMask = iff(isempty(firstSeenIpMaskOnSubs), 1, 0)
    // Indicates that IpMask didn't perform operations on monitored resource type during training.
    // Seeing an IpMask that previously didn't work on this resource performing a high-risk operation is suspicious.
    , newIpMaskOnMonRes = iff(isempty(firstSeenMonResIpMaskOnSubs), 1, 0)
    // Indicates that monitored operation wasn't performed at all on subscription during training.
    // Seeing a high-risk operation performed for the first time on subscription is suspicious.
    , newMonOpOnSubs = iff(isempty(firstSeenMonResOnSubs), 1, 0)
    // Indicates that operation was performed on monitored resource type by user who usually doesn't work on monitored resource type.
    // This is suspicious, since it might indicate that this user doesn't have a legitimate reason to do any operations
    // on this resource type, in particular a high-value operation such as the monitored one.
    // We look for low Jaccard Index or Lift values calculated over training period, which indicate that any activity on monitored resource type is anomalous for this user.
    // Thus, actual execution of high-risk operation on monitored resource in detection window is unexpected and thus suspicious.
    // By setting a lower value for anomalyProbThreshold, we can look for more significant anomalies.
    , anomCallerMonRes = iff(((Jaccard_AB <= 0.1) or (P_AIB <= 0.1)), 1, 0)
| project TimeGenerated, subscriptionId, ResourceProvider, ResourceName, OperationNameValue, Caller, CorrelationId, ClientIP=clientIp, ActiveDaysOnSub=daysOnSubs, avgMonOpOnSubs, newCaller, newCallerOnMonRes, newIpMask, newIpMaskOnMonRes, newMonOpOnSubs, anomCallerMonRes, isMonitoredOp, isMonitoredResource
| order by TimeGenerated
// Optional - focus only on monitored operations or monitored resource in detection window
| where isMonitoredOp == 1
//| where isMonitoredResource == 1
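To make the final anomaly flag more concrete, the sketch below computes the two measures by hand on a small, invented dataset. It assumes that Jaccard_AB is the Jaccard index between "events by this Caller" (A) and "events on the monitored resource type" (B), and that P_AIB denotes the conditional probability P(A|B). The table name, the onMonitoredResource column and all values are hypothetical and are not part of the original query.

```kusto
// Hypothetical training events: one row per operation, flagged when the
// operation touched the monitored resource type.
let trainingEvents = datatable(Caller:string, onMonitoredResource:bool)
[
    "alice@contoso.com", true,
    "alice@contoso.com", true,
    "alice@contoso.com", false,
    "bob@contoso.com",   false,
    "bob@contoso.com",   false,
    "bob@contoso.com",   false,
    "bob@contoso.com",   false,
    "carol@contoso.com", true
];
// |B|: events on the monitored resource type, across all callers
let countB = toscalar(trainingEvents | where onMonitoredResource | count);
trainingEvents
// |A|: events by the caller, |A and B|: the caller's events on the monitored resource type
| summarize countA = count(), countAB = countif(onMonitoredResource) by Caller
// Jaccard index = |A and B| / |A or B|
| extend Jaccard_AB = round(1.0 * countAB / (countA + countB - countAB), 2)
// Assumed meaning of P_AIB: P(A|B) = |A and B| / |B|
| extend P_AIB = round(1.0 * countAB / countB, 2)
// Same thresholds as the query above
| extend anomCallerMonRes = iff(Jaccard_AB <= 0.1 or P_AIB <= 0.1, 1, 0)
```

In this sample only bob@contoso.com, who never touched the monitored resource type during training, ends up with anomCallerMonRes set to 1. A monitored operation by that caller in the detection window would therefore be treated as unexpected.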

Once an anomalous action has been identified, analysis using additional log sources may be required. Using Azure Run Command execution on a Windows VM as an example, it is not possible using only Azure Activity logs to understand what PowerShell commands were executed. Additional log data can be joined with Azure Activity to provide deeper insight.

Run Command Extension Deep Dive

Run Command is a default extension to Windows and Linux virtual machines hosted in Azure. The feature consists of two core components, an Azure fabric controller and an on-host guest agent which runs on the virtual machine. Users can interact with the Azure fabric controller through the Azure Portal, Azure CLI or Azure PowerShell. Run Command allows administrative users to execute PowerShell or Bash scripts on Virtual Machines they manage.

If a privileged user account is compromised, this feature may be used to execute unauthorised commands. An example of this can be seen in ShockNAwe, a security testing tool developed by Dell SecureWorks. The tool uses a so-called Golden SAML token to invoke the Run Command extension and execute PowerShell on Windows Virtual Machines.

When investigating potential malicious use of this extension using Microsoft Sentinel, there are three major touch points that allow actions to be linked from the Azure fabric controller down to on-host activity. The ability to link events across cloud and on-host allows Sentinel to execute hybrid detections mirroring the threat actor’s hybrid attack.

This section will provide examples of how to detect potentially suspicious Run Command execution using Azure Activity logs. It will also provide examples of how to connect Azure Activity and Microsoft Defender for Endpoint logs to gain deeper insight into the suspicious activity.

Hunting for Anomalous Run Command spikes in Azure Activity

In addition to this detection, other queries can be crafted to detect unusual usage. The first example is a hunting query that surfaces spikes in Run Command execution. Run Command is often used to conduct administrative actions, which tends to produce periodic and predictable execution. Multiple Run Command actions executed in a row may instead indicate hands-on-keyboard activity.

This hunting query can be configured to detect spikes in Run Command usage within a given environment. With the default configuration it will surface any user and source IP address pair that executed five or more Run Command operations against at least one virtual machine, where less than 15 minutes separate the first and last execution seen in the 30-day lookback.

let timeDelta = 15; // Minutes
let resourcesImpacted = 1; // Number of VMs
let commandsExecuted = 5; // Number of commands
AzureActivity
| where TimeGenerated > ago(30d)
| where OperationName =~ "Run Command on Virtual Machine"
| project TimeGenerated, Caller, CallerIpAddress, ActivityStatus, OperationName, CorrelationId, Resource, ResourceGroup
| summarize Start=min(TimeGenerated), End=max(TimeGenerated) , make_list(ActivityStatus), max(CallerIpAddress) by CorrelationId, Caller, Resource, ResourceGroup
| extend Succeeded = iff(list_ActivityStatus has "Succeeded", true, false)
| project Start, End, ResourceGroup, Resource, Caller, CallerIP=max_CallerIpAddress, Succeeded
| extend Resource = tolower(Resource), ResourceGroup = tolower(ResourceGroup)
| extend p = pack(Resource,ResourceGroup)
| summarize ResourceCount=dcount(Resource), ResourceGroupCount=dcount(ResourceGroup), Start=min(Start), End=max(End), ResourceBag=make_bag(p), Commands=count() by Caller, CallerIP
| extend ["WindowDelta [Minutes]"] = datetime_diff("minute", End, Start)
| where ResourceCount >= resourcesImpacted and ["WindowDelta [Minutes]"] < timeDelta and Commands >= commandsExecuted
| project Caller, CallerIP, ResourceCount, ResourceGroupCount, ["WindowDelta [Minutes]"], Start, End, ResourceBag
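The query above measures the span between a caller's first and last Run Command execution over the whole lookback, so a caller who is also active on other days will not be returned. If per-window spikes are of interest, one possible variation is to count executions in fixed time buckets with bin(). The following is a minimal sketch of that idea on invented data; the runCommands table stands in for the already summarised Azure Activity rows and is hypothetical, as are all of its values.

```kusto
let timeDelta = 15m;        // Bucket size
let commandsExecuted = 5;   // Minimum number of commands per bucket
// Hypothetical Run Command executions, one row per CorrelationId
let runCommands = datatable(Start:datetime, Caller:string, CallerIP:string, Resource:string)
[
    datetime(2021-11-01 10:00:00), "admin@contoso.com", "203.0.113.10", "vm1",
    datetime(2021-11-01 10:02:00), "admin@contoso.com", "203.0.113.10", "vm1",
    datetime(2021-11-01 10:05:00), "admin@contoso.com", "203.0.113.10", "vm1",
    datetime(2021-11-01 10:09:00), "admin@contoso.com", "203.0.113.10", "vm1",
    datetime(2021-11-01 10:13:00), "admin@contoso.com", "203.0.113.10", "vm1",
    datetime(2021-11-10 09:00:00), "admin@contoso.com", "203.0.113.10", "vm1",
    datetime(2021-11-03 02:00:00), "ops@contoso.com",   "198.51.100.7", "vm2",
    datetime(2021-11-10 02:00:00), "ops@contoso.com",   "198.51.100.7", "vm2"
];
runCommands
// Count executions per caller, source IP and 15-minute bucket
| summarize Commands = count(), ResourceCount = dcount(Resource), Resources = make_set(Resource)
    by Caller, CallerIP, Window = bin(Start, timeDelta)
| where Commands >= commandsExecuted
```

On this sample data a single row is returned, for admin@contoso.com in the bucket starting at 10:00 on 1 November. Fixed buckets can split a burst that straddles a bucket boundary, so the thresholds may need tuning for a given environment.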

Hunting for Infrequent Run Command Interaction

Continuing the theme of detecting hands-on-keyboard activity, a query can be crafted to detect Run Command interactions with a virtual machine that are only present for a very short period. A threat actor may use Run Command as a pivot into the VM environment. Once a VM is successfully compromised, there is little need to re-execute Run Command actions against that target. This may lead to threat actor use of Run Command being limited to a short period of time, in comparison to legitimate administrative activity.

The following hunting query allows the maximum interaction time to be configured. By default, it will return results when all of a user's Run Command executions against a VM in the last 30 days fall within a period of less than 24 hours.

let MaxInteractionTime = 24; // Hours
AzureActivity
| where TimeGenerated > ago(30d)
| where OperationNameValue =~ "Microsoft.Compute/virtualMachines/runCommand/action"
| where Authorization has "virtualMachines"
| summarize Start=min(TimeGenerated), End=max(TimeGenerated), max(CallerIpAddress), make_list(ActivityStatusValue) by CorrelationId, Authorization, Caller
| where list_ActivityStatusValue has "Succeeded"
| extend Authorization_d = parse_json(Authorization)
| extend Scope = Authorization_d.scope
| extend Scope_s = split(Scope, "/")
| extend Subscription = tostring(Scope_s[2])
| extend VirtualMachineName = tostring(Scope_s[-1])
| project Start, End, Subscription, VirtualMachineName, CorrelationId, Caller, CallerIpAddress=max_CallerIpAddress
| summarize Start=min(Start), End=max(End), dcount(VirtualMachineName), count() by Caller, VirtualMachineName
| extend TimeDelta = datetime_diff("Hour", End, Start)
| where TimeDelta < MaxInteractionTime
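The subscription and virtual machine name are both taken from the scope held in the Authorization column. As a small illustration of that parsing step, the snippet below applies the same split logic to a single made-up Authorization value. The subscription ID, resource group and VM name are placeholders.

```kusto
// Hypothetical Authorization value; all identifiers are placeholders
print Authorization = '{"action":"Microsoft.Compute/virtualMachines/runCommand/action","scope":"/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-test/providers/Microsoft.Compute/virtualMachines/testrig1"}'
// The scope starts with "/", so element 0 of the split is empty,
// element 1 is "subscriptions" and element 2 is the subscription ID
| extend Scope_s = split(tostring(parse_json(Authorization).scope), "/")
// A negative index counts from the end of the array, so -1 is the VM name
| extend Subscription = tostring(Scope_s[2]), VirtualMachineName = tostring(Scope_s[-1])
| project Subscription, VirtualMachineName
```

For this value the query returns the placeholder subscription ID and testrig1.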

Linking Azure Activity with User and Entity Behaviour Analytics

A detection which correlates Run Command actions with recent User and Entity Behaviour Analytics (UEBA) alerts was released as part of the original Microsoft security blog. UEBA continuously monitors and baselines user activity and then generates an alert when this baseline is breached. For example, if a user is observed logging in from an unusual location, or using an unusual client application, a UEBA alert may be generated. More information on UEBA can be found here.

The query merges Run Command activities with UEBA data. It then checks whether the Run Command action was executed up to 1 hour before or 6 hours after a UEBA breach. This analytic template is available in Microsoft Sentinel workspaces, and the KQL query can be copied and used as the basis for additional hunting queries.
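The time-window correlation described above can be sketched as follows. This is not the analytic template itself, only an illustration of the "1 hour before, 6 hours after" logic on invented data. Both tables, their column names and all values are hypothetical.

```kusto
// Hypothetical Run Command executions
let runCommands = datatable(RunTime:datetime, Caller:string, VirtualMachineName:string)
[
    datetime(2021-11-01 11:30:00), "admin@contoso.com", "testrig1",  // 30 minutes before the anomaly
    datetime(2021-11-01 15:00:00), "admin@contoso.com", "testrig1",  // 3 hours after the anomaly
    datetime(2021-11-01 19:30:00), "admin@contoso.com", "testrig1",  // 7.5 hours after, outside the window
    datetime(2021-11-01 12:10:00), "ops@contoso.com",   "testrig2"   // no anomaly for this user
];
// Hypothetical UEBA anomalies
let uebaAnomalies = datatable(AnomalyTime:datetime, UserPrincipalName:string, Anomaly:string)
[
    datetime(2021-11-01 12:00:00), "admin@contoso.com", "Sign-in from unusual location"
];
runCommands
| join kind=inner (uebaAnomalies) on $left.Caller == $right.UserPrincipalName
// Keep executions from 1 hour before to 6 hours after the anomaly
| where RunTime between ((AnomalyTime - 1h) .. (AnomalyTime + 6h))
| project RunTime, AnomalyTime, Caller, VirtualMachineName, Anomaly
```

Only the 11:30 and 15:00 executions by admin@contoso.com are returned for this sample.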

Linking Azure Activity with Endpoint Data using Microsoft Sentinel

While it’s possible to hunt for suspicious Run Command activity in Azure Activity logs, current Azure logging provides no visibility into the command that was executed. To help address this with existing logging, Azure Activity events can be connected to Microsoft Defender for Endpoint (MDE) logging to provide deeper insights.

The first step of this query is to parse and summarise the Azure Activity data. The query to do this is provided at the start of this section under “Parsing Azure Activity Logs”. The critical piece of information extracted at this point is the virtual machine name, which will be used to link endpoint and Azure activity. When Run Command is executed it will take the PowerShell provided by the user, pass this through the Azure Run Command service fabric, and ultimately write this to a file on disk. With each Run Command execution, a new PowerShell file will be written to disk by the Run Command Extension process.

The following Kusto will extract file write events performed by the Run Command Extension process from the DeviceFileEvents advanced hunting table created by Microsoft Defender for Endpoint. These events are then joined with the Azure cloud events based on the virtual machine name. The final step is to filter the file write events based on the time Run Command was executed.

| join kind=leftouter (
    DeviceFileEvents
    | where InitiatingProcessFileName == "RunCommandExtension.exe"
    | extend VirtualMachineName = tostring(split(DeviceName, ".")[0])
    | project VirtualMachineName, PowershellFileCreatedTimestamp=TimeGenerated, FileName, FileSize, InitiatingProcessAccountName, InitiatingProcessAccountDomain, InitiatingProcessFolderPath, InitiatingProcessId
) on VirtualMachineName
| where PowershellFileCreatedTimestamp between (StartTime .. EndTime)
| project StartTime, EndTime, PowershellFileCreatedTimestamp, VirtualMachineName, Caller, CallerIpAddress, FileName, FileSize, InitiatingProcessId, InitiatingProcessAccountDomain, InitiatingProcessFolderPath
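The fragment above expects the preceding part of the query to supply StartTime, EndTime, VirtualMachineName, Caller and CallerIpAddress. To show how the join and time filter behave, the sketch below runs the same steps against two small stand-in tables. The table names and every value in them are hypothetical.

```kusto
// Stand-in for the parsed Azure Activity data
let runCommandActivity = datatable(StartTime:datetime, EndTime:datetime, VirtualMachineName:string, Caller:string, CallerIpAddress:string)
[
    datetime(2021-11-01 10:00:00), datetime(2021-11-01 10:01:30), "testrig1", "admin@contoso.com", "203.0.113.10"
];
// Stand-in for DeviceFileEvents rows written by the Run Command Extension
let fileEvents = datatable(TimeGenerated:datetime, DeviceName:string, FileName:string, FileSize:long)
[
    datetime(2021-11-01 10:00:40), "testrig1.contoso.com", "script6.ps1", 120,
    datetime(2021-11-03 08:15:00), "testrig1.contoso.com", "script7.ps1", 64
];
runCommandActivity
| join kind=leftouter (
    fileEvents
    // Reduce the fully qualified device name to the host name
    | extend VirtualMachineName = tostring(split(DeviceName, ".")[0])
    | project VirtualMachineName, PowershellFileCreatedTimestamp = TimeGenerated, FileName, FileSize
) on VirtualMachineName
// Keep only files written while the Run Command operation was in progress
| where PowershellFileCreatedTimestamp between (StartTime .. EndTime)
| project StartTime, EndTime, PowershellFileCreatedTimestamp, VirtualMachineName, Caller, CallerIpAddress, FileName, FileSize
```

Only script6.ps1 is returned, because script7.ps1 was written outside the execution window. Rows without a matching file event have an empty PowershellFileCreatedTimestamp and are removed by the time filter, so in practice the left outer join behaves like an inner join here. String join keys are case-sensitive, so it may be worth normalising both sides with tolower() if VM names differ in case between the two sources.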

The below screenshot shows example output from this query. At this point data from Azure logging (the Caller user principal name and their IP address) has been joined with information from MDE logging, providing further insight into the command executed. Notably, MDE provides insight into the file size of the executed script. The script name is automatically generated by the Run Command extension when the file is written to disk.

At this point it is possible to craft a hunting query that surfaces windows of activity where the file size is volatile. Volatility in file size may be indicative of hands-on-keyboard activity, with an actor crafting and executing individual commands sequentially as part of the reconnaissance phase of an attack.

The below lines of Kusto can be added to the query to surface instances where a single user, executing commands against a virtual machine, uses commands that were different over 80% of the time. Script file size is used as a proxy for the script's content, which accounts for repeated commands that may have failed.

| summarize StartTime=min(StartTime), EndTime=max(EndTime), count(), dcount(FileSize) by VirtualMachineName, CallerIpAddress, Caller
| extend PercentageDifferent = toreal(dcount_FileSize) / toreal(count_) * 100
| where PercentageDifferent > 80 and count_ > 1
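As a worked example of the percentage calculation, the sketch below applies the same three lines to invented data. The linked table stands in for the output of the joined query, and its rows are hypothetical: one caller writes five scripts of five different sizes, while another writes five scripts of only two different sizes.

```kusto
// Stand-in for the joined Azure Activity and DeviceFileEvents output
let linked = datatable(StartTime:datetime, EndTime:datetime, VirtualMachineName:string, Caller:string, CallerIpAddress:string, FileSize:long)
[
    datetime(2021-11-01 10:00:00), datetime(2021-11-01 10:01:00), "testrig1", "admin@contoso.com", "203.0.113.10", 120,
    datetime(2021-11-01 10:03:00), datetime(2021-11-01 10:04:00), "testrig1", "admin@contoso.com", "203.0.113.10", 98,
    datetime(2021-11-01 10:06:00), datetime(2021-11-01 10:07:00), "testrig1", "admin@contoso.com", "203.0.113.10", 240,
    datetime(2021-11-01 10:09:00), datetime(2021-11-01 10:10:00), "testrig1", "admin@contoso.com", "203.0.113.10", 75,
    datetime(2021-11-01 10:12:00), datetime(2021-11-01 10:13:00), "testrig1", "admin@contoso.com", "203.0.113.10", 310,
    datetime(2021-11-01 02:00:00), datetime(2021-11-01 02:01:00), "testrig2", "ops@contoso.com",   "198.51.100.7", 512,
    datetime(2021-11-02 02:00:00), datetime(2021-11-02 02:01:00), "testrig2", "ops@contoso.com",   "198.51.100.7", 512,
    datetime(2021-11-03 02:00:00), datetime(2021-11-03 02:01:00), "testrig2", "ops@contoso.com",   "198.51.100.7", 512,
    datetime(2021-11-04 02:00:00), datetime(2021-11-04 02:01:00), "testrig2", "ops@contoso.com",   "198.51.100.7", 512,
    datetime(2021-11-05 02:00:00), datetime(2021-11-05 02:01:00), "testrig2", "ops@contoso.com",   "198.51.100.7", 640
];
linked
| summarize StartTime=min(StartTime), EndTime=max(EndTime), count(), dcount(FileSize) by VirtualMachineName, CallerIpAddress, Caller
// Distinct file sizes as a percentage of all executions
| extend PercentageDifferent = toreal(dcount_FileSize) / toreal(count_) * 100
| where PercentageDifferent > 80 and count_ > 1
```

The first caller has 5 distinct sizes across 5 executions (100%) and is returned. The second has 2 distinct sizes across 5 executions (40%) and is filtered out. Note that dcount() is an estimate for large sets, although for a handful of values such as these it is effectively exact.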

This kind of analytic is possible because the Azure logs provide insight into the executing user and their IP address, allowing more reliable grouping of results into windows of activity. MDE logging alone provides no insight into the Azure user account used to invoke these commands.

The below image shows three individual user sessions targeting testrig1, each session using a highly volatile set of commands. Additionally, the threshold for minimum number of commands can be adjusted to eliminate low volumes of commands that may be prone to producing false positives.

Returning to the original hunting query, DeviceFileEvents has allowed the Azure activity to be linked to the script creation on the host. The next step is to gain insight into what commands the PowerShell script, and in turn the Run Command operation, contained.

The following Kusto will extract information about which PowerShell commands were invoked when the script was executed by the Run Command extension. This can be achieved by joining the existing query with data from the DeviceEvents advanced hunting table. Amongst other things, this table contains a record of the PowerShell commands that were executed on an endpoint.

When Run Command is invoked, the final step for execution is to run PowerShell with the file created by the Run Command Extension. As noted earlier, the files created by the Run Command extension have a predictable naming structure and a predictable PowerShell command line. Below is an example of the command line when executing a Run Command script:

The PowerShell filename can be extracted from the command line using a regular expression. Summarising based on the PowerShell file name (in the above image script6.ps1), it is possible to see other PowerShell commands loaded as part of the script file's execution. In the below image it’s clear script6.ps1 contains the Windows command ipconfig, resulting in ipconfig.exe being loaded.

The below Kusto will prepare the data in the DeviceEvents table and perform the final join. The complete query can be found here for reference.

| join kind=inner(
    DeviceEvents
    | extend VirtualMachineName = tostring(split(DeviceName, ".")[0])
    | where InitiatingProcessCommandLine has "-File"
    | extend PowershellFileName = extract(@"\-File\s(script[0-9]{1,9}\.ps1)", 1, InitiatingProcessCommandLine)
    | extend PSCommand = tostring(parse_json(AdditionalFields).Command)
    | order by TimeGenerated asc 
    | where PSCommand != PowershellFileName 
    | summarize PowershellExecStart=min(TimeGenerated), PowershellExecEnd=max(TimeGenerated), make_list(PSCommand) by PowershellFileName, InitiatingProcessCommandLine
) on $left.FileName == $right.PowershellFileName
| project StartTime, EndTime, PowershellFileCreatedTimestamp, PowershellExecStart, PowershellExecEnd, PowershellFileName, PowershellScriptCommands=list_PSCommand, Caller, CallerIpAddress, InitiatingProcessCommandLine, PowershellFileSize=FileSize, VirtualMachineName
| order by StartTime asc 
| extend ScriptFingerprintHash = hash_sha256(tostring(PowershellScriptCommands))
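The regular expression used in the join above can be checked in isolation. The snippet below applies it to a single made-up command line that follows the "-File scriptN.ps1" pattern described earlier. The exact command line shown is an assumption for illustration and may differ from what appears in a given environment.

```kusto
// Hypothetical command line following the "-File scriptN.ps1" pattern
print InitiatingProcessCommandLine = "powershell -ExecutionPolicy Unrestricted -File script6.ps1"
// Capture group 1 holds the generated script name
| extend PowershellFileName = extract(@"\-File\s(script[0-9]{1,9}\.ps1)", 1, InitiatingProcessCommandLine)
```

For this input PowershellFileName is script6.ps1. Command lines that do not match the pattern yield an empty string, and those rows will not join to a FileName from DeviceFileEvents.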

As can be seen in the below image, the completed query now provides information from Azure and MDE logging. Merging this data allows a blue team to determine who invoked Run Command, which hosts were impacted and the PowerShell commands that were executed as part of the Run Command execution.

This information provides a base hunting query to investigate potentially suspicious Run Command behaviour. The original blog post covering this activity provided an example detection built upon this foundation. The detection will raise an alert when a unique PowerShell script is seen executed using Run Command. This detection will be deployed to Sentinel instances with Azure Activity logs and MDE logging available, or can be viewed here.

Conclusion

This blog post has covered how to enable Azure Activity logging and how to begin analysing the logs for malicious activity. A query was provided to explore operations and resource anomalies in a generic way, providing the basis for threat hunts targeting unauthorised usage of operations. After identifying suspicious Azure operation behaviour within Azure Activity logs, a deeper look was taken into the Run Command extension, providing several ways to gain insight into the commands a threat actor may have executed on Windows-based virtual machine hosts.

Additional hunting queries surrounding Azure Run Command can be found here:

Azure VM Run Command executed from Azure IP address: https://github.com/Azure/Azure-Sentinel/tree/master/Hunting%20Queries/AzureActivity/AzureRunCommandFromAzureIP.yaml

Azure VM Run Command linked with MDE: https://github.com/Azure/Azure-Sentinel/tree/master/Hunting%20Queries/MultipleDataSources/AzureRunCommandMDELinked.yaml

Dormant Service Principal Update Creds and Logs In: https://github.com/Azure/Azure-Sentinel/tree/master/Hunting%20Queries/MultipleDataSources/DormantServicePrincipalUpdateCredsandLogsIn.yaml

Dormant User Update MFA and Logs In: https://github.com/Azure/Azure-Sentinel/tree/master/Hunting%20Queries/MultipleDataSources/DormantUserUpdateMFAandLogsIn.yaml

Updated Nov 22, 2021
