Blog Post

Security, Compliance, and Identity Blog
10 MIN READ

Advanced hunting for Microsoft Purview Data Loss Prevention (DLP) incidents

JonNordstrom's avatar
JonNordstrom
Icon for Microsoft rankMicrosoft
May 23, 2023

1. Introduction 

Microsoft 365 Defender is the Microsoft-recommended experience for investigation and remediation of Microsoft Purview Data Loss Prevention (DLP) incidents. Please refer to the step-by-step guided walkthrough published earlier to learn how to investigate Microsoft Purview Data Loss Prevention alert in Microsoft 365 Defender

 

As part of the investigative capabilities available in Microsoft 365 Defender, advanced hunting provides the ability to query raw compliance and security data signals generated by Microsoft 365 to proactively detect known and potential risks in your organization as well as visualizing the attack chain. Advanced hunting can boost your investigation workflow and help you learn more about the types of alerts you receive across your estate. This blog provides guidance on how to get started and leverage advanced hunting for Microsoft Purview DLP investigations. We are sharing sample queries for high value scenarios to help you get started.  

 

2. Getting Started

Pre-requisite: Get the necessary CloudAppEvents table, that contains Microsoft Purview data, to show up in Advanced Hunting by following these steps to integrate with Microsoft 365.

 

  1. To access Advanced hunting, go to https://security.microsoft.com and click on Advanced hunting on the left navigation pane.

 

  1. In Advanced Hunting, select the time range for the query.

  1. On the left hand side you have all the tables available, the details for each of these tables is available here. In this post, we provide examples on what you can do with the CloudAppEvents table.
  2. Queries in advanced hunting is based on the KUSTO query language. For an introduction read this.
  3. Once you have inserted the query click, Run query.
  4. There are built-in visualizations within the product, to make the data easier to consume. In the below example, we are summarizing the alerts based on sensitive information types and rendering the results as a pie chart. Visualizations like this can help you quickly learn if any of your SITs are overrepresented in the alert flow (see the command | render piechart).

 

 

3. Scenarios/Samples using advanced hunting

 

Get DLP alerts for a certain user (summarized by document, subject and the Sensitive information type matching)

This query will return the DLP alerts for a user and summarize it based on the affected content. This can be used to understand risky activities. Is the user actively attempting to defeat the controls by sending the content in different ways.

Replace the username.

 

 

let user = ('user@dom.com');
CloudAppEvents
| where ActionType has "DLPRuleMatch" or ActionType has "DLPRuleUndo"
| where tolower(user) == tolower(RawEventData.UserId) 
| extend PolicyName_ = tostring(parse_json(tostring(RawEventData.PolicyDetails))[0].PolicyName)
| extend SensitiveInformationTypeName_ = tostring(parse_json(tostring(parse_json(tostring(parse_json(tostring(parse_json(tostring(RawEventData.PolicyDetails))[0].Rules))[0].ConditionsMatched)).SensitiveInformation))[0].SensitiveInformationTypeName)
| extend FileName_ = tostring(parse_json(tostring(RawEventData.SharePointMetaData)).FileName)
| extend RuleName_ = tostring(parse_json(tostring(parse_json(tostring(RawEventData.PolicyDetails))[0].Rules))[0].RuleName)
| extend Severity_ = tostring(parse_json(tostring(parse_json(tostring(RawEventData.PolicyDetails))[0].Rules))[0].Severity)
| extend Attachments_ = tostring(parse_json(tostring(RawEventData.Item)).Attachments)
| extend Recipients = strcat("To:",RawEventData.ExchangeMetaData.To, " CC:",RawEventData.ExchangeMetaData.CC," BCC:",RawEventData.ExchangeMetaData.BCC) 
| extend SharePointMetaData_SiteCollectionUrl_s  = tostring(parse_json(tostring(RawEventData.SharePointMetaData)).SiteCollectionUrl)
| extend Documents = iff(isempty(FileName_), Attachments_, FileName_) 
| join kind = leftouter 
    ( 
    CloudAppEvents 
| where RawEventData.Operation == "SecureLinkUsed" or RawEventData.Operation == "AddedToSecureLink" 
| extend SourceFileName = tostring(RawEventData.SourceFileName) 
| extend SharePointMetaData_SiteCollectionUrl_s = tostring(RawEventData.SiteUrl)
| extend UserId = tostring(RawEventData.UserId) 
| extend TargetUserOrGroupName = tostring(RawEventData.TargetUserOrGroupName) 
            | summarize by SharePointMetaData_SiteCollectionUrl_s,SourceFileName,TargetUserOrGroupName
            | summarize TargetUserOrGroupName = make_list(strcat(TargetUserOrGroupName)) by SharePointMetaData_SiteCollectionUrl_s,SourceFileName
            | summarize take_any(TargetUserOrGroupName) by SourceFileName,SharePointMetaData_SiteCollectionUrl_s
|project SourceFileName, SharePointMetaData_SiteCollectionUrl_s, TargetUserOrGroupName
    ) 
    on SharePointMetaData_SiteCollectionUrl_s   
| extend TargetUsers = iff(isnotempty(tostring(TargetUserOrGroupName)), tostring(TargetUserOrGroupName), Recipients)
| extend Subject = tostring(parse_json(tostring(RawEventData.ExchangeMetaData)).Subject)
| extend PolicyName_EndPoint = tostring(parse_json(tostring(RawEventData.PolicyDetails))[0].PolicyName)
| extend PolicyName = iff(isempty(PolicyName_), PolicyName_EndPoint, PolicyName_)   
| extend Workload = tostring(RawEventData.Workload)
| extend Object = tostring(RawEventData.ObjectId)
| join kind=leftouter
(
CloudAppEvents
| where ActionType has "DLPRuleMatch" or ActionType has "DLPRuleUndo"
| where RawEventData.Workload has "Endpoint"
| extend SensitiveInfoTypeName_ = tostring(parse_json(tostring(parse_json(tostring(RawEventData.EndpointMetaData)).SensitiveInfoTypeData))[0].SensitiveInfoTypeName)
| extend EndPointFile = tostring(RawEventData.ObjectId)
| project EndPointFile,SensitiveInfoTypeName_
)
on $left.Object == $right.EndPointFile
| extend Documents = iff(isempty(Documents), EndPointFile, Documents) 
| extend SensitiveInformationType = iff(isempty(SensitiveInformationTypeName_), SensitiveInfoTypeName_, SensitiveInformationTypeName_) 
| extend EndpointOperation_ = tostring(parse_json(tostring(RawEventData.EndpointMetaData)).EndpointOperation)
| extend TargetDomain = tostring(parse_json(tostring(RawEventData.EndpointMetaData)).TargetDomain)
| extend Target = iff(isempty(tostring(TargetDomain)), TargetUsers, tostring(TargetDomain))   
| project  Timestamp,ActionType,Workload,PolicyName,SensitiveInformationType,Documents,Target,Subject,EndpointOperation_,Severity_
| summarize arg_max(Timestamp, *) by Workload,PolicyName,SensitiveInformationType,Documents,Target
| project  Timestamp,ActionType,Workload,PolicyName,SensitiveInformationType,Documents,Target,Subject,EndpointOperation_,Severity_
| sort by Timestamp

 

 

 

Identify who has access or has accessed a document in SharePoint or OneDrive for Business  

This query will return information from SharePoint Online and OneDrive for Business based on the time range defined in advanced hunting. It will return external access and file sharing history and you can validate whom certain files have been shared with for SharePoint and OneDrive DLP alerts. 

Replace the user account with the account that triggered the DLP Policy, replace the filename with the document name.

 

 

let document = "filename"; 
let user = ('useraccount'); 
let DLP = ( CloudAppEvents
    | where RawEventData.Workload in ('SharePoint', 'OneDrive')
    | where ActionType == "DLPRuleMatch" 
    | extend FilePathUrl_ = url_decode(tostring(RawEventData.SharePointMetaData.FilePathUrl))
    | extend path = parse_path(FilePathUrl_)
    | where path.Filename has document
    | extend DirectoryPath = tostring(path.DirectoryPath)
    | summarize by DirectoryPath );
let officedata = (
    CloudAppEvents
    | where RawEventData.Operation == "SecureLinkUsed" or RawEventData.Operation == "AddedToSecureLink"
    | extend ObjectId_ = tostring(RawEventData.ObjectId)
    | where ObjectId_ has_any (DLP)
    | extend SharePointMetaData_SiteCollectionUrl = tostring(RawEventData.SiteUrl)
    | extend SharePointMetaData_FileName = tostring((RawEventData.SourceFileName))
    | extend Account = tostring(RawEventData.UserId)
    | extend Targetsplit = split(Account, "#")
    | extend TargetUserOrGroupName = iff(isempty(RawEventData.TargetUserOrGroupName), Targetsplit[1], RawEventData.TargetUserOrGroupName)
    //Exclude internal domains if required
    //| where TargetUserOrGroupName !has "mydom1.com"
    | extend TargetUserOrGroupName = tolower(TargetUserOrGroupName)
    | summarize
        by
        SharePointMetaData_FileName,
        SharePointMetaData_SiteCollectionUrl,
        TargetUserOrGroupName,
        ObjectId_,
        Account);
let dlpmain = (
    CloudAppEvents
    | where ActionType == "DLPRuleMatch"
    | where RawEventData.Workload == "OneDrive" or RawEventData.Workload == "SharePoint"
    | extend FilePathUrl_ = url_decode(tostring(RawEventData.SharePointMetaData.FilePathUrl))
    | extend path = parse_path(FilePathUrl_)
    | where path.Filename has document
    | extend Account = tostring(RawEventData.UserId)
    | where Account has user
    | extend ObjectId_ = url_decode(tostring(ObjectName))
    | extend SharePointMetaData_SiteCollectionUrl =  tostring((RawEventData.SharePointMetaData.SiteCollectionUrl))
    | where tolower(RawEventData.UserId) == tolower(Account)
    | where RawEventData.Workload in ('SharePoint', 'OneDrive')
    | extend SourceFileName = tostring(RawEventData.SourceFileName)
    );
let folder = (dlpmain
    | join kind = leftouter
        (officedata
            | summarize
            by
            SharePointMetaData_SiteCollectionUrl,
            SharePointMetaData_FileName,
            TargetUserOrGroupName,
            ObjectId_
        | summarize TargetUserOrGroupName = make_list(strcat(TargetUserOrGroupName))
            by
            ObjectId_,
            SharePointMetaData_SiteCollectionUrl,
            SharePointMetaData_FileName
        | summarize take_any(TargetUserOrGroupName)
            by
            ObjectId_,
            SharePointMetaData_SiteCollectionUrl,
            SharePointMetaData_FileName
        | join kind = leftouter
            (
            officedata
            | summarize
                by
                SharePointMetaData_SiteCollectionUrl,
                SharePointMetaData_FileName,
                TargetUserOrGroupName,
                ObjectId_
            | extend Domsplit = split(TargetUserOrGroupName, "@")
            | extend domain = Domsplit[1]
            | summarize TargetDomain = make_list(strcat(domain)) by SharePointMetaData_FileName, ObjectId_
            | summarize take_any(TargetDomain) by SharePointMetaData_FileName, ObjectId_
            )
            on ObjectId_
        )
        on ObjectId_
    );
let files = (folder
    //| where TargetUserOrGroupName == ""
    | join kind = leftouter
        (officedata
        | summarize TargetUserOrGroupName = make_list(strcat(TargetUserOrGroupName)) by SharePointMetaData_FileName, SharePointMetaData_SiteCollectionUrl
        | summarize take_any(TargetUserOrGroupName) by SharePointMetaData_FileName, SharePointMetaData_SiteCollectionUrl
        | join kind = leftouter
            (
            officedata
            | summarize
                by
                SharePointMetaData_SiteCollectionUrl,
                SharePointMetaData_FileName,
                TargetUserOrGroupName
            | extend Domsplit = split(TargetUserOrGroupName, "@")
            | extend domain = Domsplit[1]
            | summarize TargetDomain = make_list(strcat(domain)) by SharePointMetaData_FileName
            | summarize take_any(TargetDomain) by SharePointMetaData_FileName
            )
            on SharePointMetaData_FileName
        )
        on SharePointMetaData_FileName, SharePointMetaData_SiteCollectionUrl
    | extend TargetUserOrGroupName = TargetUserOrGroupName1
    | extend TargetDomain = TargetDomain1
    | where TargetUserOrGroupName != ""
    );
    union folder, files
| extend FileName = tostring(parse_json(tostring(RawEventData.SharePointMetaData)).FileName)
| summarize arg_max(Timestamp, *) by ObjectId_, Account, tostring(RawEventData.Id)
| extend FileNameEncoded = url_encode(FileName)
| extend MDALink = (strcat('https://security.microsoft.com/cloudapps/files?filename=eq(',FileNameEncoded,')')) //Use this link to view older operations directly from Microsoft Defender for Cloud Apps
| project Timestamp,FileName,Target= TargetUserOrGroupName,SPOD_Path = ObjectId_,RawEventData,MDALink
| sort by Timestamp

 

 

 

 

Understand Applications, Devices used by user in correlation to Sensitive data processed

This query will return endpoint events generated by a specific user. It provides a more complete picture when investigating an endpoint DLP event. Update the query by replacing the user email.

 

 

let User = ("user@dom.com");
let AppEvents = (CloudAppEvents
| where parse_json(tostring(RawEventData.SensitiveInfoTypeData)) != ""
| where parse_json(tostring(RawEventData.SensitiveInfoTypeData)) != "[]"
| extend sit = parse_json(tostring(RawEventData.SensitiveInfoTypeData))
| mv-expand sit
| extend UserId = tolower(RawEventData.UserId)
| extend SensitiveInfoTypeName = tostring(sit.SensitiveInfoTypeName)
| extend document = tostring(RawEventData.ObjectId)
);
AppEvents
| where UserId == User
| extend ObjectId = tostring(RawEventData.ObjectId)
| extend DeviceName = tostring(RawEventData.DeviceName)
| extend Application = tostring(RawEventData.Application)
| summarize count = count() by document,Application,SensitiveInfoTypeName,ActionType,DeviceName
|order by ActionType

 

 

 

 

Identify what actions have been made on a document and associated Sensitivity label.

This query will return all activity across endpoint, SharePoint, Exchange for your document. It also returns the guid of the label applied to the content. Since the same filename can be used for different documents you may need to add additional filters to ensure you only return your file. Update the query by adding the filename.

 

 

let document = "sample.txt";
CloudAppEvents
// for speed of executing the query and remove possible false matches use the filters below.
//| where tolower(RawEventData.UserId) == "youruser@yourdom.com"
//| where RawEventData.Sha256 == "52eb73d06b265f3e6cfeca5477ea22efc300ab42514635ed20d18e77272ff125"
//| where RawEventData.ListItemUniqueId == "6c8306d8-2e84-40a4-b39a-4841bfac1c82"
| search document
| extend Users = tostring(RawEventData.UserId)
 | extend LabelGUID1 = parse_json(tostring(RawEventData.SensitivityLabelEventData)).SensitivityLabelId
    | extend LabelGUID2 = iff(isempty(tostring(RawEventData.LabelId)), LabelGUID1, tostring(RawEventData.LabelId))
    | extend LabelGUID3 = iff(isempty(tostring(RawEventData.SensitivityLabelId)), LabelGUID2, tostring(RawEventData.SensitivityLabelId))
     | extend OldSensitivityLabelId = tostring(parse_json(tostring(RawEventData.SensitivityLabelEventData)).OldSensitivityLabelId)
    | extend LabelGUID = iff(isempty(tostring(LabelGUID3)),OldSensitivityLabelId, tostring(LabelGUID3))  
| project Timestamp,ActionType,Users,CountryCode,LabelGUID,ReportId,RawEventData
| sort by Timestamp

 

 

 

 

Identify people who joined a Teams meeting that violated a policy

This query will return the list of people that joined the teams meeting that you specify. The results from Teams can sometimes come a few hours after the completion. Replace the subject.

 

 

//Update the meeting Subject
    let subject = ("Super Secret meeting");
    CloudAppEvents
    //| where RawEventData.Operation == "MeetingParticipantDetail"
    | where ActionType == "MessageSent"
    | extend ChatName_ = tostring(RawEventData.ChatName)
    | where ChatName_ has subject
| extend ChatThreadId_ = tostring(RawEventData.ChatThreadId)  
| join kind=inner 
    ( 
    CloudAppEvents
        | where RawEventData.Operation == "MeetingDetail"
        | extend ChatThreadId_ = tostring(RawEventData.ChatThreadId)
        | extend meetid = tostring(RawEventData.Id)
        ) on ChatThreadId_
        | extend Meeting = tostring(parse_json(tostring(parse_json(tostring(RawEventData.ExchangeMetaData)).To))[0])
        | extend HasForeignTenantUsers_ = tostring(parse_json(tostring(RawEventData.ParticipantInfo)))
| project meetid,ChatName_,HasForeignTenantUsers_
| join kind=inner
    (    
    CloudAppEvents
    | where RawEventData.Operation == "MeetingParticipantDetail"
    | extend meetid = tostring(RawEventData.MeetingDetailId)
    )
    on meetid
| extend UPN = tostring(parse_json(tostring(RawEventData.Attendees))[0].UPN)
| extend RecipientType = tostring(parse_json(tostring(RawEventData.Attendees))[0].RecipientType)
| summarize by UPN,AccountDisplayName,RecipientType,ChatName_,Timestamp,HasForeignTenantUsers_

 

 

 

 

Identify Access of external users to Teams channel that violated DLP policy

This query will return the details for a Teams channel. It will also inform you if there are external users associated with the channel. Update query by replacing the Teams name.

 

 

//Update the Team used by the Teamvar
let Teamvar = ("The eDiscovery Team");
CloudAppEvents
| where Application == "Microsoft Teams"
| extend TeamName = tostring(RawEventData.TeamName)
| where TeamName == Teamvar
| extend UPN_ = tostring(parse_json(tostring(RawEventData.Members))[0].UPN)
| extend GroupId = tostring(RawEventData.AADGroupId)
//add the below to include addition of users
//| where ActionType == "MemberAdded"
| extend HasGuestUsers = tostring(parse_json(tostring(RawEventData.ParticipantInfo)).HasGuestUsers)
| extend HasUnauthenticatedUsers = tostring(parse_json(tostring(RawEventData.ParticipantInfo)).HasUnauthenticatedUsers)
| summarize by TeamName,ActionType,HasGuestUsers,HasUnauthenticatedUsers,UPN_

 

 

 

 

Type of data processed in different regions

If you have a Sensitive information type that should only be processed in a specific geographical region, you can use this query to identify. We are making use of the location data coming from Microsoft Defender for Cloud  Apps. You can setup an alert based on this as well.

 

 

let AppEvents = (CloudAppEvents
| where parse_json(tostring(RawEventData.SensitiveInfoTypeData)) != ""
| where parse_json(tostring(RawEventData.SensitiveInfoTypeData)) != "[]"
| extend sit = parse_json(tostring(RawEventData.SensitiveInfoTypeData))
//Expand all Sensitive Info types in the separate rows
| mv-expand sit
| extend SensitiveInfoTypeName = tostring(sit.SensitiveInfoTypeName)
| extend document = tostring(RawEventData.ObjectId)
);
    AppEvents
    | extend city = City
    | extend country = CountryCode
    | extend Level = SensitiveInfoTypeName
| summarize count()  by Level,city,country

 

 

 

 

Translation table for Sensitivity Labels for richness of reporting

When working through SharePoint and OneDrive cases it may be good to be able to add the labels or label changes being made to a document. This sample allows you to translate the GUIDs to the actual names of the labels. It requires some more work to get the GUIDs.

  1. Connect to SCC PowerShell (Connect-IPPSsession)
    1. Run get-label | select immutableid,DisplayName,"," | convertto-csv
    2. Copy the Output

 

  1. Replace your output with the sample below in the data table. This allows you to go a bit deeper and identify egress of labelled content with the actual label name. Update the document variable below to the document you are investigating.

 

 

let document = "doc";
let Labels = datatable(LabelGuid:string,LabelName:string)
[
"defa4170-0d19-0005-0000-bc88714345d2","Personal",
"defa4170-0d19-0005-0001-bc88714345d2","Public",
"defa4170-0d19-0005-0002-bc88714345d2","General",
"defa4170-0d19-0005-0003-bc88714345d2","Anyone (unrestricted)",
"defa4170-0d19-0005-0004-bc88714345d2","All Employees (unrestricted)",
"defa4170-0d19-0005-0005-bc88714345d2","Confidential",
"defa4170-0d19-0005-0006-bc88714345d2","Anyone (unrestricted)",
"defa4170-0d19-0005-0007-bc88714345d2","All Employees",
"defa4170-0d19-0005-0008-bc88714345d2","Trusted People",
"6a6067cf-b972-45c0-a5ee-1ac16535698b","Finance",
"defa4170-0d19-0005-0009-bc88714345d2","Highly Confidential",
"defa4170-0d19-0005-000a-bc88714345d2","All Employees",
"defa4170-0d19-0005-000b-bc88714345d2","Specified People",
"44867fba-10b9-43b0-a00b-2672e310b2e3","DKE1",
];
CloudAppEvents
| search document
| extend Users = tostring(RawEventData.UserId)
 | extend LabelGUID1 = parse_json(tostring(RawEventData.SensitivityLabelEventData)).SensitivityLabelId
    | extend LabelGUID2 = iff(isempty(tostring(RawEventData.LabelId)), LabelGUID1, tostring(RawEventData.LabelId))
    | extend LabelGUID3 = iff(isempty(tostring(RawEventData.SensitivityLabelId)), LabelGUID2, tostring(RawEventData.SensitivityLabelId))
     | extend OldSensitivityLabelId = tostring(parse_json(tostring(RawEventData.SensitivityLabelEventData)).OldSensitivityLabelId)
    | extend LabelGuid = iff(isempty(tostring(LabelGUID3)),OldSensitivityLabelId, tostring(LabelGUID3))  
| join kind= inner
(
Labels
)
on LabelGuid
| project Timestamp,ActionType,Users,CountryCode,ReportId,RawEventData,LabelName,RawEventData.ObjectId
| sort by Timestamp

 

 

 

You can setup an alert based on volume of a specific label to determine if a user is moving a high amount of labelled data externally as an example. The parameters to consider are activity type (for instance file upload or file print), # of activities, file size and time range.

 

Use built-in advanced functions to understand anomalies in your DLP data

For anomaly detection and forecasting and machine learning Kusto got that built in too. It can help you analyse for irregular patterns and trigger alerts based on that. It can also be used to validate that the services is behaving as expected. In the sample below we are identifying anomalies in the number of DLP alerts. You can do the same thing on usage of Sensitive content and Labels to find outliers in your processing.

 

 

let data = (CloudAppEvents
| where ActionType == "DlpRuleMatch"
| extend UserId = tostring(RawEventData.UserId)
| summarize EventCount=count() by UserId, bin(Timestamp,30m)
| order by Timestamp
| summarize EventCount=make_list(EventCount),TimeGenerated=make_list(Timestamp) by UserId
| extend outliers=series_decompose_anomalies(EventCount)
| mv-expand TimeGenerated, EventCount, outliers
| where outliers == 1
| distinct UserId);
CloudAppEvents
| extend UserId = tostring(RawEventData.UserId)
| where UserId in (data)
| extend ChartName = strcat(UserId)
| summarize EventCount=count() by ChartName,bin(Timestamp, 30m)
| render timechart

 

 

 

4. Conclusion

Advanced hunting is a powerful tool that can support your day-to-day investigations. These sample queries can help you get started. You can customize these queries to suit your operational environment. Use your knowledge to create custom detection rules and use the built-in functions to hunt and learn more about the risk patterns in your environment. 

 

Additional resources: 

 

As always we look forward to your feedback! 

 

Thank you, 

Microsoft Purview Data Loss Prevention team.  

Updated May 23, 2023
Version 1.0
  • RS1's avatar
    RS1
    Copper Contributor

    There appears to be an issue with the final example "Use built-in advanced functions to understand anomalies in your DLP data". Every time this query is run in my environment it produces a wildly different timechart.

  • Jonhed's avatar
    Jonhed
    Steel Contributor

    RS1 
    What timeframe are you searching?

     

    A query will only return 10,000 results, and the order will be random each time, so if there are too many log results that match the query, you could get different results each execution.

    The query shown as an example will return 48 lines of log will return per user per day for any user that is judged an anomaly, so if you were to search  the last 30 days, you will get 1440 results per user.

     

    a 30 day search could only show a max of 6 users, a 14 day search could show 14 users etc.

    Try a shorter timeframe if you can.

  • RS1's avatar
    RS1
    Copper Contributor

    Thanks - I was trying to do 14 days. If I narrow that down to 7 days then I do get repeatable results.