1. Introduction
Microsoft 365 Defender is the Microsoft-recommended experience for investigation and remediation of Microsoft Purview Data Loss Prevention (DLP) incidents. Please refer to the step-by-step guided walkthrough published earlier to learn how to investigate Microsoft Purview Data Loss Prevention alert in Microsoft 365 Defender.
As part of the investigative capabilities available in Microsoft 365 Defender, advanced hunting provides the ability to query raw compliance and security data signals generated by Microsoft 365 to proactively detect known and potential risks in your organization as well as visualizing the attack chain. Advanced hunting can boost your investigation workflow and help you learn more about the types of alerts you receive across your estate. This blog provides guidance on how to get started and leverage advanced hunting for Microsoft Purview DLP investigations. We are sharing sample queries for high value scenarios to help you get started.
2. Getting Started
Pre-requisite: Get the necessary CloudAppEvents table, that contains Microsoft Purview data, to show up in Advanced Hunting by following these steps to integrate with Microsoft 365.
- To access Advanced hunting, go to https://security.microsoft.com and click on Advanced hunting on the left navigation pane.
- In Advanced Hunting, select the time range for the query.
- On the left hand side you have all the tables available, the details for each of these tables is available here. In this post, we provide examples on what you can do with the CloudAppEvents table.
- Queries in advanced hunting is based on the KUSTO query language. For an introduction read this.
- Once you have inserted the query click, Run query.
- There are built-in visualizations within the product, to make the data easier to consume. In the below example, we are summarizing the alerts based on sensitive information types and rendering the results as a pie chart. Visualizations like this can help you quickly learn if any of your SITs are overrepresented in the alert flow (see the command | render piechart).
3. Scenarios/Samples using advanced hunting
Get DLP alerts for a certain user (summarized by document, subject and the Sensitive information type matching)
This query will return the DLP alerts for a user and summarize it based on the affected content. This can be used to understand risky activities. Is the user actively attempting to defeat the controls by sending the content in different ways.
Replace the username.
let user = ('user@dom.com');
CloudAppEvents
| where ActionType has "DLPRuleMatch" or ActionType has "DLPRuleUndo"
| where tolower(user) == tolower(RawEventData.UserId)
| extend PolicyName_ = tostring(parse_json(tostring(RawEventData.PolicyDetails))[0].PolicyName)
| extend SensitiveInformationTypeName_ = tostring(parse_json(tostring(parse_json(tostring(parse_json(tostring(parse_json(tostring(RawEventData.PolicyDetails))[0].Rules))[0].ConditionsMatched)).SensitiveInformation))[0].SensitiveInformationTypeName)
| extend FileName_ = tostring(parse_json(tostring(RawEventData.SharePointMetaData)).FileName)
| extend RuleName_ = tostring(parse_json(tostring(parse_json(tostring(RawEventData.PolicyDetails))[0].Rules))[0].RuleName)
| extend Severity_ = tostring(parse_json(tostring(parse_json(tostring(RawEventData.PolicyDetails))[0].Rules))[0].Severity)
| extend Attachments_ = tostring(parse_json(tostring(RawEventData.Item)).Attachments)
| extend Recipients = strcat("To:",RawEventData.ExchangeMetaData.To, " CC:",RawEventData.ExchangeMetaData.CC," BCC:",RawEventData.ExchangeMetaData.BCC)
| extend SharePointMetaData_SiteCollectionUrl_s = tostring(parse_json(tostring(RawEventData.SharePointMetaData)).SiteCollectionUrl)
| extend Documents = iff(isempty(FileName_), Attachments_, FileName_)
| join kind = leftouter
(
CloudAppEvents
| where RawEventData.Operation == "SecureLinkUsed" or RawEventData.Operation == "AddedToSecureLink"
| extend SourceFileName = tostring(RawEventData.SourceFileName)
| extend SharePointMetaData_SiteCollectionUrl_s = tostring(RawEventData.SiteUrl)
| extend UserId = tostring(RawEventData.UserId)
| extend TargetUserOrGroupName = tostring(RawEventData.TargetUserOrGroupName)
| summarize by SharePointMetaData_SiteCollectionUrl_s,SourceFileName,TargetUserOrGroupName
| summarize TargetUserOrGroupName = make_list(strcat(TargetUserOrGroupName)) by SharePointMetaData_SiteCollectionUrl_s,SourceFileName
| summarize take_any(TargetUserOrGroupName) by SourceFileName,SharePointMetaData_SiteCollectionUrl_s
|project SourceFileName, SharePointMetaData_SiteCollectionUrl_s, TargetUserOrGroupName
)
on SharePointMetaData_SiteCollectionUrl_s
| extend TargetUsers = iff(isnotempty(tostring(TargetUserOrGroupName)), tostring(TargetUserOrGroupName), Recipients)
| extend Subject = tostring(parse_json(tostring(RawEventData.ExchangeMetaData)).Subject)
| extend PolicyName_EndPoint = tostring(parse_json(tostring(RawEventData.PolicyDetails))[0].PolicyName)
| extend PolicyName = iff(isempty(PolicyName_), PolicyName_EndPoint, PolicyName_)
| extend Workload = tostring(RawEventData.Workload)
| extend Object = tostring(RawEventData.ObjectId)
| join kind=leftouter
(
CloudAppEvents
| where ActionType has "DLPRuleMatch" or ActionType has "DLPRuleUndo"
| where RawEventData.Workload has "Endpoint"
| extend SensitiveInfoTypeName_ = tostring(parse_json(tostring(parse_json(tostring(RawEventData.EndpointMetaData)).SensitiveInfoTypeData))[0].SensitiveInfoTypeName)
| extend EndPointFile = tostring(RawEventData.ObjectId)
| project EndPointFile,SensitiveInfoTypeName_
)
on $left.Object == $right.EndPointFile
| extend Documents = iff(isempty(Documents), EndPointFile, Documents)
| extend SensitiveInformationType = iff(isempty(SensitiveInformationTypeName_), SensitiveInfoTypeName_, SensitiveInformationTypeName_)
| extend EndpointOperation_ = tostring(parse_json(tostring(RawEventData.EndpointMetaData)).EndpointOperation)
| extend TargetDomain = tostring(parse_json(tostring(RawEventData.EndpointMetaData)).TargetDomain)
| extend Target = iff(isempty(tostring(TargetDomain)), TargetUsers, tostring(TargetDomain))
| project Timestamp,ActionType,Workload,PolicyName,SensitiveInformationType,Documents,Target,Subject,EndpointOperation_,Severity_
| summarize arg_max(Timestamp, *) by Workload,PolicyName,SensitiveInformationType,Documents,Target
| project Timestamp,ActionType,Workload,PolicyName,SensitiveInformationType,Documents,Target,Subject,EndpointOperation_,Severity_
| sort by Timestamp
Identify who has access or has accessed a document in SharePoint or OneDrive for Business
This query will return information from SharePoint Online and OneDrive for Business based on the time range defined in advanced hunting. It will return external access and file sharing history and you can validate whom certain files have been shared with for SharePoint and OneDrive DLP alerts.
Replace the user account with the account that triggered the DLP Policy, replace the filename with the document name.
let document = "filename";
let user = ('useraccount');
let DLP = ( CloudAppEvents
| where RawEventData.Workload in ('SharePoint', 'OneDrive')
| where ActionType == "DLPRuleMatch"
| extend FilePathUrl_ = url_decode(tostring(RawEventData.SharePointMetaData.FilePathUrl))
| extend path = parse_path(FilePathUrl_)
| where path.Filename has document
| extend DirectoryPath = tostring(path.DirectoryPath)
| summarize by DirectoryPath );
let officedata = (
CloudAppEvents
| where RawEventData.Operation == "SecureLinkUsed" or RawEventData.Operation == "AddedToSecureLink"
| extend ObjectId_ = tostring(RawEventData.ObjectId)
| where ObjectId_ has_any (DLP)
| extend SharePointMetaData_SiteCollectionUrl = tostring(RawEventData.SiteUrl)
| extend SharePointMetaData_FileName = tostring((RawEventData.SourceFileName))
| extend Account = tostring(RawEventData.UserId)
| extend Targetsplit = split(Account, "#")
| extend TargetUserOrGroupName = iff(isempty(RawEventData.TargetUserOrGroupName), Targetsplit[1], RawEventData.TargetUserOrGroupName)
//Exclude internal domains if required
//| where TargetUserOrGroupName !has "mydom1.com"
| extend TargetUserOrGroupName = tolower(TargetUserOrGroupName)
| summarize
by
SharePointMetaData_FileName,
SharePointMetaData_SiteCollectionUrl,
TargetUserOrGroupName,
ObjectId_,
Account);
let dlpmain = (
CloudAppEvents
| where ActionType == "DLPRuleMatch"
| where RawEventData.Workload == "OneDrive" or RawEventData.Workload == "SharePoint"
| extend FilePathUrl_ = url_decode(tostring(RawEventData.SharePointMetaData.FilePathUrl))
| extend path = parse_path(FilePathUrl_)
| where path.Filename has document
| extend Account = tostring(RawEventData.UserId)
| where Account has user
| extend ObjectId_ = url_decode(tostring(ObjectName))
| extend SharePointMetaData_SiteCollectionUrl = tostring((RawEventData.SharePointMetaData.SiteCollectionUrl))
| where tolower(RawEventData.UserId) == tolower(Account)
| where RawEventData.Workload in ('SharePoint', 'OneDrive')
| extend SourceFileName = tostring(RawEventData.SourceFileName)
);
let folder = (dlpmain
| join kind = leftouter
(officedata
| summarize
by
SharePointMetaData_SiteCollectionUrl,
SharePointMetaData_FileName,
TargetUserOrGroupName,
ObjectId_
| summarize TargetUserOrGroupName = make_list(strcat(TargetUserOrGroupName))
by
ObjectId_,
SharePointMetaData_SiteCollectionUrl,
SharePointMetaData_FileName
| summarize take_any(TargetUserOrGroupName)
by
ObjectId_,
SharePointMetaData_SiteCollectionUrl,
SharePointMetaData_FileName
| join kind = leftouter
(
officedata
| summarize
by
SharePointMetaData_SiteCollectionUrl,
SharePointMetaData_FileName,
TargetUserOrGroupName,
ObjectId_
| extend Domsplit = split(TargetUserOrGroupName, "@")
| extend domain = Domsplit[1]
| summarize TargetDomain = make_list(strcat(domain)) by SharePointMetaData_FileName, ObjectId_
| summarize take_any(TargetDomain) by SharePointMetaData_FileName, ObjectId_
)
on ObjectId_
)
on ObjectId_
);
let files = (folder
//| where TargetUserOrGroupName == ""
| join kind = leftouter
(officedata
| summarize TargetUserOrGroupName = make_list(strcat(TargetUserOrGroupName)) by SharePointMetaData_FileName, SharePointMetaData_SiteCollectionUrl
| summarize take_any(TargetUserOrGroupName) by SharePointMetaData_FileName, SharePointMetaData_SiteCollectionUrl
| join kind = leftouter
(
officedata
| summarize
by
SharePointMetaData_SiteCollectionUrl,
SharePointMetaData_FileName,
TargetUserOrGroupName
| extend Domsplit = split(TargetUserOrGroupName, "@")
| extend domain = Domsplit[1]
| summarize TargetDomain = make_list(strcat(domain)) by SharePointMetaData_FileName
| summarize take_any(TargetDomain) by SharePointMetaData_FileName
)
on SharePointMetaData_FileName
)
on SharePointMetaData_FileName, SharePointMetaData_SiteCollectionUrl
| extend TargetUserOrGroupName = TargetUserOrGroupName1
| extend TargetDomain = TargetDomain1
| where TargetUserOrGroupName != ""
);
union folder, files
| extend FileName = tostring(parse_json(tostring(RawEventData.SharePointMetaData)).FileName)
| summarize arg_max(Timestamp, *) by ObjectId_, Account, tostring(RawEventData.Id)
| extend FileNameEncoded = url_encode(FileName)
| extend MDALink = (strcat('https://security.microsoft.com/cloudapps/files?filename=eq(',FileNameEncoded,')')) //Use this link to view older operations directly from Microsoft Defender for Cloud Apps
| project Timestamp,FileName,Target= TargetUserOrGroupName,SPOD_Path = ObjectId_,RawEventData,MDALink
| sort by Timestamp
Understand Applications, Devices used by user in correlation to Sensitive data processed
This query will return endpoint events generated by a specific user. It provides a more complete picture when investigating an endpoint DLP event. Update the query by replacing the user email.
let User = ("user@dom.com");
let AppEvents = (CloudAppEvents
| where parse_json(tostring(RawEventData.SensitiveInfoTypeData)) != ""
| where parse_json(tostring(RawEventData.SensitiveInfoTypeData)) != "[]"
| extend sit = parse_json(tostring(RawEventData.SensitiveInfoTypeData))
| mv-expand sit
| extend UserId = tolower(RawEventData.UserId)
| extend SensitiveInfoTypeName = tostring(sit.SensitiveInfoTypeName)
| extend document = tostring(RawEventData.ObjectId)
);
AppEvents
| where UserId == User
| extend ObjectId = tostring(RawEventData.ObjectId)
| extend DeviceName = tostring(RawEventData.DeviceName)
| extend Application = tostring(RawEventData.Application)
| summarize count = count() by document,Application,SensitiveInfoTypeName,ActionType,DeviceName
|order by ActionType
Identify what actions have been made on a document and associated Sensitivity label.
This query will return all activity across endpoint, SharePoint, Exchange for your document. It also returns the guid of the label applied to the content. Since the same filename can be used for different documents you may need to add additional filters to ensure you only return your file. Update the query by adding the filename.
let document = "sample.txt";
CloudAppEvents
// for speed of executing the query and remove possible false matches use the filters below.
//| where tolower(RawEventData.UserId) == "youruser@yourdom.com"
//| where RawEventData.Sha256 == "52eb73d06b265f3e6cfeca5477ea22efc300ab42514635ed20d18e77272ff125"
//| where RawEventData.ListItemUniqueId == "6c8306d8-2e84-40a4-b39a-4841bfac1c82"
| search document
| extend Users = tostring(RawEventData.UserId)
| extend LabelGUID1 = parse_json(tostring(RawEventData.SensitivityLabelEventData)).SensitivityLabelId
| extend LabelGUID2 = iff(isempty(tostring(RawEventData.LabelId)), LabelGUID1, tostring(RawEventData.LabelId))
| extend LabelGUID3 = iff(isempty(tostring(RawEventData.SensitivityLabelId)), LabelGUID2, tostring(RawEventData.SensitivityLabelId))
| extend OldSensitivityLabelId = tostring(parse_json(tostring(RawEventData.SensitivityLabelEventData)).OldSensitivityLabelId)
| extend LabelGUID = iff(isempty(tostring(LabelGUID3)),OldSensitivityLabelId, tostring(LabelGUID3))
| project Timestamp,ActionType,Users,CountryCode,LabelGUID,ReportId,RawEventData
| sort by Timestamp
Identify people who joined a Teams meeting that violated a policy
This query will return the list of people that joined the teams meeting that you specify. The results from Teams can sometimes come a few hours after the completion. Replace the subject.
//Update the meeting Subject
let subject = ("Super Secret meeting");
CloudAppEvents
//| where RawEventData.Operation == "MeetingParticipantDetail"
| where ActionType == "MessageSent"
| extend ChatName_ = tostring(RawEventData.ChatName)
| where ChatName_ has subject
| extend ChatThreadId_ = tostring(RawEventData.ChatThreadId)
| join kind=inner
(
CloudAppEvents
| where RawEventData.Operation == "MeetingDetail"
| extend ChatThreadId_ = tostring(RawEventData.ChatThreadId)
| extend meetid = tostring(RawEventData.Id)
) on ChatThreadId_
| extend Meeting = tostring(parse_json(tostring(parse_json(tostring(RawEventData.ExchangeMetaData)).To))[0])
| extend HasForeignTenantUsers_ = tostring(parse_json(tostring(RawEventData.ParticipantInfo)))
| project meetid,ChatName_,HasForeignTenantUsers_
| join kind=inner
(
CloudAppEvents
| where RawEventData.Operation == "MeetingParticipantDetail"
| extend meetid = tostring(RawEventData.MeetingDetailId)
)
on meetid
| extend UPN = tostring(parse_json(tostring(RawEventData.Attendees))[0].UPN)
| extend RecipientType = tostring(parse_json(tostring(RawEventData.Attendees))[0].RecipientType)
| summarize by UPN,AccountDisplayName,RecipientType,ChatName_,Timestamp,HasForeignTenantUsers_
Identify Access of external users to Teams channel that violated DLP policy
This query will return the details for a Teams channel. It will also inform you if there are external users associated with the channel. Update query by replacing the Teams name.
//Update the Team used by the Teamvar
let Teamvar = ("The eDiscovery Team");
CloudAppEvents
| where Application == "Microsoft Teams"
| extend TeamName = tostring(RawEventData.TeamName)
| where TeamName == Teamvar
| extend UPN_ = tostring(parse_json(tostring(RawEventData.Members))[0].UPN)
| extend GroupId = tostring(RawEventData.AADGroupId)
//add the below to include addition of users
//| where ActionType == "MemberAdded"
| extend HasGuestUsers = tostring(parse_json(tostring(RawEventData.ParticipantInfo)).HasGuestUsers)
| extend HasUnauthenticatedUsers = tostring(parse_json(tostring(RawEventData.ParticipantInfo)).HasUnauthenticatedUsers)
| summarize by TeamName,ActionType,HasGuestUsers,HasUnauthenticatedUsers,UPN_
Type of data processed in different regions
If you have a Sensitive information type that should only be processed in a specific geographical region, you can use this query to identify. We are making use of the location data coming from Microsoft Defender for Cloud Apps. You can setup an alert based on this as well.
let AppEvents = (CloudAppEvents
| where parse_json(tostring(RawEventData.SensitiveInfoTypeData)) != ""
| where parse_json(tostring(RawEventData.SensitiveInfoTypeData)) != "[]"
| extend sit = parse_json(tostring(RawEventData.SensitiveInfoTypeData))
//Expand all Sensitive Info types in the separate rows
| mv-expand sit
| extend SensitiveInfoTypeName = tostring(sit.SensitiveInfoTypeName)
| extend document = tostring(RawEventData.ObjectId)
);
AppEvents
| extend city = City
| extend country = CountryCode
| extend Level = SensitiveInfoTypeName
| summarize count() by Level,city,country
Translation table for Sensitivity Labels for richness of reporting
When working through SharePoint and OneDrive cases it may be good to be able to add the labels or label changes being made to a document. This sample allows you to translate the GUIDs to the actual names of the labels. It requires some more work to get the GUIDs.
- Connect to SCC PowerShell (Connect-IPPSsession)
- Run get-label | select immutableid,DisplayName,"," | convertto-csv
- Copy the Output
- Replace your output with the sample below in the data table. This allows you to go a bit deeper and identify egress of labelled content with the actual label name. Update the document variable below to the document you are investigating.
let document = "doc";
let Labels = datatable(LabelGuid:string,LabelName:string)
[
"defa4170-0d19-0005-0000-bc88714345d2","Personal",
"defa4170-0d19-0005-0001-bc88714345d2","Public",
"defa4170-0d19-0005-0002-bc88714345d2","General",
"defa4170-0d19-0005-0003-bc88714345d2","Anyone (unrestricted)",
"defa4170-0d19-0005-0004-bc88714345d2","All Employees (unrestricted)",
"defa4170-0d19-0005-0005-bc88714345d2","Confidential",
"defa4170-0d19-0005-0006-bc88714345d2","Anyone (unrestricted)",
"defa4170-0d19-0005-0007-bc88714345d2","All Employees",
"defa4170-0d19-0005-0008-bc88714345d2","Trusted People",
"6a6067cf-b972-45c0-a5ee-1ac16535698b","Finance",
"defa4170-0d19-0005-0009-bc88714345d2","Highly Confidential",
"defa4170-0d19-0005-000a-bc88714345d2","All Employees",
"defa4170-0d19-0005-000b-bc88714345d2","Specified People",
"44867fba-10b9-43b0-a00b-2672e310b2e3","DKE1",
];
CloudAppEvents
| search document
| extend Users = tostring(RawEventData.UserId)
| extend LabelGUID1 = parse_json(tostring(RawEventData.SensitivityLabelEventData)).SensitivityLabelId
| extend LabelGUID2 = iff(isempty(tostring(RawEventData.LabelId)), LabelGUID1, tostring(RawEventData.LabelId))
| extend LabelGUID3 = iff(isempty(tostring(RawEventData.SensitivityLabelId)), LabelGUID2, tostring(RawEventData.SensitivityLabelId))
| extend OldSensitivityLabelId = tostring(parse_json(tostring(RawEventData.SensitivityLabelEventData)).OldSensitivityLabelId)
| extend LabelGuid = iff(isempty(tostring(LabelGUID3)),OldSensitivityLabelId, tostring(LabelGUID3))
| join kind= inner
(
Labels
)
on LabelGuid
| project Timestamp,ActionType,Users,CountryCode,ReportId,RawEventData,LabelName,RawEventData.ObjectId
| sort by Timestamp
You can setup an alert based on volume of a specific label to determine if a user is moving a high amount of labelled data externally as an example. The parameters to consider are activity type (for instance file upload or file print), # of activities, file size and time range.
Use built-in advanced functions to understand anomalies in your DLP data
For anomaly detection and forecasting and machine learning Kusto got that built in too. It can help you analyse for irregular patterns and trigger alerts based on that. It can also be used to validate that the services is behaving as expected. In the sample below we are identifying anomalies in the number of DLP alerts. You can do the same thing on usage of Sensitive content and Labels to find outliers in your processing.
let data = (CloudAppEvents
| where ActionType == "DlpRuleMatch"
| extend UserId = tostring(RawEventData.UserId)
| summarize EventCount=count() by UserId, bin(Timestamp,30m)
| order by Timestamp
| summarize EventCount=make_list(EventCount),TimeGenerated=make_list(Timestamp) by UserId
| extend outliers=series_decompose_anomalies(EventCount)
| mv-expand TimeGenerated, EventCount, outliers
| where outliers == 1
| distinct UserId);
CloudAppEvents
| extend UserId = tostring(RawEventData.UserId)
| where UserId in (data)
| extend ChartName = strcat(UserId)
| summarize EventCount=count() by ChartName,bin(Timestamp, 30m)
| render timechart
4. Conclusion
Advanced hunting is a powerful tool that can support your day-to-day investigations. These sample queries can help you get started. You can customize these queries to suit your operational environment. Use your knowledge to create custom detection rules and use the built-in functions to hunt and learn more about the risk patterns in your environment.
Additional resources:
- Learn about data loss prevention - Microsoft Purview (compliance) | Microsoft Learn
- Investigate data loss incidents with Microsoft 365 Defender | Microsoft Learn
- Overview - Advanced hunting | Microsoft Learn
- Guidance for investigating Microsoft Purview Data Loss Prevention incidents - Microsoft Community Hub
- Learn how to investigate Microsoft Purview Data Loss Prevention alerts in Microsoft 365 Defender - Microsoft Community Hub
As always we look forward to your feedback!
Thank you,
Microsoft Purview Data Loss Prevention team.