Microsoft Purview
Deep Dive: Insider Risk Management in Microsoft Purview
Hi everyone, I recently explored the Insider Risk Management (IRM) workflow in Microsoft Purview and how it connects across governance, compliance, and security. This end-to-end process helps organizations detect risky activities, triage alerts, investigate incidents, and take corrective action.

Key phases in the IRM workflow:

- Policy: Define rules to detect both accidental risks (data spillage) and malicious risks (IP theft, fraud, insider trading).
- Alerts: Generate alerts when policies are violated.
- Triage: Prioritize and classify alerts by severity.
- Investigate: Use dashboards, Content Explorer, and Activity Explorer to dig into context.
- Action: Take remediation steps such as user training, legal escalation, or SIEM integration.

Key takeaways from my lab:

- Transparency is essential (balancing privacy vs. protection).
- Integration across Microsoft 365 apps makes IRM policies actionable.
- Defender and Purview together unify detection and governance for insider risk.

This was part of my ongoing security lab series. Curious to hear from the community — how are you applying Insider Risk Management in your environments or labs?

Enhanced eDiscovery Premium APIs
We have been busy working to enable organisations that leverage the Microsoft Purview eDiscovery Graph APIs to benefit from the enhancements in the new modern experience for eDiscovery. I am pleased to share that the APIs have now been updated with additional parameters, enabling organisations to benefit from the following features already present in the modern experience within the Purview portal:

- Ability to control the export package structure and item naming convention
- Trigger advanced indexing as part of the Statistics, Add to Review and Export jobs
- Enables, for the first time, the ability to trigger HTML transcription of Teams, Viva and Copilot interactions when adding to a review set
- Benefit from the new statistics options such as Include Categories and Include Keyword Report
- More granular control of the number of versions collected of modern attachments and of documents collected directly from OneDrive and SharePoint

These changes were communicated as part of the M365 Message Center post MC1115305. This change involved the beta version of the API calls being promoted into the v1.0 endpoint of the Graph API. The following v1.0 API calls were updated as part of this work:

- Search Estimate Statistics – ediscoverySearch: estimateStatistics
- Search Export Report – ediscoverySearch: exportReport
- Search Export Result – ediscoverySearch: exportResult
- Search Add to Review Set – ediscoveryReviewSet: addToReviewSet
- Review Set Export – ediscoveryReviewSet: export

This blog post walks through the updates to each of these APIs and explains how to update your calls to maintain a consistent outcome (and benefit from the new functionality). If you are new to the Microsoft Purview eDiscovery APIs, you can refer to my previous blog post on how to get started with them.
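Every call covered in this post is a plain HTTPS request against the Graph v1.0 security endpoint, so the same jobs can be scripted outside PowerShell as well. As a minimal, illustrative sketch (the case ID, search ID and token values are placeholders, and build_estimate_statistics_request is a hypothetical helper of my own, not part of any SDK), here is how the estimateStatistics POST discussed later in this post could be assembled in Python:

```python
import json
import urllib.request

GRAPH_BASE = "https://graph.microsoft.com/v1.0"

def build_estimate_statistics_request(case_id: str, search_id: str, token: str,
                                      statistics_options: str) -> urllib.request.Request:
    """Assemble (but do not send) the estimateStatistics POST call for a search."""
    url = (f"{GRAPH_BASE}/security/cases/ediscoveryCases/{case_id}"
           f"/searches/{search_id}/estimateStatistics")
    body = json.dumps({"statisticsOptions": statistics_options}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # Token must come from your own app registration with eDiscovery permissions
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

req = build_estimate_statistics_request(
    "case123", "search456", "<access-token>",
    "includeRefiners,includeQueryStats,includeUnindexedStats",
)
print(req.full_url)
```

Actually sending the request (for example with urllib.request.urlopen or any HTTP client) requires an app registration that has been granted the appropriate eDiscovery Graph permissions.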
Getting started with the eDiscovery APIs | Microsoft Community Hub

Wait, but what about the custodian and noncustodial locations workflow in eDiscovery Classic (Premium)?

As you are probably aware, in the modern user experience for eDiscovery there have been some changes to the Data Sources tab and how it is used in the workflow. Previously, organisations leveraging the Microsoft Purview eDiscovery APIs would typically have added the relevant custodian and noncustodial data sources to the case using the following APIs:

- ediscoveryCustodian resource type - Microsoft Graph v1.0 | Microsoft Learn
- ediscoveryNoncustodialDataSource resource type - Microsoft Graph v1.0 | Microsoft Learn

Once added via these API calls, the locations would be bound to a search when creating it. This workflow remains supported in the API for backwards compatibility, including the creation of system-generated case hold policies when applying holds to the locations via these APIs. Organisations can continue to use this approach. However, to simplify your code and workflow, consider using the following API call to add additional sources directly to the search:

Add additional sources - Microsoft Graph v1.0 | Microsoft Learn

Some key things to note if you continue to use the custodian and noncustodial data sources APIs in your automation workflow:
- They will not populate the new Data Sources tab in the modern experience for eDiscovery
- They can continue to be queried via the API calls
- Advanced indexing triggered via these APIs will have no influence on whether advanced indexing is used in jobs triggered from a search
- Make sure you use the new parameters to trigger advanced indexing on the job when running the Statistics, Add to Review Set and Direct Export jobs

Generating Search Statistics

ediscoverySearch: estimateStatistics

In eDiscovery Premium (Classic) and the previous version of the APIs, generating statistics was a mandatory step before you could progress to either adding the search to a review set or triggering a direct export. In the new modern experience for eDiscovery, this step is entirely optional. Organizations that previously generated search statistics but never checked or used the results before adding the search to a review set or triggering a direct export can now skip this step. If organizations do want to continue to generate statistics, calling the updated API with the same parameters will continue to generate statistics for the search. An example of a previous call would look as follows:

POST /security/cases/ediscoveryCases/{ediscoveryCaseId}/searches/{ediscoverySearchId}/estimateStatistics

Historically this API didn't require a request body. With the APIs now natively working with the modern experience for eDiscovery, the API call now supports a request body, enabling you to benefit from the new statistics options. Details on these new options can be found in the links below.
Create a search for a case in eDiscovery | Microsoft Learn
Evaluate and refine search results in eDiscovery | Microsoft Learn

If a search is run without a request body, it will still generate the following information:

- Total matches and volume
- Number of locations searched and the number of locations with hits
- Number of data sources searched and the number of data sources with hits
- The top five data sources that make up the most search hits matching your query
- Hit count by location type (mailbox versus site)

As the API now natively works with the modern experience for eDiscovery, you can optionally include a request body to pass the statisticsOptions parameter in the POST API call. With the changes to how advanced indexing works within the new UX and the additional reporting categories available, you can use the statisticsOptions parameter to trigger the generate statistics job with the additional options available in the modern experience. The values you can include are detailed in the table below.

Property | Option from Portal
includeRefiners | Include categories: Refine your view to include people, sensitive information types, item types, and errors.
includeQueryStats | Include query keywords report: Assess keyword relevance for different parts of your search query.
includeUnindexedStats | Include partially indexed items: We'll provide details about items that weren't fully indexed. These partially indexed items might be unsearchable or partially searchable.
advancedIndexing | Perform advanced indexing on partially indexed items: We'll try to reindex a sample of partially indexed items to determine whether they match your query. After running the query, check the Statistics page to review information about partially indexed items. Note: Can only be used if includeUnindexedStats is also included.
locationsWithoutHits | Exclude partially indexed items in locations without search hits: Ignore partially indexed items in locations with no matches to the search query. Checking this setting will only return partially indexed items in locations where there is already at least one hit. Note: Can only be used if includeUnindexedStats is also included.

In eDiscovery Premium (Classic), advanced indexing took place when a custodian or non-custodial data location was added to the Data Sources tab. This meant that when you triggered the estimate statistics call on the search, the results included both the native Exchange and SharePoint indexes and the advanced index. In the modern experience for eDiscovery, advanced indexing runs as part of the job, but it must be selected as an option on the job. Note that not all searches will benefit from advanced indexing; for example, a simple date range search on a mailbox or SPO site will still hit partially indexed items, because even partially indexed email and SPO file items have date metadata in the native indexes.

The following example uses the Microsoft Graph PowerShell module and passes the new statisticsOptions parameter to the POST call, selecting all available options.

# Generate estimates for the newly created search
$statParams = @{
    statisticsOptions = "includeRefiners,includeQueryStats,includeUnindexedStats,advancedIndexing,locationsWithoutHits"
}
$params = $statParams | ConvertTo-Json -Depth 10
$uri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/searches/$searchID/estimateStatistics"
Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params
Write-Host "Estimate statistics generation triggered for search ID: $searchID"

Once run, this creates a generate statistics job with the additional options selected.

Direct Export - Report

ediscoverySearch: exportReport

This API enables you to generate an item report directly from a search without taking the data into a review set or exporting the items that match the search.
With the APIs now natively working with the modern experience for eDiscovery, new parameters have been added to the request body, along with new values for existing parameters. The new parameters are as follows:

- cloudAttachmentVersion: The versions of cloud attachments to include in messages (e.g. latest, latest 10, latest 100 or all). This controls how many versions of a file are collected when a cloud attachment is contained within an email, Teams or Viva Engage message. The version that was shared is also always returned.
- documentVersion: The versions of files in SharePoint to include (e.g. latest, latest 10, latest 100 or all). This controls how many versions of a file are collected when targeting a SharePoint or OneDrive site directly in the search.

These new parameters reflect the changes made in the modern experience for eDiscovery, which gives eDiscovery managers more granular control to apply different collection options based on where the SPO item was collected from (e.g. directly from a SPO site vs. a cloud attachment link included in an email). Within eDiscovery Premium (Classic), the All Document Versions option applied both to SharePoint and OneDrive files collected directly from SharePoint and to any cloud attachments contained within email, Teams and Viva Engage messages.

Historically for this API, within the additionalOptions parameter you could include the allDocumentVersions value to trigger the collection of all versions of any file stored in SharePoint and OneDrive. With the APIs now natively working with the modern experience for eDiscovery, the allDocumentVersions value can still be included in the additionalOptions parameter, but it will only apply to files collected directly from a SharePoint or OneDrive site. It will not influence any cloud attachments included in email, Teams and Viva Engage messages.
To collect additional versions of cloud attachments, use the cloudAttachmentVersion parameter to control the number of versions that are included. Also consider moving away from the allDocumentVersions value in the additionalOptions parameter and switching to the new documentVersion parameter.

As described earlier, to benefit from advanced indexing in the modern experience for eDiscovery you must trigger advanced indexing as part of the direct export job. Within the portal, to include partially indexed items and run advanced indexing you would make the following selections. To achieve this via the API call, include the following parameters and values in the request body:

Parameter | Value | Option from the portal
additionalOptions | advancedIndexing | Perform advanced indexing on partially indexed items
exportCriteria | searchHits, partiallyIndexed | Indexed items that match your search query and partially indexed items
exportLocation | responsiveLocations, nonresponsiveLocations | Exclude partially indexed items in locations without search hits

Finally, the new modern experience for eDiscovery introduces more granular control, enabling organisations to independently choose to convert Teams, Viva Engage and Copilot interactions into HTML transcripts and to collect up to 12 hours of related conversations when a message matches a search. This is reflected in the job settings by the following options:

- Organize conversations into HTML transcripts
- Include Teams and Viva Engage conversations

In the classic experience this was a single option, titled Teams and Yammer Conversations, that performed both actions and was controlled by including the teamsAndYammerConversations value in the additionalOptions parameter.
With the APIs now natively working with the modern experience for eDiscovery, the teamsAndYammerConversations value can still be included in the additionalOptions parameter, but it will only trigger the collection of up to 12 hours of related conversations when a message matches a search, without converting the items into HTML transcripts. To convert the items into HTML transcripts, we need to include the new htmlTranscripts value in the additionalOptions parameter.

As an example, let's look at the following direct export report job from the portal and use the Microsoft Graph PowerShell module to call the exportReport API with the updated request body.

$exportName = "New UX - Direct Export Report"
$exportParams = @{
    displayName = $exportName
    description = "Direct export report from the search"
    additionalOptions = "teamsAndYammerConversations,cloudAttachments,htmlTranscripts,advancedIndexing"
    exportCriteria = "searchHits,partiallyIndexed"
    documentVersion = "recent10"
    cloudAttachmentVersion = "recent10"
    exportLocation = "responsiveLocations"
}
$params = $exportParams | ConvertTo-Json -Depth 10
$uri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/searches/$searchID/exportReport"
$exportResponse = Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params

Direct Export - Results

ediscoverySearch: exportResult - Microsoft Graph v1.0 | Microsoft Learn

This API call enables you to export the items from a search without taking the data into a review set. All the information from the section above on the changes to the exportReport API also applies to this API call. However, with this API call we will actually be exporting the items from the search, not just the report. As such, we need to pass in the request body information on how we want the export package to be structured. Previously with direct export in eDiscovery Premium (Classic), you had three options in the UX and in the API to define the export format.
Option | Exchange Export Structure | SharePoint / OneDrive Export Structure
Individual PST files for each mailbox | A PST is created for each mailbox. The structure of each PST reflects the folders within the mailbox, with emails stored based on their original location in the mailbox and named based on their subject. | A folder is created for each site. Within each folder, the structure reflects the SharePoint/OneDrive site, with documents stored based on their original location in the site and named based on their document name.
Individual .msg files for each message | A folder is created for each mailbox. The file structure within reflects the folders in the mailbox, with emails stored as .msg files based on their original location in the mailbox and named based on their subject. | As above.
Individual .eml files for each message | A folder is created for each mailbox. The file structure within reflects the folders in the mailbox, with emails stored as .eml files based on their original location in the mailbox and named based on their subject. | As above.

Historically with this API, the exportFormat parameter was used to control the desired export format; three values could be used: pst, msg and eml. This parameter is still relevant, but now only controls how email items are saved: in a PST file, as individual .msg files, or as individual .eml files.

Note: The eml export format option is deprecated in the new UX. Going forward you should use either pst or msg.

With the APIs now natively working with the modern experience for eDiscovery, we need to account for the additional flexibility customers have to control the structure of their export package. An example of the options available in the direct export job can be seen below. More information on the export package options and what they control can be found in the following link.
https://learn.microsoft.com/en-gb/purview/edisc-search-export#export-package-options

To support this, new values have been added to the additionalOptions parameter for this API call. These must be included in the request body, otherwise the export structure will be as follows:

exportFormat value | Exchange Export Structure | SharePoint / OneDrive Export Structure
pst | PST files are created containing data from multiple mailboxes. All emails are contained within a single folder within the PST and named based on an assigned unique identifier (GUID). | One folder for all documents. All documents are contained within a single folder and named based on an assigned unique identifier (GUID).
msg | A folder is created containing data from all mailboxes. All emails are contained within a single folder, stored as .msg files and named based on an assigned unique identifier (GUID). | As above.

The new values added to the additionalOptions parameter are as follows. They control the export package structure for both Exchange and SharePoint/OneDrive items.

Property | Option from Portal
includeFolderAndPath | Organize data from different locations into separate folders or PSTs
splitSource | Include folder and path of the source
condensePaths | Condense paths to fit within 259 characters limit
friendlyName | Give each item a friendly name

Organizations are free to mix and match which export options they include in the request body to meet their own requirements. To receive a similar output structure to the previous pst or msg values in the exportFormat parameter, include all of the above values in the additionalOptions parameter.

For example, to generate a direct export where the email items are stored in separate PSTs per mailbox, the structure of the PST files reflects the mailbox, and each item is named per the subject of the email, I would use the Microsoft Graph PowerShell module to call the exportResult API with the updated request body.
$exportName = "New UX - DirectExportJob - PST"
$exportParams = @{
    displayName = $exportName
    description = "Direct export of items from the search"
    additionalOptions = "teamsAndYammerConversations,cloudAttachments,htmlTranscripts,advancedIndexing,includeFolderAndPath,splitSource,condensePaths,friendlyName"
    exportCriteria = "searchHits,partiallyIndexed"
    documentVersion = "recent10"
    cloudAttachmentVersion = "recent10"
    exportLocation = "responsiveLocations"
    exportFormat = "pst"
}
$params = $exportParams | ConvertTo-Json -Depth 10
$uri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/searches/$searchID/exportResult"
$exportResponse = Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params

If I want to export the email items as individual .msg files instead of storing them in PST files, I would call the same API with exportFormat set to msg:

$exportName = "New UX - DirectExportJob - MSG"
$exportParams = @{
    displayName = $exportName
    description = "Direct export of items from the search"
    additionalOptions = "teamsAndYammerConversations,cloudAttachments,htmlTranscripts,advancedIndexing,includeFolderAndPath,splitSource,condensePaths,friendlyName"
    exportCriteria = "searchHits,partiallyIndexed"
    documentVersion = "recent10"
    cloudAttachmentVersion = "recent10"
    exportLocation = "responsiveLocations"
    exportFormat = "msg"
}
$params = $exportParams | ConvertTo-Json -Depth 10
$uri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/searches/$searchID/exportResult"
$exportResponse = Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params

Add to Review Set

ediscoveryReviewSet: addToReviewSet

This API call enables you to commit the items that match the search to a review set within an eDiscovery case. This enables you to review, tag, redact and filter the items that match the search without exporting the data from the M365 service boundary.
Historically, this API call was more limited compared to triggering the job via the eDiscovery Premium (Classic) UI. With the APIs now natively working with the modern experience for eDiscovery, organizations can make use of the enhancements in the modern UX and have greater flexibility in selecting the options relevant to their requirements. There is a lot of overlap with the previous sections, specifically the Direct Export - Report section, on what updates are required to benefit from the updated API:

- Controlling the number of versions of SPO and OneDrive documents added to the review set via the new cloudAttachmentVersion and documentVersion parameters
- Enabling organizations to trigger advanced indexing of partially indexed items during the add to review set job via new values added to existing parameters

However, there are some nuances to the parameter names and values for this specific API call compared to the exportReport API call. For example, this API call uses the additionalDataOptions parameter as opposed to the additionalOptions parameter. As with the exportReport and exportResult APIs, the new parameters to control the number of versions of SPO and OneDrive documents added to the review set are as follows:

- cloudAttachmentVersion: The versions of cloud attachments to include in messages (e.g. latest, latest 10, latest 100 or all). This controls how many versions of a file are collected when a cloud attachment is contained within an email, Teams or Viva Engage message. The version that was shared is also always returned.
- documentVersion: The versions of files in SharePoint to include (e.g. latest, latest 10, latest 100 or all). This controls how many versions of a file are collected when targeting a SharePoint or OneDrive site directly in the search.
Historically for this API call, within the additionalDataOptions parameter you could include the allVersions value to trigger the collection of all versions of any file stored in SharePoint and OneDrive. With the APIs now natively working with the modern experience for eDiscovery, the allVersions value can still be included in the additionalDataOptions parameter, but it will only apply to files collected directly from a SharePoint or OneDrive site. It will not influence any cloud attachments included in email, Teams and Viva Engage messages. To collect additional versions of cloud attachments, use the cloudAttachmentVersion parameter to control the number of versions that are included. Also consider moving away from the allVersions value in the additionalDataOptions parameter and switching to the new documentVersion parameter.

To benefit from advanced indexing in the modern experience for eDiscovery, you must trigger advanced indexing as part of the add to review set job. Within the portal, to include partially indexed items and run advanced indexing you would make the following selections. To achieve this via the API call, include the following parameters and values in the request body:

Parameter | Value | Option from the portal
additionalDataOptions | advancedIndexing | Perform advanced indexing on partially indexed items
itemsToInclude | searchHits, partiallyIndexed | Indexed items that match your search query and partially indexed items
additionalDataOptions | locationsWithoutHits | Exclude partially indexed items in locations without search hits

Historically, the API call didn't support the add to review set job options to convert Teams, Viva Engage and Copilot interactions into HTML transcripts and to collect up to 12 hours of related conversations when a message matches a search.
With the APIs now natively working with the modern experience for eDiscovery, this is now possible through the new htmlTranscripts and messageConversationExpansion values supported by the additionalDataOptions parameter. As an example, let's look at the following add to review set job from the portal and use the Microsoft Graph PowerShell module to invoke the addToReviewSet API call with the updated request body.

$commitParams = @{
    search = @{
        id = $searchID
    }
    additionalDataOptions = "linkedFiles,advancedIndexing,htmlTranscripts,messageConversationExpansion,locationsWithoutHits"
    cloudAttachmentVersion = "latest"
    documentVersion = "latest"
    itemsToInclude = "searchHits,partiallyIndexed"
}
$params = $commitParams | ConvertTo-Json -Depth 10
$uri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/reviewSets/$reviewSetID/addToReviewSet"
Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params

Export from Review Set

ediscoveryReviewSet: export

This API call enables you to export items from a review set within an eDiscovery case. Historically with this API, the exportStructure parameter was used to control the desired export format; two values could be used: directory and pst. This parameter has been updated to include a new value of msg.

Note: The directory value is deprecated in the new UX but remains available in v1.0 of the API call for backwards compatibility. Going forward you should use msg alongside the new exportOptions values.

The exportStructure parameter now only controls how email items are saved, either within PST files or as individual .msg files. With the APIs now natively working with the modern experience for eDiscovery, we need to account for the additional flexibility customers have to control the structure of their export package. An example of the options available in the export job can be seen below.
As with the exportResult API call for direct export, new values have been added to the exportOptions parameter for this API call. They control the export package structure for both Exchange and SharePoint/OneDrive items:

Property | Option from Portal
includeFolderAndPath | Organize data from different locations into separate folders or PSTs
splitSource | Include folder and path of the source
condensePaths | Condense paths to fit within 259 characters limit
friendlyName | Give each item a friendly name

Organizations are free to mix and match which export options they include in the request body to meet their own requirements. To receive an equivalent output structure to the previous pst value in the exportStructure parameter, include all of the above values in the exportOptions parameter within the request body. An example using the Microsoft Graph PowerShell module can be found below.

$exportName = "ReviewSetExport - PST"
$exportParams = @{
    outputName = $exportName
    description = "Exporting all items from the review set"
    exportOptions = "originalFiles,includeFolderAndPath,splitSource,condensePaths,friendlyName"
    exportStructure = "pst"
}
$params = $exportParams | ConvertTo-Json -Depth 10
$uri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/reviewSets/$reviewSetID/export"
Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params

To receive an equivalent output structure to the previous directory value in the exportStructure parameter, use the msg value instead. As the condensed directory structure format exports all items into a single folder, named based on a uniquely assigned identifier, there is no need to include the new exportOptions values. An example using the Microsoft Graph PowerShell module can be found below.
$exportName = "ReviewSetExport - MSG"
$exportParams = @{
    outputName = $exportName
    description = "Exporting all items from the review set"
    exportOptions = "originalFiles"
    exportStructure = "msg"
}
$params = $exportParams | ConvertTo-Json -Depth 10
$uri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/reviewSets/$reviewSetID/export"
Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params

Continuing to use the directory value in exportStructure will produce the same output as if msg was used.

Wrap Up

Thank you for taking the time to read through this post. Hopefully you are now equipped with the information needed to make the most of the new modern experience for eDiscovery when making your Graph API calls.

Securing Data with Microsoft Purview IRM + Defender: A Hands-On Lab
Hi everyone, I recently explored how Microsoft Purview Insider Risk Management (IRM) integrates with Microsoft Defender to secure sensitive data. This lab demonstrates how these tools work together to identify, investigate, and mitigate insider risks.

What I covered in this lab:

- Set up Insider Risk Management policies in Microsoft Purview
- Connected Microsoft Defender to monitor risky activities
- Walked through alerts being triggered, triaged, and escalated into cases
- Key governance and compliance insights

Key learnings from the lab:

- Purview IRM policies detect both accidental risks (like data spillage) and malicious ones (IP theft, fraud, insider trading)
- IRM principles include transparency (balancing privacy vs. protection), configurable policies, integrations across Microsoft 365 apps, and actionable alerts
- The IRM workflow follows: define policies → trigger alerts → triage by severity → investigate cases (dashboards, Content Explorer, Activity Explorer) → take action (training, legal escalation, or SIEM integration)
- Defender and Purview together provide unified coverage: Defender detects and responds to threats, while Purview governs compliance and insider risk

This was part of my ongoing series of security labs. Curious to hear from others — how are you approaching Insider Risk Management in your organizations or labs?

Cannot see Data Map and Unified Catalog in the free version of Microsoft Purview
Hey, I am trying to set up a data connection in the free version of Microsoft Purview. However, I cannot see the Data Map and Unified Catalog features. Is this an intended limitation of the free version? Or am I missing something?

Upcoming changes to Microsoft Purview eDiscovery
Today, we are announcing three significant updates to the Microsoft Purview eDiscovery products and services. These updates reinforce our commitment to meeting and exceeding the data security, privacy, and compliance requirements of our customers. To improve security and help protect customers and their data, we have accelerated the timeline for the changes below, which will be enforced by default on May 26. The following features will be retired from the Microsoft Purview portal:

- Content Search will transition to the new unified Purview eDiscovery experience.
- The eDiscovery (Standard) classic experience will transition to the new unified Purview eDiscovery experience.
- The eDiscovery export PowerShell cmdlet parameters will be retired.

These updates aim to unify and simplify the eDiscovery user experience in the new Microsoft Purview portal, while preserving the accessibility and integrity of existing eDiscovery cases.

Content Search transition to the new unified Purview eDiscovery experience

The classic eDiscovery Content Search solution will be streamlined into the new unified Purview eDiscovery experience. Effective May 26, the Content Search solution will no longer be available in the classic Purview portal. Content Search provides administrators with the ability to create compliance searches to investigate data located in Microsoft 365. We hear from customers that the Content Search tool is used to investigate data privacy concerns, perform legal or incident investigations, validate data classifications, and so on. Currently, each compliance search created in the Content Search tool is created outside the boundaries of a Purview eDiscovery (Standard) case. This means that administrators in Purview role groups containing the Compliance Search role can view all Content Searches in their tenant.
While the Content Search solution does not enable any additional search permission access, the view of all Content Searches in a customer tenant is not an ideal architecture. Alternatively, when using a Purview eDiscovery case, these administrators only have access to the cases to which they are assigned. Customers can now create their new compliance searches within an eDiscovery case using the new unified Purview eDiscovery experience. All content searches in a tenant created prior to May 26, 2025 are now accessible in the new unified Purview eDiscovery experience within a case titled “Content Search”. Although the permissions remain consistent, eDiscovery managers and those with custom permissions will now only be able to view searches from within the eDiscovery cases to which they are assigned, including the “Content Search” case.

eDiscovery Standard transition to the new unified Purview eDiscovery experience

The classic Purview eDiscovery (Standard) solution experience has transitioned into the new unified Purview eDiscovery experience. Effective May 26th, the classic Purview eDiscovery (Standard) solution will no longer be available to customers within the classic Purview portal. All existing eDiscovery cases created in the classic Purview experience are now available within the new unified Purview eDiscovery experience.

Retirement of eDiscovery export PowerShell cmdlet parameters

The Export parameter within the ComplianceSearchAction eDiscovery PowerShell cmdlets will be retired on May 26, 2025:

- New-ComplianceSearchAction -Export parameter (and parameters dependent on export, such as Report, RetentionReport …)
- Get-ComplianceSearchAction -Export parameter
- Set-ComplianceSearchAction -ChangeExportKey parameter

We recognize that the removal of the Export parameter may require adjustments to your current workflow process when using Purview eDiscovery (Standard).
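For organizations that scripted exports with these parameters, the equivalent capability lives in the Microsoft Graph eDiscovery APIs (available with eDiscovery Premium). As a rough illustration, here is a minimal sketch of how the v1.0 export endpoint for a search might be composed. The case and search IDs are hypothetical placeholders, and the request payload is omitted; consult the Graph documentation for the full schema.

```python
# Hedged sketch: composing the Microsoft Graph v1.0 endpoint that replaces the
# retired New-ComplianceSearchAction -Export workflow for Premium cases.
# The case and search IDs passed in below are hypothetical placeholders.
GRAPH_BASE = "https://graph.microsoft.com/v1.0"

def export_result_url(case_id: str, search_id: str) -> str:
    # exportResult is the direct-export action on an eDiscovery search
    return (
        f"{GRAPH_BASE}/security/cases/ediscoveryCases/{case_id}"
        f"/searches/{search_id}/exportResult"
    )

url = export_result_url("<case-id>", "<search-id>")
print(url)
```

An authenticated POST to this URL, with a body describing the export options, queues the export job; the payload options are described in the Graph eDiscovery reference.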
The remaining Purview eDiscovery PowerShell cmdlets will continue to be supported after May 26, 2025:

- Create and update compliance cases: New-ComplianceCase, Set-ComplianceCase
- Create and update case holds: New-CaseHoldPolicy, Set-CaseHoldPolicy, New-CaseHoldRule, Set-CaseHoldRule
- Create, update, and start compliance searches: New-ComplianceSearch, Set-ComplianceSearch, Start-ComplianceSearch
- Apply a purge action to a compliance search: New-ComplianceSearchAction -Purge

Additionally, if you have a Microsoft 365 E5 license and use eDiscovery (Premium), your organization can script all eDiscovery operations, including export, using the Microsoft Graph eDiscovery APIs.

Purview eDiscovery Premium

On May 26th, there will be no changes to the classic Purview eDiscovery (Premium) solution in the classic Purview portal. Cases that were created using the Purview eDiscovery (Premium) classic case experience can also now be accessed in the new unified Purview eDiscovery experience. We recognize that these changes may impact your current processes, and we appreciate your support as we implement these updates. Microsoft runs on trust, and protecting your data is our utmost priority. We believe these improvements will provide a more secure and reliable eDiscovery experience. To learn more about the Microsoft Purview eDiscovery solution and become an eDiscovery Ninja, please check out our eDiscovery Ninja Guide at https://aka.ms/eDiscoNinja!

What’s New in Microsoft Purview eDiscovery
As organizations continue to navigate increasingly complex compliance and legal landscapes, Microsoft Purview eDiscovery is adapting to address these challenges. New and upcoming enhancements will improve how legal and compliance teams manage cases, streamline workflows, and reduce operational friction. Here’s a look at some newly released features and some others that are coming soon.

Enhanced Reporting in Modern eDiscovery

The modern eDiscovery experience now offers comprehensive reporting capabilities that capture every action taken within a case. This creates an auditable record of actions, enabling users to manage case activities confidently and accurately. These enhancements are critical for defensibility and audit readiness, and they also help users better understand the outcomes of their searches based on the options and settings they selected. Recent improvements include:

- Compliance boundary visibility: The Summary.csv report now includes the compliance boundary settings that were applied to the search query. This helps clarify search results, especially when different users have varying access permissions across data locations.
- Decryption settings: Visibility has been added to the Settings.csv report to indicate whether a specific process has Exchange or SharePoint decryption capabilities enabled. This ensures users can verify whether encrypted content can be successfully decrypted during the process of adding to a review set or exporting.
- Visibility into Premium feature usage: There is a new value in the Settings.csv report that indicates whether Premium features were enabled for the process.
- Improved clarity: The Items.csv report now includes an "Added by" column that shows how each item was identified. This column indicates whether items were included by direct search (IndexedQuery), partial indexing (UnindexedQuery), or advanced indexing (AdvancedIndex).
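Because these reports are plain CSV files, the new columns are easy to analyze with a few lines of scripting. For example, a quick sketch (with illustrative sample rows, not real report output) that tallies the "Added by" values in an Items.csv report:

```python
# Hedged sketch: tallying the "Added by" column of an Items.csv report to see
# how items were identified. The sample rows below are illustrative only.
import csv
import io
from collections import Counter

sample_items_csv = io.StringIO(
    "Item Id,Added by\n"
    "1,IndexedQuery\n"
    "2,UnindexedQuery\n"
    "3,AdvancedIndex\n"
    "4,IndexedQuery\n"
)

# Count how many items came from each identification path
added_by_counts = Counter(row["Added by"] for row in csv.DictReader(sample_items_csv))
print(added_by_counts)  # Counter({'IndexedQuery': 2, 'UnindexedQuery': 1, 'AdvancedIndex': 1})
```

The same pattern works against the Summary.csv and Settings.csv reports for spot checks on boundary and decryption settings.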
The Summary.csv report also helps users understand their exported data by providing contextual explanations. For instance, it explains how the volume of exported data may be greater than the search estimate due to factors such as cloud attachments or multiple versions of SharePoint documents.

Copy Search to Hold: Reuse with Confidence

Another commonly requested feature that is now available is the ability to create a hold from a search. If your workflow starts with searches before creating holds, you can use the new "Create a hold" button to create a new hold policy based on that search. Depending on your workflow, it can be effective to start off with a broad, high-level search across the entire data source to quickly surface potentially relevant content and gauge the approximate amount of data that may need review. For example, in a litigation scenario, you may want to search across the entire mailbox and OneDrive of custodians that are of interest. Once the preservation obligation is triggered, this new feature allows you to easily copy your search and create a hold to preserve the content, streamlining your workflow and enhancing efficiency. Whether you begin with a highly targeted search or something broader, this feature reduces duplication of effort and ensures consistency across processes when you want to turn an existing search into a hold.

Retry Failed Locations: Easily Address Processing Issues

Searches can occasionally encounter issues due to temporarily inaccessible locations. The new “Retry failed locations” feature introduces a simple, automated way to reprocess those issues without restarting the entire job. When you retry the failed locations, the results are aggregated with the original job to provide a comprehensive overview. This feature is particularly beneficial for administrators, allowing them to efficiently retry the search while ensuring continuity in their workflow.
Duplicate Search: Consistency Made Easy

Save time and reduce rework by duplicating an existing search with just a few mouse clicks. Whether you're building on a previous query or rerunning it with slight adjustments, this new feature lets you preserve all original parameters, such as conditions, data sources, and locations, so you can move faster with confidence and consistency. Administrators and users simply click the “Duplicate search” button on an existing search and then rename it.

Case-Level Data Source Management: The New Data Sources Tab

We recently introduced a new case-level Data Sources view in Purview eDiscovery, allowing data sources to be reused in searches and holds within a case. This enhancement allows case administrators to map and manage all relevant data locations, including mailboxes, SharePoint sites, Teams channels, and more, directly within a case. Once data sources have been added, the process of creating searches is greatly simplified. Adding data sources within the Data Sources tab enables administrators to select from a list of previously added sources when creating new searches, rather than searching for them each time. This feature is especially useful for frequent searches across the same data sources, which happens often in eDiscovery. With this feature, users can:

- Visualize all data sources tied to a case.
- Use data source locations to populate searches and eDiscovery hold policies.
- Simplify the search creation process for administrators and users.

This new functionality empowers teams to make more informed decisions about what to preserve, search, and review.

Delete Searches and Search Exports: Keep Your Case Well Organized

A newly released feature helps you remove outdated or redundant searches from a case, keeping the workspace clean and focused. With just a few clicks, you can delete searches that are no longer relevant, helping you stay focused on what matters most without extra clutter.
Whether you're tidying up after a project or clearing out test runs, deletion helps keep you organized. Another new feature enables users to delete search exports. This functionality is intended to assist with the management of case data and to remove unnecessary or outdated search exports, including sensitive information that is no longer required.

Condition Builder Enhancements for Logical Operators (AND, OR, NEAR)

The Condition Builder now supports the logical operators AND, OR, and NEAR, all within the same line, or grid. These enhancements empower users to construct more targeted search conditions and offer increased control over the use of phrases, enabling additional options for how terms are combined and matched, and providing greater flexibility in keyword queries.

Tenant-Level Control over Premium Features

An upcoming enhancement will introduce a tenant-level setting that allows organizations to set the default behavior for new case creation and whether to use Premium features by default. This will provide greater flexibility and control, especially for customers in a mixed-license environment.

Export Naming and Controlling the Size of Exports

A couple more upcoming enhancements are intended to give users more control and additional options over the export process, increasing flexibility for legal and compliance teams. First, export packages will include the user-defined export name directly in the download package. This small but impactful change simplifies tracking and association of exported data with specific cases. It is especially useful when managing multiple exports across different cases. Second, a new configuration setting will allow administrators to define a maximum export package size. This gives teams greater control over how data is partitioned, helping to optimize performance and reduce the risk of download issues or browser timeouts during large exports.
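For readers who think in query syntax: the AND, OR, and NEAR operators described in the Condition Builder section correspond to familiar KQL-style constructs. A small illustrative sketch of composing such a query string (the terms and proximity distance are made up):

```python
# Hedged sketch: composing a KQL-style query string using the AND / OR / NEAR
# operators supported by the Condition Builder. All terms are illustrative.
def near(a: str, b: str, distance: int = 5) -> str:
    # NEAR(n) matches two terms appearing within n tokens of each other
    return f'"{a}" NEAR({distance}) "{b}"'

query = f'({near("merger", "agreement")}) AND ("confidential" OR "privileged")'
print(query)  # ("merger" NEAR(5) "agreement") AND ("confidential" OR "privileged")
```

Building the string programmatically makes it easy to keep term lists in one place and reuse them across searches.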
Takeaways

These updates are part of a broader modernization of the Purview eDiscovery experience, which includes a unified user experience, enhanced reporting, and a more streamlined and intuitive workflow for teams managing regulatory inquiries, litigation, or investigations. These enhancements aim to reduce friction, streamline workflows, and accelerate productivity.

To learn more about the new eDiscovery user experience, visit our Microsoft documentation at https://aka.ms/ediscoverydocsnew

For more updates on the future of eDiscovery, please check out our product roadmap at https://aka.ms/ediscoveryroadmap

To become a Purview eDiscovery Ninja, check out our eDiscovery Ninja Guide at https://aka.ms/ediscoveryninja

Column-Level Lineage Visualization Issue for Custom Entities and Processes in Azure Purview
I’m trying to implement column-level lineage for data assets and custom transformation processes in Azure Purview using the Atlas API. I have defined custom typedefs for tables (yellowbrick_table), columns (column), and a process type (custom_data_transformation_process), and I’m uploading entities with composition relationships and detailed column mappings. Although the generated JSON and typedefs appear to be correct according to the Apache Atlas documentation, I’m unable to get the column-level lineage to display properly in the Purview UI.

Specific Issues:

1. Missing 'Schema' tab in custom table entities (yellowbrick_table): When navigating to the detail page of a yellowbrick_table entity, the 'Schema' tab (where the contained columns should be listed) does not appear.
2. Missing 'Switch to column lineage' button in the lineage view of custom processes: In the 'Lineage' tab of a custom_data_transformation_process entity, the side panel titled 'Columns' displays "No mapped columns found," and there is no button or option to switch to the column-level lineage view. This happens even when the columnMapping attribute is correctly populated in the process entity.
3. Error message when trying to edit column mappings in the process lineage panel: If I attempt to edit the column mapping in the lineage side panel of the process, I receive the error: "Unable to map columns for this asset. It's a process type asset that doesn't have a schema." (This is expected, as processes don’t have schemas, but it confirms that the UI is not interpreting the columnMapping for visualization purposes.)

Context and Steps Taken:

I have followed the Apache Atlas documentation and modeling patterns for lineage:

- Typedef column: Defined with superTypes: ["Referenceable"].
- RelationshipDef table_columns: Defined as a COMPOSITION between DataSet (extended by yellowbrick_table) and column, with cardinality: SET on the column side.
- Typedef yellowbrick_table: Contains an attribute columns with typeName: "array<column>" and relationshipTypeName: "table_columns".
- Typedef custom_data_transformation_process: Extends from Process and includes a columnMapping attribute of typeName: "array<string>".

Entities Uploaded (JSON):

- Table entities include complete definitions of their nested columns in the columns attribute.
- Process entities include the columnMapping attribute as a list of JSON strings, where each string represents a DatasetMapping with a nested ColumnMapping that uses only the column names (e.g., "Source": "COLUMN_NAME").
- I’ve tested with different browsers.

Despite these efforts, the issue persists. I would like to know if there are any additional requirements or known behaviors in the Purview UI regarding column lineage visualization for custom types.

Specific Questions:

1. Is there any additional attribute or configuration required in the typedefs or entities to make the 'Schema' tab appear in my custom table entities?
2. Are there any specific requirements for the qualifiedName of tables or columns that could be preventing the column-level lineage from being visualized?
3. Could there be a known issue or limitation in the Purview UI regarding column-level lineage rendering for user-defined asset types?
4. Is there any way to verify on the Purview backend that the column composition relationships and the columnMapping of processes have been correctly indexed?

Annexes:

    # 1. Define the 'column' type
    # Ensure the superType for 'column' is 'Referenceable'
    typedef_payload_column = {
        "entityDefs": [{
            "category": "ENTITY",
            "name": "column",
            "description": "Logical column for column-to-column lineage",
            "typeVersion": "1.0",
            "superTypes": ["Referenceable"],
            "attributeDefs": []
        }]
    }
    # response = requests.post(typedef_url, headers=headers, json=typedef_payload_column)
    # print(f"typedef column status: {response.status_code} ({response.text.strip()[:100]}...)")

    # 2. Define the explicit relationship 'table_columns'
    # Ensure that the cardinality of 'endDef2' (columns) is 'SET'
    typedef_payload_table_columns_relationship = {
        "relationshipDefs": [{
            "category": "RELATIONSHIP",
            "name": "table_columns",
            "description": "Relationship between a table and its columns",
            "typeVersion": "1.0",
            "superTypes": ["AtlasRelationship"],
            "endDef1": {
                "type": "DataSet",
                "name": "parentTable",
                "isContainer": True,
                "cardinality": "SINGLE",
                "isLegacyAttribute": False
            },
            "endDef2": {
                "type": "column",
                "name": "columns",
                "isContainer": False,
                "cardinality": "SET",
                "isLegacyAttribute": False
            },
            "relationshipCategory": "COMPOSITION",
            "attributeDefs": []
        }]
    }
    # response = requests.post(typedef_url, headers=headers, json=typedef_payload_table_columns_relationship)
    # print(f"relationshipDef table_columns status: {response.status_code} ({response.text.strip()[:100]}...)")

    # 3. Modify the table's typedef to use this relationship
    # Ensure the 'columns' attribute on the table points to 'table_columns'
    typedef_payload_yellowbrick_table = {
        "entityDefs": [{
            "category": "ENTITY",
            "name": "yellowbrick_table",
            "description": "Table in Yellowbrick",
            "typeVersion": "1.0",
            "superTypes": ["DataSet"],
            "attributeDefs": [{
                "name": "columns",
                "typeName": "array<column>",
                "isOptional": True,
                "cardinality": "LIST",
                "valuesMinCount": 0,
                "valuesMaxCount": -1,
                "isUnique": False,
                "isIndexable": False,
                "includeInNotification": True,
                "relationshipTypeName": "table_columns"
            }]
        }]
    }
    # response = requests.post(typedef_url, headers=headers, json=typedef_payload_yellowbrick_table)
    # print(f"typedef yellowbrick_table status: {response.status_code} ({response.text.strip()[:100]}...)")

    # 4. Define the custom process type
    # Ensure the 'columnMapping' attribute is a string array
    typedef_payload_process = {
        "entityDefs": [{
            "category": "ENTITY",
            "name": "custom_data_transformation_process",
            "description": "Data transformation process with column lineage (Custom)",
            "typeVersion": "1.0",
            "superTypes": ["Process"],
            "attributeDefs": [{
                "name": "columnMapping",
                "typeName": "array<string>",
                "isOptional": True,
                "cardinality": "LIST",
                "valuesMinCount": 0,
                "valuesMaxCount": -1
            }]
        }]
    }
    # response = requests.post(typedef_url, headers=headers, json=typedef_payload_process)
    # print(f"typedef custom_data_transformation_process status: {response.status_code} ({response.text.strip()[:100]}...)")

Example of generated JSON:

    [
      {
        "typeName": "yellowbrick_table",
        "guid": "-105",
        "attributes": {
          "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn",
          "name": "TBL8_CONOCIMIENTO_CLIENTE",
          "description": "Source table: TBL8_CONOCIMIENTO_CLIENTE",
          "columns": [
            { "typeName": "column", "guid": "-336", "attributes": { "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#CUENTA", "name": "CUENTA", "description": "CUENTA column of table tbl8_conocimiento_cliente", "type": "string", "dataType": "string" } },
            { "typeName": "column", "guid": "-338", "attributes": { "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#TIPO_DOCUMENTO", "name": "TIPO_DOCUMENTO", "description": "TIPO_DOCUMENTO column of table tbl8_conocimiento_cliente", "type": "string", "dataType": "string" } },
            { "typeName": "column", "guid": "-340", "attributes": { "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#IDENTIFICACION", "name": "IDENTIFICACION", "description": "IDENTIFICACION column of table tbl8_conocimiento_cliente", "type": "string", "dataType": "string" } },
            { "typeName": "column", "guid": "-342", "attributes": { "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#NOMBRE_1", "name": "NOMBRE_1", "description": "NOMBRE_1 column of table tbl8_conocimiento_cliente", "type": "string", "dataType": "string" } },
            { "typeName": "column", "guid": "-344", "attributes": { "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#APELLIDO_1", "name": "APELLIDO_1", "description": "APELLIDO_1 column of table tbl8_conocimiento_cliente", "type": "string", "dataType": "string" } },
            { "typeName": "column", "guid": "-346", "attributes": { "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#APELLIDO_2", "name": "APELLIDO_2", "description": "APELLIDO_2 column of table tbl8_conocimiento_cliente", "type": "string", "dataType": "string" } },
            { "typeName": "column", "guid": "-348", "attributes": { "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#GENERO", "name": "GENERO", "description": "GENERO column of table tbl8_conocimiento_cliente", "type": "string", "dataType": "string" } },
            { "typeName": "column", "guid": "-349", "attributes": { "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#EDAD", "name": "EDAD", "description": "EDAD column of table tbl8_conocimiento_cliente", "type": "string", "dataType": "string" } },
            { "typeName": "column", "guid": "-350", "attributes": { "qualifiedName": "DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn#GENERACION", "name": "GENERACION", "description": "GENERACION column of table tbl8_conocimiento_cliente", "type": "string", "dataType": "string" } }
          ]
        }
      },
      {
        "typeName": "custom_data_transformation_process",
        "guid": "-107",
        "attributes": {
          "qualifiedName": "linaje_process_from_tbl8_conocimiento_cliente_to_tbl_tmp_clien_conocimiento_cliente_c@yellowbrick_conn",
          "name": "linaje_tbl8_conocimiento_cliente_to_tbl_tmp_clien_conocimiento_cliente_c",
          "description": "Process connecting TBL8_CONOCIMIENTO_CLIENTE to TBL_TMP_CLIEN_CONOCIMIENTO_CLIENTE_C",
          "inputs": [ { "guid": "-105" } ],
          "outputs": [ { "guid": "-106" } ],
          "columnMapping": [
            "{\"DatasetMapping\": {\"Source\": \"DB_DWH_EXTRACCION.HOGARES.TBL8_CONOCIMIENTO_CLIENTE@yellowbrick_conn\", \"Sink\": \"DB_DWH_STAG.CLIENTES.TBL_TMP_CLIEN_CONOCIMIENTO_CLIENTE_C@yellowbrick_conn\"}, \"ColumnMapping\": [{\"Source\": \"CUENTA\", \"Sink\": \"CUENTA\"}, {\"Source\": \"TIPO_DOCUMENTO\", \"Sink\": \"TIPO_DOCUMENTO\"}, {\"Source\": \"IDENTIFICACION\", \"Sink\": \"IDENTIFICACION\"}, {\"Source\": \"NOMBRE_1\", \"Sink\": \"NOMBRE_1\"}, {\"Source\": \"APELLIDO_1\", \"Sink\": \"APELLIDO_1\"}, {\"Source\": \"APELLIDO_2\", \"Sink\": \"APELLIDO_2\"}, {\"Source\": \"GENERO\", \"Sink\": \"GENERO\"}, {\"Source\": \"EDAD\", \"Sink\": \"EDAD\"}, {\"Source\": \"GENERACION\", \"Sink\": \"GENERACION\"}]}"
          ]
        }
      }
    ]

Sensitivity Auto-labelling via Document Property
Why is this needed?

Sensitivity labels are generally relevant within an organisation only. If a file is labelled within one environment and then moved to another environment, sensitivity label content markings may be visible, but by default, the applied sensitivity label will not be understood. This can lead to scenarios where information that has been generated externally is not adequately protected. My favourite analogy for these scenarios is to consider the parallels between receiving sensitive information and unpacking groceries. When unpacking groceries, you might sit your grocery bag on a counter or on the floor next to the pantry. You’ll likely then unpack each item, take a look at it and then decide where to place it. Without looking at an item to determine its correct location, you might place it in the wrong location. Porridge might be safe from the kids on the bottom shelf. If you place items that need to be protected, such as chocolate, on the bottom shelf, it’s not likely to last very long. So, I affectionately refer to information that hasn’t been evaluated as ‘porridge’, as until it has been checked, it will end up on the bottom shelf of the pantry where it is quite accessible. Label-based security controls, such as Data Loss Prevention (DLP) policies using conditions of ‘content contains sensitivity label’, will not apply to these items. To ensure the security of any contained sensitive information, we should look for potential clues to its sensitivity and then utilize these clues to ensure that the contained information is adequately protected: we take a closer look at the ‘porridge’, determine whether it’s an item that needs protection and, if so, move it to a higher shelf in the pantry so that it’s out of reach of the kids. Effective use of Purview revolves around the use of ‘know your data’ strategies. We should be using as many methods as possible to try to determine the sensitivity of items.
This can include the use of Sensitive Information Types (SITs) containing keyword or pattern-based classifiers, trainable classifiers, Exact Data Match, document fingerprinting, etc. Matching items via SITs present in an item's content can be problematic due to false positives. Keywords like ‘Sensitive’ or ‘Protected’ may be mentioned out of context, such as when referring to a classification or an environment. When classifications have been stamped via a property, it allows us to match via context rather than content. We don’t need to guess at an item’s sensitivity if another system has already established what the item’s classification is. These methods are much less prone to false positives.

Why isn’t everyone doing this?

Document properties are often not considered in Purview deployments. SharePoint metadata management seems to be a dying artform, and most compliance or security resources completing Purview configurations don’t have this skill set. There’s also a lack of understanding of the relevance of checking for item properties. Microsoft haven’t helped, as the documentation in this space is somewhat lacking and needs to be unpicked via some aligning DLP guidance (Create a DLP policy to protect documents with FCI or other properties). Many of these configurations will also be tied to regional requirements. Document properties being used by systems where I’m from, in Australia, will likely be very different to those used in other parts of the world. In the following sections, we’ll take a look at applicable use cases and walk through how to enable these configurations.

Scenarios for use

Labelling via document property isn’t for everyone. If your organisation is new to classification or you don’t have external partners that you collaborate with at higher sensitivity levels, then this likely isn’t for you. For those that collaborate heavily and have a shared classification framework, as is often seen across government, this is a must!
This approach will also be highly relevant to multi-tenant organisations or conglomerates where information is regularly shared between environments. The following scenarios are examples of where this configuration will be relevant:

1. Migrating from 3rd party classification tools

If an item has been previously stamped by a 3rd party classification tool, then evaluating its applied document properties will provide a clear picture of its security classification. These properties can then be used in service-based auto-labelling policies to effectively transition items from 3rd party tools to Microsoft Purview sensitivity labels. As labels are applied to items, they will be brought into scope of label-based controls.

2. Detecting data spill

Data spill is a term that is used to define situations where information of a higher than permitted security classification lands in an environment. Consider a Microsoft 365 tenant that is approved for the storage of Official information, but Top Secret files are uploaded to it. Document properties that align with higher than permitted classifications provide us with an almost guaranteed method of identifying spilled items. Pairing this document property with an auto-labelling policy allows for the application of encryption to lock unauthorized users out of the items. Tools like Content Explorer and eDiscovery can then be used to easily perform cleanup activities. If using document properties and auto-labelling for this purpose, keep in mind that you’ll need to create sensitivity labels for higher than permitted classifications in order to catch spilled items. These labels won’t impact usability as you won’t publish them to users. You will, however, need to publish them to a single user or break glass account so that they’re not ignored by auto-labelling.

3. Blocking access by AI tools

If your organization was concerned about items with certain properties applied being accessed by generative AI tools, such as Copilot, you could use auto-labelling to apply a sensitivity label that restricts EXTRACT permissions. You can find some information on this at Microsoft 365 Copilot data protection architecture | Microsoft Learn. This should be relevant for spilled data, but might also be useful in situations where there are certain records that have been marked via properties and which should not be Copilot accessible.

4. External Microsoft Purview Configurations

Sensitivity labels are relevant internally only. A label, in its raw form, is essentially a piece of metadata with an ID (or GUID) that we stamp on pieces of information. These GUIDs are understood by your tenant only. If an item marked with a GUID shows up in another Microsoft 365 tenant, the GUID won’t correspond with any of that tenant’s labels or label-based controls. The art in Microsoft Purview lies in interpreting the sensitivity of items based on content markings and other identifiers, so that data security can be maintained. Document properties applied by Purview, such as ClassificationContentMarkingHeaderText, are not relevant to a specific tenant, which makes them portable. We can use these properties to help maintain classifications as items move between environments.

5. Utilizing metadata applied by Records Management solutions

Some EDRMS, Records or Content Management solutions will apply properties to items. If an item has been previously managed and then stamped with properties, potentially including a security classification, via one of these systems, we could use this information to inform sensitivity label application.

6. 3rd party classification tools used externally

Even if your organisation hasn’t been using 3rd party classification tools, you should consider that partner organisations, such as other Government departments, might be.
Evaluating the properties applied by external organisations to items that you receive will allow you to extend protections to these items. If classification tools like Janus or Titus are used in your geography/industry, then you may want to consider checking for their properties.

Regarding the use of auto-classification tools

Some organisations, particularly those in Government, will have organisational policies that prevent the use of automatic classification capabilities. These policies are intended to ensure that each item is assessed by an actual person for risk of disclosure, rather than via an automated service that could be prone to error. However, when auto-labelling is used to interpret and honour existing classifications, we are lowering rather than raising the risk profile. If the item’s existing classification (applied via property) is ignored, the item will be treated as porridge and is likely to be at risk. If auto-labelling is able to identify a high-risk item and apply the relevant label, it will then be within scope of Purview’s data security controls, including label-based DLP, groups and sites data out of place alerting, and potentially even item encryption. The outcome is that, through the use of auto-labelling, we are able to significantly reduce the risk of inappropriate or unintended disclosure.

Configuration Process

Setting up document property-based auto-labelling is fairly straightforward. We need to set up a managed property and then utilize it in an auto-labelling policy. Below, I've split this process into 6 steps:

Step 1 – Prepare your files

In order to make use of document properties, an item with the properties applied will first need to be indexed by SharePoint. SharePoint will record the properties as ‘crawled properties’, which we’ll then need to convert into ‘managed properties’ to make them useful. If you already have items with the relevant properties stored in SharePoint, then they are likely already indexed.
If not, you’ll need to upload or create an item or items with the properties applied. For testing, you’ll want to create a file for each property/value combination so that you can confirm that your auto-labelling policies are all working correctly. This could require quite a few files, depending on the number of properties you’re looking for. To kick off your crawled property generation, though, you could create or upload a single file with the correct properties applied.

For example, I’ve created properties for ClassificationContentMarkingHeaderText and ClassificationContentMarkingFooterText, which you’ll often see applied by Purview when an item has a sensitivity label content marking applied to it. I’ve also included properties to help identify items classified via JanusSeal, Titus and Objective.

Step 2 – Index the files

After creating or uploading your file, we then need SharePoint to index it. This should happen fairly quickly, depending on the size of your environment. I’d expect to wait somewhere between 10 minutes and 24 hours. If you’re not in a hurry, I’d recommend just checking back the next day. You’ll know indexing has completed when you head into SharePoint Admin > Search > Managed Search Schema > Crawled Properties and can find your newly indexed properties.

Step 3 – Configure managed properties

Next, the properties need to be configured as managed properties. To do this, go to SharePoint Admin > More features > Search > Managed Search Schema > Managed Properties. Create a new managed property and give it a name. Note that there are some character restrictions in naming, but you should be able to get it close to your document property name. Set the property’s type to text, and select queryable and retrievable. Under ‘mappings to crawled properties’, choose add mapping, then search for and select the property indexed from the file property.
Note that the crawled property will have the same name as your document property, so there’s no need to browse through all of them. Repeat this so that you have a managed property for each document property that you want to look for.

Step 4 – Configure auto-labelling policies

Next up, create some auto-labelling policies. You’ll need one for each label that you want to apply, not one per property, as you can check multiple properties within the one auto-labelling policy.

- From within Purview, head to Information Protection > Policies > Auto-labelling policies.
- Create a new policy using the custom policy template.
- Give your policy an appropriate name (e.g. Label PROTECTED via property).
- Select the label that you want to apply (e.g. PROTECTED).
- Select SharePoint-based services (SharePoint and OneDrive).
- Name your auto-labelling rules appropriately (e.g. SPO – Contains PROTECTED property).
- Enter your conditions as a single string, with each property and value separated by a colon and multiple entries separated by commas. For example:

ClassificationContentMarkingHeaderText:PROTECTED,ClassificationContentMarkingFooterText:PROTECTED,Objective-Classification:PROTECTED,PMDisplay:PROTECTED,TitusSEC:PROTECTED

Note that the properties you are referencing are the managed properties rather than the document properties. This will be relevant if your managed property ended up with a different name due to character restrictions. After pasting your string into the UI, check that the resultant rule lists each of your property/value conditions.

When done, you can either leave your policy in simulation mode or save it and then turn it on from the auto-labelling policies screen. Just be aware of any potential impacts, such as accidentally locking users out by automatically deploying a label with encryption configured. You can reduce any potential impact by initially targeting your auto-labelling policy at a site or set of sites, and then expanding its scope after testing.
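If you’re maintaining these condition strings for several labels and many managed properties, it can help to generate them from a single list rather than hand-typing each one. A minimal sketch (the helper name is my own; the property names are the managed properties configured in Step 3):

```python
def build_condition_string(properties, value):
    """Build an auto-labelling condition string of the form 'Prop1:VALUE,Prop2:VALUE,...'."""
    return ",".join(f"{prop}:{value}" for prop in properties)

# Managed properties that may carry a classification (from Step 3)
managed_properties = [
    "ClassificationContentMarkingHeaderText",
    "ClassificationContentMarkingFooterText",
    "Objective-Classification",
    "PMDisplay",
    "TitusSEC",
]

# Produces the same string shown above, ready to paste into the rule conditions
print(build_condition_string(managed_properties, "PROTECTED"))
```

Running the same helper with "OFFICIAL" or other classification values then gives you a consistent condition string for each of your other auto-labelling policies.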
Step 5 – Test

Testing your configuration is as easy as uploading or creating a set of files with the relevant document properties in place. Once uploaded, you’ll need to give SharePoint some time to index the items, and then give the auto-labelling policy some time to apply sensitivity labels to them.

To confirm label application, you can head to the document library where your test files are located and enable the sensitivity column; files that have been auto-labelled will have their label listed. You could also check for auto-labelling activity in Purview via Activity explorer.

Step 6 – Expand into DLP

If you’ve spent the time setting up managed properties, then you really should consider capitalizing on them in your DLP configurations. DLP policy conditions can be configured in the same manner as the auto-labelling conditions in Step 4 above. The document property also gives us an anchor for DLP conditions that is independent of an item’s sensitivity label. You may wish to consider the following:

- DLP policies blocking external sharing of items with certain properties applied. This might be handy for situations where auto-labelling hasn’t yet labelled an item.
- DLP policies blocking the external sharing of items where the applied sensitivity label doesn’t match the applied document property. This could provide an indication of a risky label downgrade.

You could extend such policies into Insider Risk Management (IRM) by creating IRM policies that are aligned with the above DLP policies. This allows document properties to be considered in user risk calculation, which can inform controls like Adaptive Protection. As an example, the DLP rule summary screen for such a policy would show conditions of item contains a label, or one of our configured document properties.

Thanks for reading and I hope this article has been of use. If you have any questions or feedback, please feel free to reach out.