Introducing eDiscovery Graph API Standard and Enhancements to Premium APIs
We have been busy working to enable organisations that leverage the Microsoft Purview eDiscovery Graph APIs to benefit from the enhancements in the new modern experience for eDiscovery. I am pleased to share that the APIs have now been updated with additional parameters so that organisations can take advantage of the following features already present in the modern experience within the Purview portal:

Ability to control the export package structure and item naming convention
Trigger advanced indexing as part of the Statistics, Add to Review Set and Export jobs
For the first time, the ability to trigger HTML transcription of Teams, Viva Engage and Copilot interactions when adding to a review set
Benefit from the new statistic options such as Include Categories and Include Keyword Report
More granular control over the number of versions collected for modern attachments and for documents collected directly from OneDrive and SharePoint

These changes were communicated as part of the M365 Message Center post MC1115305. This change involved the beta version of the API calls being promoted into the v1.0 endpoint of the Graph API. The following v1.0 API calls were updated as part of this work:

Search Estimate Statistics – ediscoverySearch: estimateStatistics
Search Export Report – ediscoverySearch: exportReport
Search Export Result – ediscoverySearch: exportResult
Search Add to Review Set – ediscoveryReviewSet: addToReviewSet
Review Set Export – ediscoveryReviewSet: export

The majority of this blog post walks through the updates to each of these APIs and explains how to update your calls to maintain a consistent outcome (and benefit from the new functionality). If you are new to the Microsoft Purview eDiscovery APIs, you can refer to my previous blog post on how to get started with them: Getting started with the eDiscovery APIs | Microsoft Community Hub

First up though, availability of the Graph API for E3 customers

We are excited to announce that starting September 9, 2025, Microsoft will launch the eDiscovery Graph API Standard, a new offering designed to empower Microsoft 365 E3 customers with secure, automated data export capabilities. The new eDiscovery Graph API offers scalable, automated exports with secure credential management and improved performance and reliability for Microsoft 365 E3 customers. The new API enables automation of the search, collect, hold, and export flow from Microsoft Purview eDiscovery. While it doesn’t include premium features like Teams/Yammer conversations or advanced indexing (available only with the Premium Graph APIs), it delivers meaningful value for Microsoft 365 E3 customers needing to automate structured legal exports.

Key capabilities:

Export from Exchange, SharePoint, Teams, Viva Engage and OneDrive for Business
Case, search, hold and export management
Integration with partner/vendor workflows
Support for automation that takes advantage of new features within the modern user experience

Pricing & Access

Microsoft will offer 50 GB of included export volume per tenant per month, with additional usage billed at $10/GB—a price point that balances customer value, sustainability, and market competitiveness. The Graph API Standard will be available in public preview starting September 9.
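To give a feel for what that automated search, collect and export flow looks like, below is a minimal sketch using the Microsoft Graph PowerShell module that creates a case and a search with sources added directly to it. The case name, search name, query, mailbox and site URL are hypothetical placeholders, and the exact shape of the additionalSources payload should be checked against the current ediscoverySearch, userSource and siteSource reference documentation; treat this as an illustrative outline rather than a definitive script.

# Connect with an eDiscovery Graph scope (delegated permissions shown; app-only also works)
Connect-MgGraph -Scopes "eDiscovery.ReadWrite.All"

# Create a new eDiscovery case (display name and description are placeholders)
$caseBody = @{ displayName = "Contoso Legal Matter 001"; description = "Created via the Graph API" } | ConvertTo-Json
$case = Invoke-MgGraphRequest -Method Post -Uri "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases" -Body $caseBody
$caseID = $case.id

# Create a search in the case, adding a mailbox and a site directly as additional sources
$searchBody = @{
    displayName       = "Custodian collection"
    contentQuery      = "contract AND renewal"
    additionalSources = @(
        @{ "@odata.type" = "microsoft.graph.security.userSource"; email = "adele.vance@contoso.com" },
        @{ "@odata.type" = "microsoft.graph.security.siteSource"; site = @{ webUrl = "https://contoso.sharepoint.com/sites/Project" } }
    )
} | ConvertTo-Json -Depth 10
$search = Invoke-MgGraphRequest -Method Post -Uri "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/searches" -Body $searchBody
$searchID = $search.id

The $caseID and $searchID values captured here are the identifiers used by the job examples later in this post.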
For more details on pay-as-you-go features in eDiscovery and Purview refer to the following links.

Billing in eDiscovery | Microsoft Learn
Enable Microsoft Purview pay-as-you-go features via subscription | Microsoft Learn

Wait, but what about the custodian and noncustodial locations workflow in eDiscovery Classic (Premium)?

As you are probably aware, in the modern user experience for eDiscovery there have been some changes to the Data Sources tab and how it is used in the workflow. Previously, organisations leveraging the Microsoft Purview eDiscovery APIs would typically have used the custodian and noncustodial data sources APIs to add the relevant data sources to the case:

ediscoveryCustodian resource type - Microsoft Graph v1.0 | Microsoft Learn
ediscoveryNoncustodialDataSource resource type - Microsoft Graph v1.0 | Microsoft Learn

Once added via these API calls, the locations would be bound to a search when creating it. This workflow in the API remains supported for backwards compatibility, including the creation of system-generated case hold policies when applying holds to the locations via these APIs. Organisations can continue to use this approach. However, to simplify your code and workflow, consider using the following API call to add additional sources directly to the search.

Add additional sources - Microsoft Graph v1.0 | Microsoft Learn

Some key things to note if you continue to use the custodian and noncustodial data sources APIs in your automation workflow:

They will not populate the new Data Sources tab in the modern experience for eDiscovery
They can continue to be queried via the API calls
Advanced indexing triggered via these APIs has no influence on whether advanced indexing is used in jobs triggered from a search
Make sure you use the new parameters to trigger advanced indexing on the job when running the Statistics, Add to Review Set and Direct Export jobs

Generating Search Statistics

ediscoverySearch: estimateStatistics

In eDiscovery Premium (Classic) and the previous version of the APIs, generating statistics was a mandatory step before you could progress to either adding the search to a review set or triggering a direct export. With the new modern experience for eDiscovery, this step is completely optional. Organizations that previously generated search statistics but never checked or used the results before adding the search to a review set or triggering a direct export can now skip this step. If organizations do want to continue to generate statistics, calling the updated API with the same parameters will continue to generate statistics for the search. An example of a previous call would look as follows:

POST /security/cases/ediscoveryCases/{ediscoveryCaseId}/searches/{ediscoverySearchId}/estimateStatistics

Historically this API didn’t require a request body. With the APIs now natively working with the modern experience for eDiscovery, the API call now supports a request body, enabling you to benefit from the new statistic options. Details on these new options can be found in the links below.
Create a search for a case in eDiscovery | Microsoft Learn
Evaluate and refine search results in eDiscovery | Microsoft Learn

If a search is run without a request body it will still generate the following information:

Total matches and volume
Number of locations searched and the number of locations with hits
Number of data sources searched and the number of data sources with hits
The top five data sources that make up the most search hits matching your query
Hit count by location type (mailbox versus site)

As the API now works natively with the modern experience for eDiscovery, you can optionally include a request body to pass the statisticsOptions parameter in the POST API call. With the changes to how advanced indexing works within the new UX and the additional reporting categories available, you can use the statisticsOptions parameter to trigger the generate statistics job with the additional options from the modern experience. The values you can include are detailed below (property – corresponding option in the portal):

includeRefiners – Include categories: Refine your view to include people, sensitive information types, item types, and errors.
includeQueryStats – Include query keywords report: Assess keyword relevance for different parts of your search query.
includeUnindexedStats – Include partially indexed items: We'll provide details about items that weren't fully indexed. These partially indexed items might be unsearchable or partially searchable.
advancedIndexing – Perform advanced indexing on partially indexed items: We'll try to reindex a sample of partially indexed items to determine whether they match your query. After running the query, check the Statistics page to review information about partially indexed items. Note: Can only be used if includeUnindexedStats is also included.
locationsWithoutHits – Exclude partially indexed items in locations without search hits: Ignore partially indexed items in locations with no matches to the search query. Checking this setting will only return partially indexed items in locations where there is already at least one hit. Note: Can only be used if includeUnindexedStats is also included.

In eDiscovery Premium (Classic) the advanced indexing took place when a custodian or non-custodial data location was added to the Data Sources tab. This meant that when you triggered the estimate statistics call on the search, it would include results from both the native Exchange and SharePoint indexes and the advanced index. In the modern experience for eDiscovery, the advanced indexing runs as part of the job; however, it must be selected as an option on the job. Note that not all searches will benefit from advanced indexing. One example would be a simple date range search on a mailbox or SPO site, as this will still produce hits on the partially indexed items (even partially indexed email and SPO file items have date metadata in the native indexes).
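The examples that follow assume variables holding the IDs of an existing case, search and review set ($caseID, $searchID and $reviewSetID). If you did not capture them when creating the objects (as in the earlier sketch), a minimal way to look them up and to create a review set with the Graph PowerShell module is shown below; the display names are hypothetical placeholders and the lookup assumes they are unique (paging is omitted for brevity).

# Look up an existing case by display name
$cases = (Invoke-MgGraphRequest -Method Get -Uri "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases").value
$caseID = ($cases | Where-Object { $_.displayName -eq "Contoso Legal Matter 001" }).id

# Look up a search within the case
$searches = (Invoke-MgGraphRequest -Method Get -Uri "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/searches").value
$searchID = ($searches | Where-Object { $_.displayName -eq "Custodian collection" }).id

# Create a review set for the add-to-review-set examples later in this post
$rsBody = @{ displayName = "Review set 01" } | ConvertTo-Json
$reviewSet = Invoke-MgGraphRequest -Method Post -Uri "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/reviewSets" -Body $rsBody
$reviewSetID = $reviewSet.id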
The following example uses the Microsoft Graph PowerShell module and passes the new statisticsOptions parameter to the POST call, selecting all available options.

# Generate estimates for the newly created search
$statParams = @{
    statisticsOptions = "includeRefiners,includeQueryStats,includeUnindexedStats,advancedIndexing,locationsWithoutHits"
}
$params = $statParams | ConvertTo-Json -Depth 10
$uri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/searches/$searchID/estimateStatistics"
Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params
Write-Host "Estimate statistics generation triggered for search ID: $searchID"

Once run, it will create a generate statistics job with the additional options selected.

Direct Export - Report

ediscoverySearch: exportReport

This API enables you to generate an item report directly from a search without taking the data into a review set or exporting the items that match the search. With the APIs now natively working with the modern experience for eDiscovery, new parameters have been added to the request body, as well as new values for existing parameters. The new parameters are as follows:

cloudAttachmentVersion: The versions of cloud attachments to include in messages (e.g. latest, latest 10, latest 100 or All). This controls how many versions of a file are collected when a cloud attachment is contained within an email, Teams or Viva Engage message. The version that was shared is also always returned.
documentVersion: The versions of files in SharePoint to include (e.g. latest, latest 10, latest 100 or All). This controls how many versions of a file are collected when targeting a SharePoint or OneDrive site directly in the search.

These new parameters reflect the changes made in the modern experience for eDiscovery that provide more granular control for eDiscovery managers, allowing different collection options depending on where the SPO item was collected from (e.g. directly from a SPO site versus a cloud attachment link included in an email). Within eDiscovery Premium (Classic) the All Document Versions option applied both to SharePoint and OneDrive files collected directly from SharePoint and to any cloud attachments contained within email, Teams and Viva Engage messages.

Historically for this API, within the additionalOptions parameter you could include the allDocumentVersions value to trigger the collection of all versions of any file stored in SharePoint and OneDrive. With the APIs now natively working with the modern experience for eDiscovery, the allDocumentVersions value can still be included in the additionalOptions parameter, but it will only apply to files collected directly from a SharePoint or OneDrive site. It will not influence any cloud attachments included in email, Teams and Viva Engage messages. To collect additional versions of cloud attachments, use the cloudAttachmentVersion parameter to control the number of versions that are included. Also consider moving away from the allDocumentVersions value in the additionalOptions parameter and switching to the new documentVersion parameter.

As described earlier, to benefit from advanced indexing in the modern experience for eDiscovery, you must trigger advanced indexing as part of the direct export job. Within the portal, to include partially indexed items and run advanced indexing you would make the following selections. To achieve this via the API call, we need to include the following parameters and values in the request body.
Parameter – Value – Option from the portal

additionalOptions – advancedIndexing – Perform advanced indexing on partially indexed items
exportCriteria – searchHits, partiallyIndexed – Indexed items that match your search query and partially indexed items
exportLocation – responsiveLocations, nonresponsiveLocations – Exclude partially indexed items in locations without search hits

Finally, in the new modern experience for eDiscovery, more granular control has been introduced so that organisations can independently choose to convert Teams, Viva Engage and Copilot interactions into HTML transcripts and to collect up to 12 hours of related conversations when a message matches a search. This is reflected in the job settings by the following options:

Organize conversations into HTML transcripts
Include Teams and Viva Engage conversations

In the classic experience this was a single option titled Teams and Yammer Conversations that did both actions and was controlled by including the teamsAndYammerConversations value in the additionalOptions parameter. With the APIs now natively working with the modern experience for eDiscovery, the teamsAndYammerConversations value can still be included in the additionalOptions parameter, but it will only trigger the collection of up to 12 hours of related conversations when a message matches a search, without converting the items into HTML transcripts. To convert them into HTML transcripts we need to include the new htmlTranscripts value in the additionalOptions parameter.

As an example, let's look at the following direct export report job from the portal and use the Microsoft Graph PowerShell module to call the exportReport API with the updated request body.

$exportName = "New UX - Direct Export Report"
$exportParams = @{
    displayName = $exportName
    description = "Direct export report from the search"
    additionalOptions = "teamsAndYammerConversations,cloudAttachments,htmlTranscripts,advancedIndexing"
    exportCriteria = "searchHits,partiallyIndexed"
    documentVersion = "recent10"
    cloudAttachmentVersion = "recent10"
    exportLocation = "responsiveLocations"
}
$params = $exportParams | ConvertTo-Json -Depth 10
$uri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/searches/$searchID/exportReport"
$exportResponse = Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params

Direct Export - Results

ediscoverySearch: exportResult - Microsoft Graph v1.0 | Microsoft Learn

This API call enables you to export the items from a search without taking the data into a review set. All the information from the above section on the changes to the exportReport API also applies to this API call. However, with this API call we will actually be exporting the items from the search and not just the report. As such, we need to pass in the request body information on how we want the export package to look. Previously with direct export for eDiscovery Premium (Classic) you had three options in the UX and in the API to define the export format.

Option: Individual PST files for each mailbox
Exchange export structure: PST created for each mailbox. The structure of each PST reflects the folders within the mailbox, with emails stored based on their original location in the mailbox. Emails are named based on their subject.
SharePoint / OneDrive export structure: Folder for each site. Within each folder, the structure reflects the SharePoint/OneDrive site, with documents stored based on their original location in the site. Documents are named based on their document name.
Option: Individual .msg files for each message
Exchange export structure: Folder created for each mailbox. Within each folder the file structure reflects the folders within the mailbox, with emails stored as .msg files based on their original location in the mailbox. Emails are named based on their subject.
SharePoint / OneDrive export structure: As above.

Option: Individual .eml files for each message
Exchange export structure: Folder created for each mailbox. Within each folder the file structure reflects the folders within the mailbox, with emails stored as .eml files based on their original location in the mailbox. Emails are named based on their subject.
SharePoint / OneDrive export structure: As above.

Historically with this API, the exportFormat parameter was used to control the desired export format. Three values could be used: pst, msg and eml. This parameter is still relevant but only controls how email items will be saved, either in a PST file, as individual .msg files or as individual .eml files.

Note: The eml export format option is deprecated in the new UX. Going forward you should use either pst or msg.

With the APIs now natively working with the modern experience for eDiscovery, we need to account for the additional flexibility customers have to control the structure of their export package. More information on the export package options and what they control can be found in the following link.

https://learn.microsoft.com/en-gb/purview/edisc-search-export#export-package-options

To support this, new values have been added to the additionalOptions parameter for this API call. These must be included in the request body; otherwise the export structure will be as follows.

exportFormat value: pst
Exchange export structure: PST files created containing data from multiple mailboxes. All emails contained within a single folder within the PST. Emails are named based on an assigned unique identifier (GUID).
SharePoint / OneDrive export structure: One folder for all documents. All documents contained within a single folder. Documents are named based on an assigned unique identifier (GUID).

exportFormat value: msg
Exchange export structure: Folder created containing data from all mailboxes. All emails contained within a single folder stored as .msg files. Emails are named based on an assigned unique identifier (GUID).
SharePoint / OneDrive export structure: As above.

The new values added to the additionalOptions parameter are as follows. They control the export package structure for both Exchange and SharePoint/OneDrive items.

splitSource – Organize data from different locations into separate folders or PSTs
includeFolderAndPath – Include folder and path of the source
condensePaths – Condense paths to fit within the 259 character limit
friendlyName – Give each item a friendly name

Organizations are free to mix and match which export options they include in the request body to meet their own organizational requirements. To receive a similar output structure to what you previously got when using the pst or msg values in the exportFormat parameter, I would include all of the above values in the additionalOptions parameter. For example, to generate a direct export where the email items are stored in separate PSTs per mailbox, the structure of the PST files reflects the mailbox, and each item is named as per the subject of the email, I would use the Microsoft Graph PowerShell module to call the exportResult API with the updated request body.
$exportName = "New UX - DirectExportJob - PST" $exportParams = @{ displayName = $exportName description = "Direct export of items from the search" additionalOptions = "teamsAndYammerConversations,cloudAttachments,htmlTranscripts,advancedIndexing,includeFolderAndPath,splitSource,condensePaths,friendlyName" exportCriteria = "searchHits,partiallyIndexed" documentVersion = "recent10" cloudAttachmentVersion = "recent10" exportLocation = "responsiveLocations" exportFormat = "pst" } $params = $exportParams | ConvertTo-Json -Depth 10 $uri = “https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/searches/$searchID/exportResult" $exportResponse = Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params If I want to export the email items as individual .msg files instead of storing them in PST files; I would use the Microsoft Graph PowerShell module to call the exportResults API call with the updated request body. $exportName = "New UX - DirectExportJob - MSG" $exportParams = @{ displayName = $exportName description = "Direct export of items from the search" additionalOptions = "teamsAndYammerConversations,cloudAttachments,htmlTranscripts,advancedIndexing,includeFolderAndPath,splitSource,condensePaths,friendlyName" exportCriteria = "searchHits,partiallyIndexed" documentVersion = "recent10" cloudAttachmentVersion = "recent10" exportLocation = "responsiveLocations" exportFormat = "msg" } $params = $exportParams | ConvertTo-Json -Depth 10 $uri = " https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/searches/$searchID/exportResult" Add to Review Set ediscoveryReviewSet: addToReviewSet This API call enables you to commit the items that match the search to a Review Set within an eDiscovery case. This enables you to review, tag, redact and filter the items that match the search without exporting the data from the M365 service boundary. Historically with this API call it was more limited compared to triggering the job via the eDiscovery Premium (Classic) UI. With the APIs now natively working with the modern experience for eDiscovery organizations can make use of the enhancements made within the modern UX and have greater flexibility in selecting the options that are relevant for your requirements. There is a lot of overlap with previous sections, specifically the “Direct Export – Report” section on what updates are required to benefit from updated API. They are as follows: Controlling the number of versions of SPO and OneDrive documents added to the review set via the new cloudAttachmentVersion and documentVersion parameters Enabling organizations to trigger the advanced indexing of partial indexed items during the add to review set job via new values added to existing parameters However there are some nuances to the parameter names and the values for this specific API call compared to the exportReport API call. For example, with this API call we use the additionalDataOptions parameter opposed to the additionalOptions parameter. As with the exportReport and exportResult APIs, there are new parameters to control the number of versions of SPO and OneDrive documents added to the review set are as follows: cloudAttachmentVersion: The versions of cloud attachments to include in messages ( e.g. latest, latest 10, latest 100 or All). This controls how many versions of a file that is collected when a cloud attachment is contained within a email, teams or viva engage messages. If version shared is configured this is also always returned. 
documentVersion: The versions of files in SharePoint to include (e.g. latest, latest 10, latest 100 or All). This controls how many versions of a file are collected when targeting a SharePoint or OneDrive site directly in the search.

Historically for this API call, within the additionalDataOptions parameter you could include the allVersions value to trigger the collection of all versions of any file stored in SharePoint and OneDrive. With the APIs now natively working with the modern experience for eDiscovery, the allVersions value can still be included in the additionalDataOptions parameter, but it will only apply to files collected directly from a SharePoint or OneDrive site. It will not influence any cloud attachments included in email, Teams and Viva Engage messages. To collect additional versions of cloud attachments, use the cloudAttachmentVersion parameter to control the number of versions that are included. Also consider moving away from the allVersions value in the additionalDataOptions parameter and switching to the new documentVersion parameter.

To benefit from advanced indexing in the modern experience for eDiscovery, you must trigger advanced indexing as part of the add to review set job. Within the portal, to include partially indexed items and run advanced indexing you would make the following selections. To achieve this via the API call, we need to include the following parameters and values in the request body.

Parameter – Value – Option from the portal

additionalDataOptions – advancedIndexing – Perform advanced indexing on partially indexed items
itemsToInclude – searchHits, partiallyIndexed – Indexed items that match your search query and partially indexed items
additionalDataOptions – locationsWithoutHits – Exclude partially indexed items in locations without search hits

Historically the API call didn't support the add to review set job options to convert Teams, Viva Engage and Copilot interactions into HTML transcripts and to collect up to 12 hours of related conversations when a message matches a search. With the APIs now natively working with the modern experience for eDiscovery, this is now possible through the htmlTranscripts and messageConversationExpansion values of the additionalDataOptions parameter.

As an example, let's look at the following add to review set job from the portal and use the Microsoft Graph PowerShell module to invoke the addToReviewSet API call with the updated request body.

$commitParams = @{
    search = @{ id = $searchID }
    additionalDataOptions = "linkedFiles,advancedIndexing,htmlTranscripts,messageConversationExpansion,locationsWithoutHits"
    cloudAttachmentVersion = "latest"
    documentVersion = "latest"
    itemsToInclude = "searchHits,partiallyIndexed"
}
$params = $commitParams | ConvertTo-Json -Depth 10
$uri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/reviewSets/$reviewSetID/addToReviewSet"
Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params

Export from Review Set

ediscoveryReviewSet: export

This API call enables you to export items from a review set within an eDiscovery case. Historically with this API, the exportStructure parameter was used to control the desired export format. Two values could be used: directory and pst. This parameter has been updated to include a new value of msg.

Note: The directory value is deprecated in the new UX but remains available in v1.0 of the API call for backwards compatibility.
Going forward you should use msg alongside the new exportOptions values. The exportStructure parameter will only control how email items are saved, either within PST files or as individual .msg files. With the APIs now natively working with the modern experience for eDiscovery, we need to account for the additional flexibility customers have to control the structure of their export package. As with the exportResult API call for direct export, new values have been added to the exportOptions parameter for this API call. The new values are as follows; they control the export package structure for both Exchange and SharePoint/OneDrive items.

splitSource – Organize data from different locations into separate folders or PSTs
includeFolderAndPath – Include folder and path of the source
condensePaths – Condense paths to fit within the 259 character limit
friendlyName – Give each item a friendly name

Organizations are free to mix and match which export options they include in the request body to meet their own organizational requirements. To receive an equivalent output structure to what you previously got when using the pst value in the exportStructure parameter, I would include all of the above values in the exportOptions parameter within the request body. An example using the Microsoft Graph PowerShell module can be found below.

$exportName = "ReviewSetExport - PST"
$exportParams = @{
    outputName = $exportName
    description = "Exporting all items from the review set"
    exportOptions = "originalFiles,includeFolderAndPath,splitSource,condensePaths,friendlyName"
    exportStructure = "pst"
}
$params = $exportParams | ConvertTo-Json -Depth 10
$uri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/reviewSets/$reviewSetID/export"
Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params

To receive an equivalent output structure to what you previously got when using the directory value in the exportStructure parameter, I would instead use the msg value within the request body. As the condensed directory structure format exports all items into a single folder, each named based on a uniquely assigned identifier, I do not need to include the new exportOptions values. An example using the Microsoft Graph PowerShell module can be found below.

$exportName = "ReviewSetExport - MSG"
$exportParams = @{
    outputName = $exportName
    description = "Exporting all items from the review set"
    exportOptions = "originalFiles"
    exportStructure = "msg"
}
$params = $exportParams | ConvertTo-Json -Depth 10
$uri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/reviewSets/$reviewSetID/export"
Invoke-MgGraphRequest -Method Post -Uri $uri -Body $params

Continuing to use the directory value in exportStructure will produce the same output as if msg was used.
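As a closing practical note that is not covered above: each of the jobs triggered in this post (statistics, add to review set, direct export and review set export) runs asynchronously as a case operation. The minimal polling sketch below lists the case operations and waits for the most recent export job to finish; it assumes the $caseID variable from earlier, and the contentExport action value is taken from the caseOperation resource documentation. Once an export operation succeeds, the completed operation resource is also where the export package details surface for download (check the export operation documentation for the exact property names).

# List the operations on the case and pick the most recent export job
$opsUri = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseID/operations"
$exportOp = (Invoke-MgGraphRequest -Method Get -Uri $opsUri).value |
    Where-Object { $_.action -eq "contentExport" } |
    Sort-Object { $_.createdDateTime } -Descending |
    Select-Object -First 1

# Poll until the job reaches a terminal state (succeeded, partiallySucceeded or failed)
while ($exportOp.status -in @("notStarted", "running")) {
    Start-Sleep -Seconds 60
    $exportOp = Invoke-MgGraphRequest -Method Get -Uri "$opsUri/$($exportOp.id)"
    Write-Host "Export operation $($exportOp.id): $($exportOp.status) ($($exportOp.percentProgress)%)"
}

The same pattern works for the other job types; their action values (for example estimateStatistics or addToReviewSet) are listed in the caseOperation documentation.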
Wrap Up

Thank you for your time reading through this post. Hopefully you are now equipped with the information needed to make the most of the new modern experience for eDiscovery when making your Graph API calls.

Elevating Trust in Data through Data Quality in the AI Era

Data collection and utilization are growing rapidly, and organizations are increasingly relying on data as they transition into the era of AI. However, many face significant challenges in effectively managing investments across cloud, data, and AI. This is largely due to a lack of visibility across their entire data estate—which is often fragmented across silos, heterogeneous systems, on-premises environments, and the cloud. Concerns about data trustworthiness, AI-readiness, and uncertainty around security and compliance further complicate the ability to drive timely business insights.

Elevating data quality means going beyond merely identifying issues—it's about equipping data stewards, analysts, and engineers with the right tools to proactively improve trust, consistency, and readiness of data for AI, analytics, and operations. Despite the critical importance of data quality, many organizations struggle to activate the full value of their data estate. Research shows that 75% of companies do not have a formal data quality program. This is alarming, especially as data quality has become a cornerstone of successful AI initiatives. Simply put: your AI is only as good as your data.

In the past, when humans were always in the loop, minor data quality issues could be corrected manually. But in today’s world—where AI interprets not just the structure but the content of the data—any inconsistencies, inaccuracies, or noncompliance in the data can directly lead to flawed insights, unreliable AI outputs, and poor business decisions. Bad data can make your AI wrong. It can make your BI reports misleading. And it can impact your organization's credibility—as well as your reputation as a data professional. After all, there’s nothing worse than building something that no one trusts or uses.

That’s why defining and deploying a robust data quality framework is more critical now than ever before. Organizations must establish a data quality maturity model, track quality across their data estate, and take continuous improvement and remediation actions. Key steps to maintaining data quality and ensuring the health of your data estate include:

Define the scope – Identify which data is needed for specific business use cases.
Measure quality – Assess whether data meets expected standards.
Analyze findings – Understand patterns, gaps, and root causes.
Improve quality – Take corrective actions to meet business needs.
Control quality – Continuously monitor and govern data to maintain high standards.

To ensure your data is fit for purpose, it's essential to establish strong data quality practices within your organization, supported by the right tools, roles, and governance structures.

Profile your data to understand the distribution

Data profiling is the process of analyzing data to understand its structure, content, and quality. It helps uncover patterns, anomalies, missing values, duplicates, and data types—providing valuable insight into the trustworthiness and usability of data for analytics, decision-making, and quality improvement. By examining datasets for structure, relationships, and inconsistencies, data practitioners can identify potential issues and define validation rules for ongoing data quality assurance. Microsoft Purview Unified Data Catalog provides an integrated data quality experience that supports profiling as a foundational step.
With Purview, users can profile data to understand its distribution, patterns, and data types—helping inform data quality programs and define rules for continuous monitoring and improvement. Purview's data profiling leverages AI to recommend which columns in a dataset are most critical and should be profiled. Users remain in control and can adjust these recommendations by deselecting suggested columns or adding others. Additionally, all profiling history is preserved, allowing users to compare current profiles with historical patterns to detect changes over time and continuously assess data health.

Define and apply rules to validate your data

Applying rules and constraints is essential to ensure that data conforms to predefined requirements or business logic—such as data type, completeness, uniqueness, and consistency. Data profiling results can be leveraged to define these rules and validate data continuously, helping ensure that it is trustworthy and ready for both business and AI use cases. To achieve this, data quality should be measured across all stages: data at creation, data in motion, and data at rest. While many CRM and web-based applications perform UI-level validations to check user inputs before submission, a significant amount of poor-quality data still enters systems through bulk upload processes. These low-quality records often bypass front-end checks and propagate downstream through the data supply chain, leading to broader data integrity issues.

In Medallion architecture, you can validate and correct data directly in the pipeline. Bad-quality data detected in the bronze layer can trigger notifications to upstream systems to fix issues at the source. Purview Unified Catalog Data Quality capability provides a user-friendly UI for managing data quality rules. You can configure rules for any supported data sources in cloud or on-premises or datasets in Fabric Lakehouse's bronze layer, schedule DQ jobs, and send notifications to data engineers and stewards when issues arise. This proactive monitoring ensures data quality is addressed early—before data progresses through the silver and gold layers of your architecture.

Visualizing Data Quality Metrics and Driving Action

Visualizing data quality metrics and trends provides critical insights into the overall health of your data and supports informed, data-driven decision-making. Microsoft Purview Unified Catalog publishes all metadata—including data quality rules and scores—into Fabric OneLake. Data analysts can link Purview metadata with raw data to generate actionable insights. They can also leverage Fabric AI skills to enhance intelligence and integrate with Data Activator to trigger notifications for upstream data publishers and downstream consumers.

Alerts and notifications can be configured directly in the Purview Unified Catalog. Data stewards can set thresholds for one or many data assets to automatically notify upstream and downstream contacts if those thresholds are breached. Notifications can be directed to specific individuals or groups (e.g., a support team). These alerts empower data providers to resolve issues at the source, and data engineers can address problems within the bronze layer of their analytical storage—such as in Microsoft Fabric. Additionally, alerts can be used to pause data movement from the bronze to the silver and gold layers, ensuring only high-quality data flows downstream. Users can configure the storage location in the Purview Data Quality solution to publish failed or error records.
This allows data stewards and data engineers to review and fix issues, improving data quality before using it for analytics or as input for ML model training.

Integrated Data Observability with Data Quality Scores

Data observability in Microsoft Purview Unified Catalog offers a comprehensive, bird’s-eye view into the health of the data estate as data flows across various sources. Data stewards, domain experts, and those responsible for data health can monitor their entire data landscape from a single unified interface. This centralized view provides visibility into data lineage—from source to consumption—and reveals how data assets map to governance domains. It enables users to understand where data originates and terminates, pinpoint data quality issues, and assess the impact on reporting and compliance obligations.

By consolidating metadata into a single, accessible location, users can explore how data quality is evolving, track usage patterns, and understand who is interacting with the data. With full visibility across the data estate, both central and federated data teams can efficiently identify opportunities to improve metadata quality, clarify ownership, enhance data quality, and optimize data architecture.

Summary

Defining and implementing a data quality framework has become more critical than ever. Organizations must establish a data quality maturity model and continuously monitor the health of their data estate to enable ongoing improvement and remediation. Microsoft Purview Unified Catalog empowers governance domain owners and data stewards to assess and manage data quality across the ecosystem, driving informed and targeted improvements.

In today’s AI-driven world, the trustworthiness of data is directly linked to the effectiveness of AI insights and recommendations. Unreliable data not only weakens AI outcomes but can also diminish trust in the systems themselves and slow their adoption. Poor data quality or inconsistent data structures can disrupt business operations and impair decision-making.

Microsoft Purview addresses these challenges with a no-code/low-code approach to defining data quality rules—including out-of-the-box (OOB) and AI-generated rules. These rules are applied at the column level and rolled up into scores at the data asset, data product, and governance domain levels, offering full visibility into data quality across the enterprise. Purview’s AI-powered data profiling capabilities intelligently recommend which columns to profile, while still allowing human review and refinement. This human-in-the-loop process not only improves the relevance and accuracy of profiling but also feeds back into improving AI model performance over time.

Elevating data quality is more than identifying problems—it's about equipping stewards, analysts, and engineers with the tools to proactively build trust, ensure consistency, and prepare data for AI, analytics, and business success.

Unlocking the Power of Microsoft Purview for ChatGPT Enterprise
In today's rapidly evolving technology landscape, data security and compliance are key. Microsoft Purview offers a robust solution for managing and securing interactions with AI-based solutions. This integration not only enhances data governance but also ensures that sensitive information is handled with the appropriate controls. Let's dive into the benefits of this integration and outline the steps to integrate with ChatGPT Enterprise specifically. The integration works for Entra-connected users on the ChatGPT workspace; if you have needs that go beyond this, please tell us why and how it impacts you.

Important update 1: Effective May 1, these capabilities require you to enable pay-as-you-go billing in your organization.
Important update 2: From May 19, you are required to create a collection policy to ingest ChatGPT Enterprise information. In DSPM for AI you will find this one-click process.

Benefits of Integrating ChatGPT Enterprise with Microsoft Purview

Enhanced Data Security: By integrating ChatGPT Enterprise with Microsoft Purview, organizations can ensure that interactions are securely captured and stored within their Microsoft 365 tenant. This includes user text prompts and AI app text responses, providing a comprehensive record of communications.
Compliance and Governance: Microsoft Purview offers a range of compliance solutions, including Insider Risk Management, eDiscovery, Communication Compliance, and Data Lifecycle & Records Management. These tools help organizations meet regulatory requirements and manage data effectively.
Customizable Detection: The integration allows detection using built-in and custom classifiers for sensitive information, which can be customized to meet the specific needs of the organization. This helps ensure that sensitive data is identified and protected. The audit data streams into Advanced Hunting and the Unified Audit events, which can be used to generate visualisations of trends and other insights.
Seamless Integration: The ChatGPT Enterprise integration uses the Purview API to push data into compliant storage, ensuring that external data sources cannot access and push data directly. This provides an additional layer of security and control.

Step-by-Step Guide to Setting Up the Integration

1. Get the Object ID for the Purview account in your tenant:

Go to portal.azure.com and search for "Microsoft Purview" in the search bar.
Click on "Microsoft Purview accounts" from the search results.
Select the Purview account you are using and copy the account name.
Go to portal.azure.com and search for "Enterprise" in the search bar.
Click on Enterprise applications.
Remove the filter for Enterprise Applications.
Select All applications under Manage, search for the name and copy the Object ID.

2. Assign Graph API roles to your managed identity application:

Assign Purview API roles to your managed identity application by connecting to MS Graph using Cloud Shell in the Azure portal. Open a PowerShell window in portal.azure.com and run the command Connect-MgGraph. Authenticate and sign in to your account. Run the following cmdlet to get the ServicePrincipal ID for your organization for the Purview API app.

(Get-MgServicePrincipal -Filter "AppId eq '9ec59623-ce40-4dc8-a635-ed0275b5d58a'").id

The assignment below provides the Purview.ProcessConversationMessages.All permission to the Microsoft Purview account, allowing classification processing. Update the ObjectId to the one retrieved in step 1 for the command and body parameter.
Update the ResourceId to the ServicePrincipal ID retrieved in the last step.

$bodyParam = @{
    "PrincipalId" = "{ObjectID}"
    "ResourceId"  = "{ResourceId}"
    "AppRoleId"   = "{a4543e1f-6e5d-4ec9-a54a-f3b8c156163f}"
}
New-MgServicePrincipalAppRoleAssignment -ServicePrincipalId '{ObjectId}' -BodyParameter $bodyParam

It will look something like this from the command line.

We also need to add the permission for the application to read the user accounts, to correctly map ChatGPT Enterprise users with Entra accounts. First run the following command to get the ServicePrincipal ID for your organization for the Graph app.

(Get-MgServicePrincipal -Filter "AppId eq '00000003-0000-0000-c000-000000000000'").id

The following step adds the User.Read.All permission to the Purview application. Update the ObjectId with the one retrieved in step 1. Update the ResourceId with the ServicePrincipal ID retrieved in the last step.

$bodyParam = @{
    "PrincipalId" = "{ObjectID}"
    "ResourceId"  = "{ResourceId}"
    "AppRoleId"   = "{df021288-bdef-4463-88db-98f22de89214}"
}
New-MgServicePrincipalAppRoleAssignment -ServicePrincipalId '{ObjectId}' -BodyParameter $bodyParam

3. Store the ChatGPT Enterprise API key in Key Vault

The steps for setting up Key Vault integration for Data Map can be found here: Create and manage credentials for scans in the Microsoft Purview Data Map | Microsoft Learn. When set up, you will see something like this in Key Vault.

4. Integrate the ChatGPT Enterprise workspace with Purview:

Create a new data source in Purview Data Map that connects to the ChatGPT Enterprise workspace.

Go to purview.microsoft.com and select Data Map (search if you do not see it on the first screen).
Select Data sources.
Select Register.
Search for ChatGPT Enterprise and select it.
Provide your ChatGPT Enterprise ID.
Create the first scan by selecting Table view and filtering on ChatGPT.
Add your Key Vault credentials to the scan.
Test the connection and once complete click continue.
When you click continue the following screen will show up; if everything is OK click Save and run.

Validate the progress by clicking on the name. Completion of the first full scan may take an extended period of time; depending on size it may take more than 24 hours to complete. If you click on the scan name you expand to all the runs for that scan. When the scan completes you can start to make use of the DSPM for AI experience to review interactions with ChatGPT Enterprise. The mapping to the users is based on the ChatGPT Enterprise connection to Entra, with prompts and responses stored in the user's mailbox.

5. Review and monitor data:

Please see this article for required permissions and guidance around Microsoft Purview Data Security Posture Management (DSPM) for AI: Microsoft Purview data security and compliance protections for Microsoft 365 Copilot and other generative AI apps | Microsoft Learn. Use Purview DSPM for AI analytics and Activity Explorer to review interactions and classifications. You can expand on prompts and responses in ChatGPT Enterprise.

6. Microsoft Purview Communication Compliance

Communication Compliance (hereafter CC) is a feature of Microsoft Purview that allows you to monitor and detect inappropriate or risky interactions with ChatGPT Enterprise. You can monitor and detect requests and responses that are inappropriate based on ML models, regular Sensitive Information Types, and other classifiers in Purview. This can help you identify jailbreak and prompt injection attacks and flag them to IRM and for case management.
Detailed steps to configure CC policies and supported configurations can be found here.

7. Microsoft Purview Insider Risk Management

We believe that Microsoft Purview Insider Risk Management (hereafter IRM) can serve a key role in protecting your AI workloads long term. With its adaptive protection capabilities, IRM dynamically adjusts user access based on evolving risk levels. In the event of heightened risk, IRM can enforce Data Loss Prevention (DLP) policies on sensitive content, apply tailored Entra Conditional Access policies, and initiate other necessary actions to effectively mitigate potential risks. This strategic approach helps you apply more stringent policies where it matters, avoiding a boil-the-ocean approach, and allows your team to get started using AI. To get started, use the signals that are available to you, including CC signals, to raise IRM tickets and enforce adaptive protection. You should create your own custom IRM policy for this, and include Defender signals as well. Based on elevated risk you may choose to block users from accessing certain assets such as ChatGPT Enterprise. Please see this article for more detail: Block access for users with elevated insider risk - Microsoft Entra ID | Microsoft Learn.

8. eDiscovery

eDiscovery of AI interactions is crucial for legal compliance, transparency, accountability, risk management, and data privacy protection. Many industries must preserve and discover electronic communications and interactions to meet regulatory requirements. Including AI interactions in eDiscovery ensures organizations comply with these obligations and preserves relevant evidence for litigation. This process also helps maintain trust by enabling the review of AI decisions and actions, demonstrating due diligence to regulators. Microsoft Purview eDiscovery solutions | Microsoft Learn

9. Data Lifecycle Management

Microsoft Purview offers robust solutions to manage AI data from creation to deletion, including classification, retention, and secure disposal. This ensures that AI interactions are preserved and retrievable for audits, litigation, and compliance purposes. Please see this article for more information: Automatically retain or delete content by using retention policies | Microsoft Learn.

Closing

By following these steps, organizations can leverage the full potential of Microsoft Purview to enhance the security and compliance of their ChatGPT Enterprise interactions. This integration not only provides peace of mind but also empowers organizations to manage their data more effectively. We are still in preview and some of the features listed are not fully integrated; please reach out to us if you have any questions or if you have additional requirements.

Upcoming changes to Microsoft Purview eDiscovery
Today, we are announcing three significant updates to the Microsoft Purview eDiscovery products and services. These updates reinforce our commitment to meeting and exceeding the data security, privacy, and compliance requirements of our customers. To improve security and help protect customers and their data, we have accelerated the timeline for the below changes, which will be enforced by default on May 26. The following features will be retired from the Microsoft Purview portal:

Content Search will transition to the new unified Purview eDiscovery experience.
The eDiscovery (Standard) classic experience will transition to the new unified Purview eDiscovery experience.
The eDiscovery export PowerShell cmdlet parameters will be retired.

These updates aim to unify and simplify the eDiscovery user experience in the new Microsoft Purview portal, while preserving the accessibility and integrity of existing eDiscovery cases.

Content Search transition to the new unified Purview eDiscovery experience

The classic eDiscovery Content Search solution will be streamlined into the new unified Purview eDiscovery experience. Effective May 26th, the Content Search solution will no longer be available in the classic Purview portal. Content Search provides administrators with the ability to create compliance searches to investigate data located in Microsoft 365. We hear from customers that the Content Search tool is used to investigate data privacy concerns, perform legal or incident investigations, validate data classifications, etc.

Currently, each compliance search created in the Content Search tool is created outside of the boundaries of a Purview eDiscovery (Standard) case. This means that administrators in Purview role groups containing the Compliance Search role can view all Content Searches in their tenant. While the Content Search solution does not enable any additional search permission access, the view of all Content Searches in a customer tenant is not an ideal architecture. Alternatively, when using a Purview eDiscovery case, these administrators only have access to cases to which they are assigned.

Customers can now create their new compliance searches within an eDiscovery case using the new unified Purview eDiscovery experience. All content searches in a tenant created prior to May 26, 2025 are now accessible in the new unified Purview eDiscovery experience within a case titled “Content Search”. Although the permissions remain consistent, eDiscovery managers and those with custom permissions will now only be able to view searches from within the eDiscovery cases to which they are assigned, including the “Content Search” case.

eDiscovery Standard transition to the new unified Purview eDiscovery experience

The classic Purview eDiscovery (Standard) solution experience has transitioned into the new unified Purview eDiscovery experience. Effective May 26th, the classic Purview eDiscovery (Standard) solution will no longer be available to customers within the classic Purview portal. All existing eDiscovery cases created in the classic Purview experience are now available within the new unified Purview eDiscovery experience.
Retirement of eDiscovery Export PowerShell Cmdlet parameters

The Export parameter within the ComplianceSearchAction eDiscovery PowerShell cmdlets will be retired on May 26, 2025:

New-ComplianceSearchAction -Export parameter (and parameters dependent on export such as Report, RetentionReport …)
Get-ComplianceSearchAction -Export parameter
Set-ComplianceSearchAction -ChangeExportKey parameter

We recognize that the removal of the Export parameter may require adjustments to your current workflow process when using Purview eDiscovery (Standard). The remaining Purview eDiscovery PowerShell cmdlets will continue to be supported after May 26th, 2025 (a sketch of this remaining cmdlet workflow appears at the end of this post):

Create and update compliance cases: New-ComplianceCase, Set-ComplianceCase
Create and update case holds: New-CaseHoldPolicy, Set-CaseHoldPolicy, New-CaseHoldRule, Set-CaseHoldRule
Create, update and start compliance searches: New-ComplianceSearch, Set-ComplianceSearch, Start-ComplianceSearch
Apply a purge action to a compliance search: New-ComplianceSearchAction -Purge

Additionally, if you have a Microsoft 365 E5 license and use eDiscovery (Premium), your organization can script all eDiscovery operations, including export, using the Microsoft Graph eDiscovery APIs.

Purview eDiscovery Premium

On May 26th, there will be no changes to the classic Purview eDiscovery (Premium) solution in the classic Purview portal. Cases that were created using the Purview eDiscovery (Premium) classic case experience can also now be accessed in the new unified Purview eDiscovery experience.

We recognize that these changes may impact your current processes, and we appreciate your support as we implement these updates. Microsoft runs on trust and protecting your data is our utmost priority. We believe these improvements will provide a more secure and reliable eDiscovery experience. To learn more about the Microsoft Purview eDiscovery solution and become an eDiscovery Ninja, please check out our eDiscovery Ninja Guide at https://aka.ms/eDiscoNinja!
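For reference, the sketch below shows the kind of compliance search workflow that remains scriptable in Security & Compliance PowerShell after the Export parameter retirement. The case name, search name, mailbox and query are hypothetical placeholders; Get-ComplianceSearch is included for status checking and is not part of the retirement, and the purge action is left commented out because it permanently affects data. Treat this as an illustrative outline rather than a recommended procedure.

# Connect to Security & Compliance PowerShell (Exchange Online Management module)
Connect-IPPSSession

# Create a case and a compliance search scoped to one mailbox (names and query are placeholders)
New-ComplianceCase -Name "Contoso Investigation 042"
New-ComplianceSearch -Name "Investigation 042 - Mailbox sweep" `
    -Case "Contoso Investigation 042" `
    -ExchangeLocation "adele.vance@contoso.com" `
    -ContentMatchQuery 'subject:"quarterly forecast"'

# Start the search and check on its progress and results
Start-ComplianceSearch -Identity "Investigation 042 - Mailbox sweep"
Get-ComplianceSearch -Identity "Investigation 042 - Mailbox sweep" | Format-List Status, Items, Size

# A purge action remains supported (destructive, so left commented out on purpose)
# New-ComplianceSearchAction -SearchName "Investigation 042 - Mailbox sweep" -Purge -PurgeType SoftDelete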
What’s New in Microsoft Purview eDiscovery

As organizations continue to navigate increasingly complex compliance and legal landscapes, Microsoft Purview eDiscovery is adapting to address these challenges. New and upcoming enhancements will improve how legal and compliance teams manage cases, streamline workflows, and reduce operational friction. Here’s a look at some newly released features and some others that are coming soon.

Enhanced Reporting in Modern eDiscovery

The modern eDiscovery experience now offers comprehensive reporting capabilities that capture every action taken within a case. This creates an auditable record of actions, enabling users to manage case activities confidently and accurately. These enhancements are critical for defensibility and audit readiness, and they also help users better understand the outcomes of their searches based on the options and settings they selected. Recent improvements include:

Compliance boundary visibility: The Summary.csv report now includes the compliance boundary settings that were applied to the search query. This helps clarify search results, especially when different users have varying access permissions across data locations.
Decryption settings: Visibility has been added to the Settings.csv report to indicate if a specific process has Exchange or SharePoint decryption capabilities enabled. This ensures users can verify whether encrypted content can be successfully decrypted during the process of adding to a review set or export.
Visibility into Premium feature usage: There is a new value in the Settings.csv report that indicates whether Premium features were enabled for the process.
Improved clarity: The Items.csv report now includes an "Added by" column to understand how the item was identified. This column shows if items were included by direct search (IndexedQuery), partial indexing (UnindexedQuery), or advanced indexing (AdvancedIndex).
Contextual information: The Summary.csv report helps users understand their exported data by providing contextual explanations. For instance, it explains how the volume of exported data may be greater than the search estimate due to factors such as cloud attachments or multiple versions of SharePoint documents.

Copy Search to Hold: Reuse with Confidence

Another commonly requested feature that is now available is the ability to create a hold from a search. If your workflow starts with searches before creating holds, you can use the new "Create a hold" button to create a new hold policy based on that search. Depending on your workflow, it can be effective to start off with a broad, high-level search across the entire data source to quickly surface potentially relevant content and gauge the approximate amount of data that may need review. For example, in a litigation scenario, you may want to search across the entire mailbox and OneDrive of custodians that are of interest. Once the preservation obligation is triggered, this new feature easily allows you to copy your search and create a hold to preserve the content, streamlining your workflow and enhancing efficiency. Whether you begin with a highly targeted search or something broader, this feature reduces duplication of effort and ensures consistency across processes when you want to turn an existing search into a hold.

Retry Failed Locations: Easily Address Processing Issues

Searches can occasionally encounter issues due to temporarily inaccessible locations.
Copy Search to Hold: Reuse with Confidence

Another commonly requested feature that is now available is the ability to create a hold from a search. If your workflow starts with searches before creating holds, you can use the new "Create a hold" button to create a new hold policy based on that search. Depending on your workflow, it can be effective to start with a broad, high-level search across the entire data source to quickly surface potentially relevant content and gauge the approximate amount of data that may need review. For example, in a litigation scenario, you may want to search across the entire mailbox and OneDrive of custodians of interest. Once the preservation obligation is triggered, this feature allows you to copy your search and create a hold to preserve the content, streamlining your workflow and enhancing efficiency. Whether you begin with a highly targeted search or something broader, this feature reduces duplication of effort and ensures consistency across processes when you want to turn an existing search into a hold.

Retry Failed Locations: Easily Address Processing Issues

Searches can occasionally encounter issues due to temporarily inaccessible locations. The new "Retry failed locations" feature introduces a simple, automated way to reprocess those issues without restarting the entire job. When you retry the failed locations, the results are aggregated with the original job to provide a comprehensive overview. This is particularly beneficial for administrators, allowing them to efficiently retry the search while ensuring continuity in their workflow.

Duplicate Search: Consistency Made Easy

Save time and reduce rework by duplicating an existing search with just a few mouse clicks. Whether you're building on a previous query or rerunning it with slight adjustments, this feature lets you preserve all original parameters, such as conditions, data sources, and locations, so you can move faster with confidence and consistency. Administrators and users simply click the "Duplicate search" button on an existing search and then rename it.

Case-Level Data Source Management: The New Data Sources Tab

We recently introduced a new case-level Data Sources view in Purview eDiscovery, allowing data sources to be reused in searches and holds within a case. This enhancement allows case administrators to map and manage all relevant data locations, including mailboxes, SharePoint sites, Teams channels, and more, directly within a case. Once data sources have been added, creating searches becomes much simpler: administrators can select from a list of previously added sources rather than searching for them each time. This is especially useful for the frequent searches across the same data sources that are common in eDiscovery. With this feature, users can:
- Visualize all data sources tied to a case.
- Use data source locations to populate searches and eDiscovery hold policies.
- Simplify the search creation process for administrators and users.
This functionality empowers teams to make more informed decisions about what to preserve, search, and review.

Delete Searches and Search Exports: Keep Your Case Well Organized

A newly released feature helps you remove outdated or redundant searches from a case, keeping the workspace clean and focused. With just a few clicks, you can delete searches that are no longer relevant, whether you're tidying up after a project or clearing out test runs, so you can stay focused on what matters most without extra clutter. Another new feature enables users to delete search exports. This functionality is intended to assist with the management of case data and to remove unnecessary or outdated search exports, including sensitive information that is no longer required.
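If you prefer to script this kind of cleanup rather than use the portal, the Graph eDiscovery API also exposes a delete operation for searches. The following is a rough sketch, assuming the v1.0 ediscoverySearch delete endpoint and the Microsoft Graph PowerShell SDK; the case and search IDs are placeholders:

# Delete an eDiscovery search via Microsoft Graph (IDs are placeholders)
Connect-MgGraph -Scopes "eDiscovery.ReadWrite.All"
$caseId = "<ediscoveryCaseId>"
$searchId = "<ediscoverySearchId>"
Invoke-MgGraphRequest -Method DELETE -Uri "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases/$caseId/searches/$searchId"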
Condition Builder Enhancements for Logical Operators (AND, OR, NEAR)

The Condition Builder now supports the logical operators AND, OR, and NEAR within the same line, or grid. These enhancements empower users to construct more targeted search conditions and offer increased control over the use of phrases, enabling additional options for how terms are combined and matched and providing greater flexibility in keyword queries.

Tenant level control over Premium Features

An upcoming enhancement will introduce a tenant-level setting that lets organizations define the default behavior for new case creation, including whether Premium features are used by default. This will provide greater flexibility and control, especially for customers in a mixed-license environment.

Export Naming and Controlling Size of Exports

A couple more upcoming enhancements are intended to give users more control and additional options over the export process, increasing flexibility for legal and compliance teams. First, export packages will include the user-defined export name directly in the download package. This small but impactful change simplifies tracking and associating exported data with specific cases, which is especially useful when managing multiple exports across different cases. Second, a new configuration setting will allow administrators to define a maximum export package size. This gives teams greater control over how data is partitioned, helping to optimize performance and reduce the risk of download issues or browser timeouts during large exports.

Takeaways

These updates are part of a broader modernization of the Purview eDiscovery experience, which includes a unified user experience, enhanced reporting, and a more streamlined and intuitive workflow for teams managing regulatory inquiries, litigation, or investigations. These enhancements aim to reduce friction, streamline workflows, and accelerate productivity.

To learn more about the new eDiscovery user experience, visit our Microsoft documentation at https://aka.ms/ediscoverydocsnew
For more updates on the future of eDiscovery, please check out our product roadmap at https://aka.ms/ediscoveryroadmap
To become a Purview eDiscovery Ninja, check out our eDiscovery Ninja Guide at https://aka.ms/ediscoveryninja
Sensitivity Auto-labelling via Document Property

Why is this needed?

Sensitivity labels are generally relevant within an organisation only. If a file is labelled within one environment and then moved to another, sensitivity label content markings may be visible, but by default the applied sensitivity label will not be understood. This can lead to scenarios where information that has been generated externally is not adequately protected.

My favourite analogy for these scenarios is to consider the parallels between receiving sensitive information and unpacking groceries. When unpacking groceries, you might sit your grocery bag on a counter or on the floor next to the pantry. You'll likely then unpack each item, take a look at it and decide where to place it. Without looking at an item to determine its correct location, you might place it in the wrong spot. Porridge might be safe from the kids on the bottom shelf, but if you place items that need to be protected, such as chocolate, on the bottom shelf, they're not likely to last very long. So, I affectionately refer to information that hasn't been evaluated as 'porridge': until it has been checked, it will end up on the bottom shelf of the pantry where it is quite accessible. Label-based security controls, such as Data Loss Prevention (DLP) policies using conditions of 'content contains sensitivity label', will not apply to these items. To ensure the security of any contained sensitive information, we should look for potential clues to its sensitivity and then use those clues to make sure the information is adequately protected. We take a closer look at the 'porridge', determine whether it's an item that needs protection and, if so, move it to a higher shelf in the pantry so that it's out of reach of the kids.

Effective use of Purview revolves around 'know your data' strategies. We should be using as many methods as possible to determine the sensitivity of items, including Sensitive Information Types (SITs) containing keyword or pattern-based classifiers, trainable classifiers, Exact Data Match, document fingerprinting and so on. Matching items via SITs present in an item's content can be problematic due to false positives: keywords like 'Sensitive' or 'Protected' may be mentioned out of context, such as when referring to a classification or an environment. When classifications have been stamped via a property, it allows us to match via context rather than content. We don't need to guess at an item's sensitivity if another system has already established what the item's classification is. These methods are much less prone to false positives.

Why isn't everyone doing this?

Document properties are often not considered in Purview deployments. SharePoint metadata management seems to be a dying artform, and most compliance or security resources completing Purview configurations don't have this skill set. There's also a lack of understanding of the relevance of checking for item properties. Microsoft haven't helped here, as the documentation in this space is somewhat lacking and needs to be unpicked via some aligning DLP guidance (Create a DLP policy to protect documents with FCI or other properties). Many of these configurations will also be tied to regional requirements: the document properties used by systems where I'm from, in Australia, will likely be very different to those used in other parts of the world.

In the following sections, we'll take a look at applicable use cases and walk through how to enable these configurations.
Scenarios for use

Labelling via document property isn't for everyone. If your organisation is new to classification or you don't have external partners that you collaborate with at higher sensitivity levels, then this likely isn't for you. For those that collaborate heavily and have a shared classification framework, as is often seen across government, this is a must! This approach is also highly relevant to multi-tenant organisations or conglomerates where information is regularly shared between environments. The following scenarios are examples of where this configuration will be relevant:

1. Migrating from 3rd party classification tools
If an item has been previously stamped by a 3rd party classification tool, then evaluating its applied document properties will provide a clear picture of its security classification. These properties can then be used in service-based auto-labelling policies to effectively transition items from 3rd party tools to Microsoft Purview sensitivity labels. As labels are applied to items, they will be brought into scope of label-based controls.

2. Detecting data spill
Data spill is a term used to describe situations where information of a higher than permitted security classification lands in an environment. Consider a Microsoft 365 tenant that is approved for the storage of Official information but has Top Secret files uploaded to it. Document properties that align with higher than permitted classifications provide an almost guaranteed method of identifying spilled items. Pairing such a document property with an auto-labelling policy allows for the application of encryption to lock unauthorized users out of the items. Tools like Content Explorer and eDiscovery can then be used to easily perform cleanup activities. If using document properties and auto-labelling for this purpose, keep in mind that you'll need to create sensitivity labels for higher than permitted classifications in order to catch spilled items. These labels won't impact usability as you won't publish them to users. You will, however, need to publish them to a single user or break glass account so that they're not ignored by auto-labelling.

3. Blocking access by AI tools
If your organization is concerned about items with certain properties applied being accessed by generative AI tools, such as Copilot, you could use auto-labelling to apply a sensitivity label that restricts EXTRACT permissions. You can find some information on this at Microsoft 365 Copilot data protection architecture | Microsoft Learn. This should be relevant for spilled data, but might also be useful where certain records have been marked via properties and should not be accessible to Copilot.

4. External Microsoft Purview configurations
Sensitivity labels are relevant internally only. A label, in its raw form, is essentially a piece of metadata with an ID (or GUID) that we stamp on pieces of information. These GUIDs are understood by your tenant only. If an item marked with a GUID shows up in another Microsoft 365 tenant, the GUID won't correspond with any of that tenant's labels or label-based controls. The art in Microsoft Purview lies in interpreting the sensitivity of items based on content markings and other identifiers, so that data security can be maintained. Document properties applied by Purview, such as ClassificationContentMarkingHeaderText, are not tied to a specific tenant, which makes them portable.
We can use these properties to help maintain classifications as items move between environments.

5. Utilizing metadata applied by Records Management solutions
Some EDRMS, Records or Content Management solutions will apply properties to items. If an item has been previously managed and stamped with properties, potentially including a security classification, via one of these systems, we could use this information to inform sensitivity label application.

6. 3rd party classification tools used externally
Even if your organisation hasn't been using 3rd party classification tools, you should consider that partner organisations, such as other Government departments, might be. Evaluating the properties that external organisations have applied to items you receive will allow you to extend protections to these items. If classification tools like Janus or Titus are used in your geography or industry, then you may want to consider checking for their properties.

Regarding the use of auto-classification tools

Some organisations, particularly those in Government, will have organisational policies that prevent the use of automatic classification capabilities. These policies are intended to ensure that each item is assessed by an actual person for risk of disclosure, rather than by an automated service that could be prone to error. However, when auto-labelling is used to interpret and honour existing classifications, we are lowering rather than raising the risk profile. If the item's existing classification (applied via property) is ignored, the item will be treated as porridge and is likely to be at risk. If auto-labelling is able to identify a high-risk item and apply the relevant label, the item will then be within scope of Purview's data security controls, including label-based DLP, groups and sites data-out-of-place alerting, and potentially even item encryption. The outcome is that, through the use of auto-labelling, we are able to significantly reduce the risk of inappropriate or unintended disclosure.

Configuration Process

Setting up document property-based auto-labelling is fairly straightforward: we need to set up a managed property and then utilize it in an auto-labelling policy. Below, I've split this process into six steps.

Step 1 – Prepare your files

In order to make use of document properties, an item with the properties applied first needs to be indexed by SharePoint. SharePoint will record the properties as 'crawled properties', which we'll then need to convert into 'managed properties' to make them useful. If you already have items with the relevant properties stored in SharePoint, then they are likely already indexed. If not, you'll need to upload or create an item or items with the properties applied. For testing, you'll want to create a file with each property/value combination so that you can confirm that your auto-labelling policies are all working correctly. This could require quite a few files depending on the number of properties you're looking for. To kick off your crawled property generation, though, you could create or upload a single file with the correct properties applied. For example, I've created properties for ClassificationContentMarkingHeaderText and ClassificationContentMarkingFooterText, which you'll often see applied by Purview when an item has a sensitivity label content marking applied to it. I've also included properties to help identify items classified via JanusSeal, Titus and Objective.
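If you need to generate several test files quickly, document properties can also be stamped with a script. The following is a rough sketch using Word COM automation; it assumes Word is installed on the machine running it, and the property name, value and output path are examples only (for bulk generation, the Open XML SDK may be a better fit):

# Create a test .docx with a custom document property applied
$word = New-Object -ComObject Word.Application
$word.Visible = $false
$doc = $word.Documents.Add()
$props = $doc.CustomDocumentProperties
# Add(Name, LinkToContent, Type, Value) where Type 4 = msoPropertyTypeString
[void]$props.GetType().InvokeMember("Add", [System.Reflection.BindingFlags]::InvokeMethod, $null, $props, @("ClassificationContentMarkingHeaderText", $false, 4, "PROTECTED"))
$doc.SaveAs("C:\Temp\PROTECTED-HeaderProperty.docx")
$doc.Close()
$word.Quit()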
Step 2 – Index the files

After creating or uploading your file, we need SharePoint to index it. This should happen fairly quickly depending on the size of your environment; I'd expect to wait somewhere between 10 minutes and 24 hours. If you're not in a hurry, I'd recommend just checking back the next day. You'll know this has completed when you head into SharePoint Admin > Search > Managed Search Schema > Crawled Properties and can find your newly indexed properties.

Step 3 – Configure managed properties

Next, the properties need to be configured as managed properties. To do this, go to SharePoint Admin > More features > Search > Managed Search Schema > Managed Properties. Create a new managed property and give it a name. Note that there are some character restrictions in naming, but you should be able to get it close to your document property name. Set the property's type to text, and select queryable and retrievable. Under 'mappings to crawled properties', choose add mapping, then search for and select the property indexed from the file property. The crawled property will have the same name as your document property, so there's no need to browse through all of them. Repeat this so that you have a managed property for each document property that you want to look for.

Step 4 – Configure auto-labelling policies

Next up, create some auto-labelling policies. You'll need one for each label that you want to apply, not one per property, as you can check multiple properties within the one auto-labelling policy.
- From within Purview, head to Information Protection > Policies > Auto-labelling policies.
- Create a new policy using the custom policy template.
- Give your policy an appropriate name (e.g. Label PROTECTED via property).
- Select the label that you want to apply (e.g. PROTECTED).
- Select SharePoint-based services (SharePoint and OneDrive).
- Name your auto-labelling rules appropriately (e.g. SPO – Contains PROTECTED property).
- Enter your conditions as a single string, with property and value separated by a colon and multiple entries separated by commas. For example:
ClassificationContentMarkingHeaderText:PROTECTED,ClassificationContentMarkingFooterText:PROTECTED,Objective-Classification:PROTECTED,PMDisplay:PROTECTED,TitusSEC:PROTECTED
Note that the properties you are referencing are the managed properties rather than the document properties. This is relevant if your managed property ended up with a different name due to character restrictions. After pasting your string into the UI, the resultant rule should contain each of your property:value conditions. When done, you can either leave your policy in simulation mode or save it and turn it on from the auto-labelling policies screen. Just be aware of any potential impacts, such as accidentally locking users out by automatically deploying a label with an encryption configuration. You can reduce any potential impact by targeting your auto-labelling policy at a site or set of sites initially and then expanding its scope after testing. If you prefer PowerShell over the UI, a scripted outline of this step follows below.
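Treat the following as an assumption-laden sketch rather than a definitive recipe: it presumes a published PROTECTED label exists and that your cmdlet version exposes the ContentPropertyContainsWords parameter on New-AutoSensitivityLabelRule, so verify parameter names against the current documentation before relying on it:

# Create an auto-labelling policy in simulation mode that matches on managed document properties
Connect-IPPSSession
New-AutoSensitivityLabelPolicy -Name "Label PROTECTED via property" -ApplySensitivityLabel "PROTECTED" -SharePointLocation All -OneDriveLocation All -Mode TestWithoutNotifications
New-AutoSensitivityLabelRule -Policy "Label PROTECTED via property" -Name "SPO - Contains PROTECTED property" -ContentPropertyContainsWords "ClassificationContentMarkingHeaderText:PROTECTED","ClassificationContentMarkingFooterText:PROTECTED","Objective-Classification:PROTECTED","PMDisplay:PROTECTED","TitusSEC:PROTECTED"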
Step 5 – Test

Testing your configuration is as easy as uploading or creating a set of files with the relevant document properties in place. Once uploaded, you'll need to give SharePoint some time to index the items and then the auto-labelling policy some time to apply sensitivity labels to them. To confirm label application, head to the document library where your test files are located and enable the sensitivity column: files that have been auto-labelled will have their label listed. You could also check for auto-labelling activity in Purview via Activity explorer.

Step 6 – Expand into DLP

If you've spent the time setting up managed properties, then you really should consider capitalizing on them in your DLP configurations. DLP policy conditions can be configured in the same manner that we configured auto-labelling in Step 4 above. The document property also gives us an anchor for DLP conditions that is independent of an item's sensitivity label. You may wish to consider the following:
- DLP policies blocking external sharing of items with certain properties applied. This can be handy where auto-labelling hasn't yet labelled an item.
- DLP policies blocking the external sharing of items where the applied sensitivity label doesn't match the applied document property. This could provide an indication of a risky label downgrade.
- Extending such policies into Insider Risk Management (IRM) by creating IRM policies aligned with the above DLP policies. This allows document properties to be considered in user risk calculation, which can inform controls like Adaptive Protection.
A policy like this would show, on the DLP rule summary screen, conditions of 'item contains a sensitivity label' or one of our configured document properties; a scripted equivalent is sketched below.
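The document property condition is also available in Security & Compliance PowerShell. This is a rough sketch under assumptions: the policy and rule names are examples, and the ContentPropertyContainsWords, AccessScope and BlockAccess parameters should be verified against your version of New-DlpComplianceRule:

# DLP sketch: block external sharing of SharePoint/OneDrive items stamped with a PROTECTED property
Connect-IPPSSession
New-DlpCompliancePolicy -Name "Protect items with PROTECTED properties" -SharePointLocation All -OneDriveLocation All -Mode TestWithoutNotifications
New-DlpComplianceRule -Name "Block external sharing - PROTECTED property" -Policy "Protect items with PROTECTED properties" -ContentPropertyContainsWords "ClassificationContentMarkingHeaderText:PROTECTED","PMDisplay:PROTECTED","TitusSEC:PROTECTED" -AccessScope NotInOrganization -BlockAccess $true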
Thanks for reading and I hope this article has been of use. If you have any questions or feedback, please feel free to reach out.

From Traditional Security to AI-Driven Cyber Resilience: Microsoft's Approach to Securing AI

By Chirag Mehta, Vice President and Principal Analyst - Constellation Research

AI is changing the way organizations work. It helps teams write code, detect fraud, automate workflows, and make complex decisions faster than ever before. But as AI adoption increases, so do the risks, many of which traditional security tools were not designed to address. Cybersecurity leaders are starting to see that AI security is not just another layer of defense. It is becoming essential to building trust, ensuring resilience, and maintaining business continuity.

Earlier this year, after many conversations with CISOs and CIOs, I saw a clear need to bring more attention to this topic. That led to my report on AI Security, which explores how AI-specific vulnerabilities differ from traditional cybersecurity risks and why securing AI systems calls for a more intentional approach.

Why AI Changes the Security Landscape

AI systems do not behave like traditional software. They learn from data instead of following pre-defined logic. This makes them powerful, but also vulnerable. For example, an AI model can:
- Misinterpret input in ways that humans cannot easily detect
- Be tricked into producing harmful or unintended responses through crafted prompts
- Leak sensitive training data in its outputs
- Take actions that go against business policies or legal requirements
These are not coding flaws. They are risks that originate from how AI systems process information and act on it.

These risks become more serious with agentic AI. These systems act on behalf of humans, interact with other software, and sometimes with other AI agents. They can make decisions, initiate actions, and change configurations. If one is compromised, the consequences can spread quickly.

A key challenge is that many organizations still rely on traditional defenses to secure AI systems. While those tools remain necessary, they are no longer enough. AI introduces new risks across every layer of the stack, including data, networks, endpoints, applications, and cloud infrastructure. As I explained in my report, the security focus must shift from defending the perimeter to governing the behavior of AI systems, the data they use, and the decisions they make.

The Shift Toward AI-Aware Cyber Resilience

Cyber resilience is the ability to withstand, adapt to, and recover from attacks. Meeting that standard today requires understanding how AI is developed, deployed, and used by employees, customers, and partners. To get there, organizations must answer questions such as:
- Where is our sensitive data going, and is it being used safely to train models?
- What non-human identities, such as AI agents, are accessing systems and data?
- Can we detect when an AI system is being misused or manipulated?
- Are we in compliance with new AI regulations and data usage rules?
Let's look at how Microsoft has evolved its mature security portfolio to help protect AI workloads and support this shift toward resilience.

Microsoft's Approach to Secure AI

Microsoft has taken a holistic and integrated approach to AI security. Rather than creating entirely new tools, it is extending existing products already used by millions to support AI workloads. These features span identity, data, endpoint, and cloud protection.

1. Microsoft Defender: Treating AI Workloads as Endpoints

AI models and applications are emerging as a new class of infrastructure that needs visibility and protection.
- Defender for Cloud secures AI workloads across Azure and other cloud platforms such as AWS and GCP by monitoring model deployments and detecting vulnerabilities.
- Defender for Cloud Apps extends protection to AI-enabled apps running at the edge.
- Defender for APIs supports AI systems that use APIs, which are often exposed to risks such as prompt injection or model manipulation.
Additionally, Microsoft has launched tools to support AI red-teaming, content safety, and continuous evaluation capabilities to ensure agents operate safely and as intended. This allows teams to identify and remediate risks such as jailbreaks or prompt injection before models are deployed.

2. Microsoft Entra: Managing Non-Human Identities

As organizations roll out more AI agents and copilots, non-human identities are becoming more common. These digital identities need strong oversight.
- Microsoft Entra helps create and manage identities for AI agents.
- Conditional Access ensures AI agents only access the resources they need, based on real-time signals and context.
- Privileged Identity Management manages, controls, and monitors AI agents' access to important resources within an organization.

3. Microsoft Purview: Securing Data Used in AI

Purview plays an important role in securing both the data that powers AI apps and agents, and the data they generate through interactions.
- Data discovery and classification helps label sensitive information and track its use.
- Data Loss Prevention policies help prevent leaks or misuse of data in tools such as Copilot or agents built in Azure AI Foundry.
- Insider Risk Management alerts security teams when employees feed sensitive data into AI systems without approval.
Purview also helps organizations meet transparency and compliance requirements as regulations like the EU AI Act take effect, extending the same policies they already use today to AI workloads without requiring separate configurations.

Securing AI Is Now a Strategic Priority

AI is evolving quickly, and the risks are evolving with it. Traditional tools still matter, but they were not built for systems that learn, adapt, and act independently. Nor were they designed for the pace and development approaches AI requires, where securing from the first line of code is critical to staying protected at scale. Microsoft is adapting its security portfolio to meet this shift. By strengthening identity, data, and endpoint protections, it is helping customers build a more resilient foundation. Whether you are launching your first AI-powered tool or managing dozens of agents across your organization, the priority is clear: secure your AI systems before they become a point of weakness. You can read more in my AI Security report and learn how Microsoft is helping organizations secure AI across its security portfolio.