Microsoft Security Blog

Protecting sensitive information in the era of AI with Microsoft Purview Information Protection

chmcconnell
Microsoft
Mar 24, 2025

In today’s rapidly evolving digital landscape, organizations face growing challenges in protecting large volumes of sensitive data. As businesses adopt AI technologies, the volume of data generated and processed is growing at an unprecedented rate. This rapid data growth, coupled with the modern workplace’s demand for accessing information from various devices and locations, necessitates robust data protection measures. At the same time, security must not come at the cost of productivity. With AI becoming integral to modern workflows, organizations need the right guardrails so that employees can harness the power of AI for greater efficiency while data remains protected.

To help organizations navigate these challenges, Microsoft Purview Information Protection continues to advance its capabilities, enabling organizations to discover, classify, label, and protect sensitive information not only within Microsoft 365, but also across select non-Microsoft 365 data sources. In this blog, we will highlight the new enhancements and capabilities that make it easier to secure sensitive data, provide visibility, and enforce compliance policies.

Expanding data classification and protection capabilities

The global average cost of a data breach increased 10% in one year, reaching $4.9 million [1], underscoring the growing urgency of data protection. As organizations generate and store vast amounts of information, much of it remains untouched, whether kept for business continuity, historical reference, or regulatory compliance. However, without proper protection, this data can leave organizations susceptible to hidden data risks, including data misuse and leaks. To address this challenge, we are thrilled to announce the public preview of on-demand classification for SharePoint and OneDrive, starting in April.

On-demand classification expands the scope of data protection by scanning files that have not been classified or modified for a long time. Once files are classified, customers can automatically apply the relevant sensitivity label based on their organization's labeling policies. This ensures that all files, regardless of when they were last modified or accessed, are protected and compliant with organizational policies, which makes it easier for organizations to manage and protect large volumes of data on SharePoint and OneDrive. It also strengthens the overall data security posture by supporting continuous compliance and effective risk management.

Administrators can scope on-demand classification scans to specific SharePoint sites or OneDrive accounts and can select files by the last modified time. For example, an organization might focus on scanning files in a SharePoint site dedicated to financial records, which are considered high risk due to the sensitive nature of the data.
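
To make the scoping idea concrete, here is a minimal Python sketch of the selection logic: keep only files that live in the chosen sites and were last modified before a cutoff. This is an illustration only, not the Microsoft Purview API. The `FileRecord` type, the `select_for_scan` function, and the sample site names and paths are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical stand-in for a file in a SharePoint site or OneDrive account."""
    site: str
    path: str
    last_modified: datetime


def select_for_scan(files, sites, modified_before):
    """Return files in the scoped sites that were last modified before the cutoff."""
    return [
        f for f in files
        if f.site in sites and f.last_modified < modified_before
    ]


# Made-up sample inventory for illustration.
files = [
    FileRecord("Finance", "/Finance/2019-ledger.xlsx",
               datetime(2019, 3, 1, tzinfo=timezone.utc)),
    FileRecord("Finance", "/Finance/2025-forecast.xlsx",
               datetime(2025, 2, 10, tzinfo=timezone.utc)),
    FileRecord("Marketing", "/Marketing/2018-plan.docx",
               datetime(2018, 6, 5, tzinfo=timezone.utc)),
]

# Scope: only the Finance site, only files untouched since the start of 2023.
cutoff = datetime(2023, 1, 1, tzinfo=timezone.utc)
selected = select_for_scan(files, {"Finance"}, cutoff)
for f in selected:
    print(f.path)
```

In the product itself, this scoping is configured by the administrator when setting up the scan; the sketch only shows how a site filter and a last-modified filter combine to narrow the set of files.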

The results of classification integrate with other Microsoft Purview solutions, such as Insider Risk Management (IRM) and Data Loss Prevention (DLP), to provide robust protection. For example, a DLP policy for financial information can automatically detect and block the sharing of a classified document containing sensitive financial data, preventing potential leaks. This expansion ensures that the benefits of classification, and the related DLP and IRM policies, are applicable to all data, strengthening overall data security posture.

One major challenge in maintaining a strong data security posture is data oversharing, especially in AI-driven environments. When data is unclassified, mislabeled or outdated, it can be exposed in unintended ways, increasing the risk of unauthorized access. To address this, Microsoft’s Data Security Posture Management (DSPM) for AI, announced last year, includes an oversharing assessment that gives administrators greater visibility and control.

Building on this capability, the new on-demand classification allows administrators to initiate a classification scan directly from the oversharing assessment in DSPM for AI. This ensures that older or previously unscanned files are classified according to the latest data protection policies. Additionally, it helps Microsoft 365 Copilot index and ground data more accurately, ensuring AI-driven outputs remain secure. By providing a more comprehensive view of all data, on-demand classification helps organizations proactively manage risk, making AI copilots even safer.

On-demand classification will be offered with a pay-as-you-go pricing model, allowing organizations to scale their data protection efforts according to their needs. Before you trigger an on-demand classification scan, you can estimate the cost and adjust the scope as many times as needed until the projected cost fits your organization’s needs.

Figure 1: On-demand classification scan results

Automate data security at scale   

Organizations managing large-scale data in Azure Storage face challenges in consistently enforcing security and compliance policies across their data estate. To address this, Microsoft Purview protection policies for Azure SQL, Data Lake, and Blob Storage are now in public preview, enabling administrators to define and automatically apply protection policies based on the sensitivity labels of assets. This helps ensure consistent enforcement of access controls, sensitivity labeling, and data classification at scale. Learn more in this blog.

Figure 2: Information Protection policies for Azure SQL, Data Lake, and Blob Storage

Notable optical character recognition (OCR) enhancements 

Optical character recognition (OCR) enables Microsoft Purview to scan images for sensitive information. Examples include screenshots of sensitive documents, scanned forms, and pictures of proprietary data like personal IDs or credit cards.  

We are happy to share that, in addition to the generally available ability to scan standalone images in Exchange Online (EXO), support for embedded images in EXO is now in public preview. This enhancement detects sensitive information within images embedded in email attachments or documents, including screenshots of confidential documents, scanned forms, and photos containing proprietary data shared in Office or archive files in EXO. This gives administrators greater visibility into sensitive information that may be hidden within embedded images in emails and attachments, helping ensure that all data is properly classified and protected.

Along with that, the OCR cost estimator for macOS is now generally available. The OCR cost estimator helps organizations predict and manage costs by providing a clear estimate of images by location for Exchange, Teams, SharePoint, OneDrive, and endpoints. Customers can try the OCR cost estimator for free for 30 days.

Once you select “Try for free,” you will have 30 days to run estimates through the OCR cost estimator and configure settings based on the needs and budget of your organization. It can be run without setting up an Azure subscription, making it accessible to all organizations.  

Figure 3: Cost estimation report for OCR by location

Strengthening document protection with dynamic watermarking 

We announced dynamic watermarking in Word, Excel, and PowerPoint last year and we’re happy to share that it’s now generally available. This capability is designed to deter users from leaking sensitive information and to attribute leaks if they do occur. When an admin enables the dynamic watermarking setting for a protected sensitivity label, files with that sensitivity label will render with dynamic watermarks when opened in Word, Excel, and PowerPoint. These dynamic watermarks contain the User Principal Name (UPN), usually the email address, associated with the account being used to open the file, allowing leaks to be traced back to specific users. Learn more about dynamic watermarking, how it works, and how to configure it within a sensitivity label in our documentation.

Figure 4: Word file with dynamic watermarks

Enhanced audit logs for auto-labeling in SharePoint 

Auto-labeling in Microsoft Purview Information Protection automatically labels an organization’s most sensitive content to reduce the need for manual user labeling. It can label data at rest across SharePoint and OneDrive at a rate of up to 100,000 files per day.
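
As a rough planning aid, the daily ceiling translates into a minimum number of days needed to work through a backlog. The short Python sketch below does that arithmetic. The function name `min_days_to_label` and the sample backlog size are hypothetical, and because the figure is an upper bound ("up to"), the result should be read as a best case rather than a guarantee.

```python
# Ceiling stated in this post: up to 100,000 files labeled per day.
DAILY_LIMIT = 100_000


def min_days_to_label(file_count: int, daily_limit: int = DAILY_LIMIT) -> int:
    """Best-case number of days to auto-label `file_count` files at rest."""
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    # Integer ceiling division: a partial day still counts as a full day.
    return -(-file_count // daily_limit)


# Hypothetical backlog of 1.25 million unlabeled files.
print(min_days_to_label(1_250_000))  # 13
```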

 Ensuring consistent and accurate labeling of sensitive information can be challenging without clear insights into the labeling process. To address this issue, starting this month, we will provide more detailed information on why a file is labeled, including policy and rule match information on SharePoint. This enhancement will enable SharePoint to send back information on the policy and rule matches that triggered the auto-labeling of files.  

This added transparency simplifies the task for administrators, enabling them to review and refine their labeling policies more effectively. As a result, sensitive information will be more consistently and accurately labeled in accordance with organizational standards.  

Get started 

You can try Microsoft Purview Information Protection and other Microsoft Purview solutions directly within the Microsoft Purview portal with a free trial. * 

  • Mechanics video on how to automatically classify and protect documents and data 

And, lastly, join the Microsoft Purview Information Protection Customer Connection Program (CCP) to get information and access to upcoming capabilities in private previews in Microsoft Purview Information Protection. An active NDA is required. Click here to join. 

Learn more about the innovations designed to help your organization protect data, defend against cyber threats, and stay compliant. Join Microsoft leaders online at Microsoft Secure on April 9.

Licensing details 

On-demand classification 

An E5, E5 Compliance, or E5 Information Protection and Governance license is required. Pricing is based on the number of files, at $20 per 10,000 assets scanned.
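
For a quick back-of-the-envelope figure, the quoted rate can be turned into a small calculation. The Python sketch below assumes the charge scales linearly with the number of assets (that is, it is prorated rather than billed in whole blocks of 10,000), which this post does not specify. The function name `estimate_scan_cost` and the sample asset count are hypothetical; the cost estimate shown in the product before you trigger a scan remains the authoritative number.

```python
from decimal import Decimal

# Rate quoted in this post: $20 per 10,000 assets scanned.
RATE_USD = Decimal("20")
ASSETS_PER_UNIT = 10_000


def estimate_scan_cost(asset_count: int) -> Decimal:
    """Estimate on-demand classification cost in USD.

    Assumption: the charge is prorated linearly per asset.
    """
    if asset_count < 0:
        raise ValueError("asset_count must be non-negative")
    cost = RATE_USD * asset_count / ASSETS_PER_UNIT
    return cost.quantize(Decimal("0.01"))


# Hypothetical scope of 250,000 files.
print(estimate_scan_cost(250_000))  # 500.00
```

`Decimal` is used instead of floating point so that currency amounts round predictably to cents.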

OCR embedded in EXO 

An Azure subscription and M365 E3 or E5 license are required. Pricing is based on the number of images scanned, at $1.00 per 1,000 images scanned. Each scanned image is counted as a single transaction. For more details, see here.   
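
To see how the per-image rate adds up, here is a minimal Python sketch that applies it to image counts broken down by location, similar in spirit to what the cost estimator reports. The image counts are made up, the function name `ocr_cost_by_location` is hypothetical, and the sketch assumes every image in a location is scanned exactly once.

```python
from decimal import Decimal

# Rate quoted in this post: $1.00 per 1,000 images scanned,
# with each scanned image counted as a single transaction.
OCR_RATE_USD_PER_IMAGE = Decimal("1.00") / 1000


def ocr_cost_by_location(image_counts: dict[str, int]) -> dict[str, Decimal]:
    """Return the estimated OCR cost in USD for each location."""
    return {
        location: (OCR_RATE_USD_PER_IMAGE * count).quantize(Decimal("0.01"))
        for location, count in image_counts.items()
    }


# Hypothetical monthly image volumes.
counts = {"Exchange": 120_000, "SharePoint": 45_500}
costs = ocr_cost_by_location(counts)
total = sum(costs.values(), Decimal("0"))

for location, cost in costs.items():
    print(f"{location}: ${cost}")
print(f"Total: ${total}")
```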

OCR cost estimator for macOS 

The cost estimator is available at no cost for 30 days. After this period, generating new estimates will be disabled. However, the insights gained during the 30 days should provide enough data to understand usage patterns and estimate potential monthly costs. Learn more about the cost estimator here.

Dynamic watermarking 

Included in E5, E5 Compliance, and E5 Information Protection and Governance licenses. 

Auto-labeling audit enrichments 

Included in E5, E5 Compliance, and E5 Information Protection and Governance licenses. 

 

* Pay-as-you-go capabilities are not available in the free trial. 

 

  1. Cost of a Data Breach Report 2024 | IBM 
Updated Apr 03, 2025
Version 3.0