Protecting sensitive information in the era of AI with Microsoft Purview Information Protection

Microsoft

Mar 24, 2025

As organizations embrace AI to drive innovation and productivity, the amount of data being created, stored, and accessed is growing faster than ever. But with that growth comes new security challenges. Sensitive data can land in unexpected places, buried in old SharePoint sites or tucked away in OneDrive folders, and if it’s not properly labeled or protected, it can be accidentally exposed by AI tools or human error.

To help address this risk, Microsoft Purview Information Protection continues to evolve, making it easier to discover, classify, and protect sensitive information across Microsoft 365 and beyond.

What’s new: On-demand classification for SharePoint and OneDrive

We’re introducing on-demand classification for SharePoint and OneDrive, now in public preview. This new capability lets admins initiate targeted scans of data at rest, meaning content that already exists in the cloud but hasn’t been modified or accessed recently, to apply the latest classifiers and sensitivity labels.

This expands your ability to protect sensitive information: security teams can proactively classify and label existing files at rest, without waiting for user activity to trigger protection, and with full control over what data to scan and when. Admins can prioritize specific SharePoint sites, OneDrive accounts, or file sets based on risk, business needs, or newly introduced classifiers.

When combined with Information Protection’s continuous classification, which automatically reclassifies files whenever they’re created, accessed, or modified, this two-pronged approach helps organizations keep content more closely aligned with the latest security policies:

Continuous classification keeps active files up to date by automatically re-evaluating them when they’re created, accessed, or edited
On-demand classification brings older or inactive files into scope by allowing admins to scan stored data at rest on their own schedule

With on-demand classification, organizations can:

Extend protection to previously unclassified or inactive files, increasing overall coverage
Strengthen data protection across your environment without relying on end-user actions
Reduce the risk of AI tools surfacing unlabeled or unprotected information, all natively, without exporting your data or relying on fragmented tools

Why it matters: Expanding coverage and reducing AI oversharing risks

Unlabeled sensitive data is high-risk data. AI tools like Copilot can surface content without understanding whether it should be shared. If a file hasn’t been classified, meaning it hasn’t been evaluated against current classifiers, it won’t be labeled or protected. That increases the risk of accidental exposure.

That’s where on-demand classification and Data Security Posture Management (DSPM) for AI work together to reduce that risk:

DSPM identifies oversharing risks, such as files that contain sensitive information but lack sensitivity labels
Admins can initiate an on-demand scan directly from DSPM to classify data at rest—content that’s been sitting untouched but still poses risk
Scans are fully configurable. You choose the scope (sites, users), filters (e.g., last modified time), and which classifiers to apply.
Once classified, labeled files are automatically protected through Data Loss Prevention (DLP), Insider Risk Management (IRM), and other policy-based controls
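To make the scoping options concrete, here is a minimal sketch of how a scope (sites), a filter (last modified time), and a classifier list narrow an inventory down to the files a scan would cover. It is an illustration only: `ScanConfig`, `StoredFile`, `files_in_scope`, the site names, and the classifier name are all hypothetical and are not part of any Microsoft Purview API. Scans themselves are configured in the Microsoft Purview portal.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical record for a file at rest (not a Purview data type)."""
    site: str
    path: str
    last_modified: datetime


@dataclass(frozen=True)
class ScanConfig:
    """Hypothetical container for the three choices an admin makes."""
    sites: frozenset           # scope: which sites to include
    modified_before: datetime  # filter: only files untouched since this time
    classifiers: tuple         # which classifiers the scan would apply


def files_in_scope(files, config):
    """Return the files that fall inside the configured scope and filter."""
    return [
        f for f in files
        if f.site in config.sites and f.last_modified < config.modified_before
    ]


now = datetime(2025, 3, 24)
inventory = [
    StoredFile("finance-archive", "/reports/q1-2021.xlsx", now - timedelta(days=1400)),
    StoredFile("finance-archive", "/reports/q4-2024.xlsx", now - timedelta(days=30)),
    StoredFile("marketing", "/plans/launch.docx", now - timedelta(days=900)),
]

# One site, files untouched for at least a year, one illustrative classifier.
config = ScanConfig(
    sites=frozenset({"finance-archive"}),
    modified_before=now - timedelta(days=365),
    classifiers=("Financial data (illustrative)",),
)

in_scope = files_in_scope(inventory, config)
for f in in_scope:
    print(f.site, f.path)
```

In this sample inventory only the archived 2021 report is selected: the recent report fails the last-modified filter and the marketing file is outside the chosen site.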

Example: A financial services team has archived quarterly reports in a SharePoint folder. DSPM detects that these files haven’t been labeled, even though they may contain sensitive financial data. An admin initiates an on-demand classification scan scoped to just that site, using updated financial classifiers. Once classified, the appropriate sensitivity labels and relevant policies are applied. This ensures sensitive data stays protected, even if it hasn’t been touched in years.

With on-demand classification, admins aren’t limited to real-time triggers. They get flexible, precise tools to catch what’s been missed and close potential security gaps on their terms.

Built-in integration with other Microsoft Purview solutions

The results of classification integrate with other Microsoft Purview solutions. For example, a DLP policy for financial information can automatically detect and block the sharing of a classified document containing sensitive financial data, mitigating risks of accidental leaks. This expansion ensures that the benefits of classification, and the related security policies, are applicable to all data, strengthening overall data security posture.

Flexible, cost-efficient protection

On-demand classification is offered with a pay-as-you-go pricing model, allowing organizations to scale their data protection efforts according to their needs. Before running a classification scan, admins can estimate the cost to fit their goals and budget.

By providing greater control over data security, on-demand classification helps organizations proactively manage risk, maintain compliance, and strengthen their overall security posture. Learn more about on-demand classification here.

Figure 1: On-demand classification scan results

Automate data security at scale

Organizations managing large-scale data in Azure Storage face challenges in consistently enforcing security and compliance policies across their data estate. To address this, Microsoft Purview protection policies for Azure SQL, Data Lake, and Blob Storage are now in public preview, enabling administrators to define and automatically apply protection policies based on the sensitivity labels of assets. This helps ensure consistent enforcement of access controls, sensitivity labeling, and data classification at scale. Learn more in this blog.

Figure 2: Information Protection policies for Azure SQL, Data Lake, and Blob Storage

Notable optical character recognition (OCR) enhancements

Optical character recognition (OCR) enables Microsoft Purview to scan images for sensitive information. Examples include screenshots of sensitive documents, scanned forms, and pictures of proprietary data like personal IDs or credit cards.

We are happy to share that, in addition to the generally available ability to scan standalone images in Exchange Online (EXO), support for embedded images in EXO is now in public preview. This enhancement allows detection of sensitive information within images embedded in email attachments or documents, including screenshots of confidential documents, scanned forms, and photos containing proprietary data shared in Office or archive files. This gives administrators greater visibility into sensitive information that may be hidden within embedded images in emails and attachments, helping ensure that data is properly classified and protected.

Along with that, the OCR cost estimator for macOS is now generally available. The OCR cost estimator helps organizations predict and manage costs by providing a clear estimate of images by location for Exchange, Teams, SharePoint, OneDrive, and endpoints. Customers can try the OCR cost estimator for free for 30 days.

Once you select “Try for free,” you will have 30 days to run estimates through the OCR cost estimator and configure settings based on the needs and budget of your organization. It can be run without setting up an Azure subscription, making it accessible to all organizations.

Figure 3: Cost estimation report for OCR by location

Strengthening document protection with dynamic watermarking

We announced dynamic watermarking in Word, Excel, and PowerPoint last year and we’re happy to share that it’s now generally available. This capability is designed to deter users from leaking sensitive information and to attribute leaks if they do occur. When an admin enables the dynamic watermarking setting for a protected sensitivity label, files with that sensitivity label will render with dynamic watermarks when opened in Word, Excel, and PowerPoint. These dynamic watermarks contain the User Principal Name (UPN), usually the email address, associated with the account used to open the file, allowing leaks to be traced back to specific users. Learn more about dynamic watermarking, how it works, and how to configure it within a sensitivity label in our documentation.

Figure 4: Word file with dynamic watermarks

Enhanced audit logs for auto-labeling in SharePoint

Auto-labeling in Microsoft Purview Information Protection automatically labels an organization’s most sensitive content to reduce the need for manual user labeling. It can label data at rest across SharePoint and OneDrive at a rate of up to 100,000 files per day.
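For planning purposes, the daily ceiling translates into a lower bound on how long a backlog takes to label. The short sketch below does that arithmetic. The helper name and the sample backlog size are hypothetical, and the result is a best case that assumes the full daily limit is reached every day.

```python
DAILY_FILE_LIMIT = 100_000  # daily ceiling stated in the text above


def min_days_to_label(file_count, daily_limit=DAILY_FILE_LIMIT):
    """Best-case number of days to label file_count files at the daily limit."""
    if file_count < 0:
        raise ValueError("file_count must be non-negative")
    # Integer ceiling division: a partial day still counts as a day.
    return (file_count + daily_limit - 1) // daily_limit


# Hypothetical backlog of 2.35 million files at rest.
print(min_days_to_label(2_350_000))  # 24
```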

Ensuring consistent and accurate labeling of sensitive information can be challenging without clear insights into the labeling process. To address this issue, starting this month, we will provide more detailed information on why a file is labeled, including policy and rule match information on SharePoint. This enhancement will enable SharePoint to send back information on the policy and rule matches that triggered the auto-labeling of files.

This added transparency simplifies the task for administrators, enabling them to review and refine their labeling policies more effectively. As a result, sensitive information will be more consistently and accurately labeled in accordance with organizational standards.

Get started

You can try Microsoft Purview Information Protection and other Microsoft Purview solutions directly within the Microsoft Purview portal with a free trial.

Interactive guide: aka.ms/InfoProtectionInteractiveGuide

Mechanics video on how to automatically classify and protect documents and data

Mechanics video on AI-powered data classification

And, lastly, join the Microsoft Purview Information Protection Customer Connection Program (CCP) to get information and access to upcoming capabilities in private previews in Microsoft Purview Information Protection. An active NDA is required. Click here to join.

Licensing details

On-demand classification	An E5, E5 Compliance, or E5 Information Protection and Governance license is required. Pricing is based on the number of files, at $20 per 10,000 assets scanned. More pricing information will be available soon.
OCR for embedded images in EXO	An Azure subscription and a Microsoft 365 E3 or E5 license are required. Pricing is based on the number of images scanned, at $1.00 per 1,000 images scanned. Each scanned image is counted as a single transaction. For more details, see here.
OCR cost estimator for macOS	The cost estimator is available at no cost for 30 days. After this period, generating new estimates will be disabled. However, the insights gained during the 30 days should provide enough data to understand usage patterns and estimate potential monthly costs. Learn more about cost estimator here.
Dynamic watermarking	Included in E5, E5 Compliance, and E5 Information Protection and Governance licenses.
Auto-labeling audit enrichments	Included in E5, E5 Compliance, and E5 Information Protection and Governance licenses.
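
As a rough budgeting aid, the two usage-based prices listed above can be turned into a back-of-the-envelope estimate. The sketch below assumes charges scale linearly with the number of assets or images. The source does not say how partial blocks of 10,000 assets or 1,000 images are billed, so treat the output as an approximation; the built-in cost estimators remain the authoritative tools. The function name and sample volumes are hypothetical.

```python
from decimal import Decimal

# Rates taken from the licensing details above, expressed per single unit.
CLASSIFICATION_RATE = Decimal("20") / Decimal("10000")  # USD per asset scanned
OCR_RATE = Decimal("1.00") / Decimal("1000")            # USD per image scanned


def estimate_cost(units, rate_per_unit):
    """Approximate cost in USD, assuming strictly linear pricing (an assumption)."""
    if units < 0:
        raise ValueError("units must be non-negative")
    return (Decimal(units) * rate_per_unit).quantize(Decimal("0.01"))


# Hypothetical volumes: 250,000 files to classify and 40,000 images to scan.
print(estimate_cost(250_000, CLASSIFICATION_RATE))  # 500.00
print(estimate_cost(40_000, OCR_RATE))              # 40.00
```

`Decimal` is used instead of floating point so that currency amounts round predictably to two decimal places.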