
Security, Compliance, and Identity Blog

Strengthen protection to mitigate data overexposure in GenAI tools with data classification/labeling

Anna_Chiang
Microsoft
Mar 13, 2024

Organizations face the challenge of discovering and protecting sensitive data across their digital estate to prevent unauthorized access and sharing of corporate intellectual property and PII. Organizations are adopting generative AI tools to reduce operating costs and boost worker productivity, by an estimated 40% or so for highly skilled workers. These tools also introduce new user activities and produce a large amount of additional data, all of which requires security and compliance management. The proliferation of GenAI applications and content is a top security challenge in AI adoption. According to a recent study, over 80% of leaders cited leakage of sensitive data as their main concern, and 48% of them expect to continue banning all use of GenAI in the workplace [1].

 

With Microsoft Purview Information Protection, we provide a solution that is integrated, intelligent, unified, and extensible. It identifies and protects sensitive data across your digital estate, which includes Microsoft clouds such as Microsoft 365 and Azure, as well as on-premises, hybrid and third-party clouds, and SaaS applications such as Webex, Salesforce, Dropbox, and Workspace.

 

In addition, at the Microsoft Ignite event in November, we announced that Microsoft Copilot for Microsoft 365 supports files that have been labeled and/or encrypted by Information Protection, which helps prevent overexposure of sensitive data. To read more, click here.

 

At the last Ignite event, we also announced several new innovations in classification, labeling, and secure collaboration capabilities, which help discover and protect sensitive data. Today, we’re excited to continue the momentum with a new OCR billing cost estimator which will be in public preview shortly.  

 

Optical Character Recognition (OCR) cost estimator provides advance visibility into projected costs

Sensitive data is not only found in various application files and PDFs but also in image files. Optical Character Recognition extracts text from images so that current classifiers/sensitive information types can be used to scan the image content. Existing DLP, autolabeling, Insider Risk Management, and Data Lifecycle Management policies can then be applied if sensitive content is found in these image files. OCR for Exchange, Teams, SharePoint, OneDrive, and endpoints is already generally available. It is an optional feature for E3 and E5 license holders.
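As a rough illustration of the flow described above, the sketch below assumes an OCR step has already turned an image into plain text, and then applies a pattern-based check that is loosely analogous to a sensitive information type. The function names, the regular expression, and the Luhn checksum rule are illustrative assumptions; they are not how Purview classifiers are implemented.

```python
import re

# Hypothetical pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(ocr_text: str) -> list[str]:
    """Return card-like numbers found in text that an OCR step extracted."""
    found = []
    for match in CARD_PATTERN.finditer(ocr_text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

In this sketch, a policy engine would act only when `find_card_numbers` returns a non-empty list for the text extracted from an image.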

 

We are excited to announce the public preview of a free OCR cost estimator, which gives compliance administrators advance visibility into their expected OCR costs. With the estimator, administrators can project the cost of running OCR scans across all their OCR locations without signing up for an Azure subscription or being billed for an OCR trial. This makes it easier to obtain budget approval for OCR from management.

Admins can run the OCR estimator for only selected locations and scopes (users/sites), and can edit those locations and scopes at any time to find the best configuration for their needs, as shown in Figure 1 below.

 

Figure 1: OCR cost estimator overview page settings

A detailed estimation dashboard with graphs that show various costs is also provided, as shown in Figure 2 below. Admins can download reports and share them internally to get budget approvals.

 

The estimation dashboard's detailed graphs show:

  • Day-over-day analysis of the volume of images
  • Volume and cost split across the selected locations
  • Total estimated cost and estimated cost per user

Figure 2: OCR estimation dashboard
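The dashboard figures listed above (volume and cost per location, total cost, and cost per user) come down to simple arithmetic. The sketch below is one way to reproduce that arithmetic, assuming billing is proportional to the number of images scanned. The function name, the data shape, and the rate used in the example are hypothetical; consult current Microsoft pricing for real values.

```python
from dataclasses import dataclass


@dataclass
class CostEstimate:
    images_by_location: dict[str, int]
    cost_by_location: dict[str, float]
    total_cost: float
    cost_per_user: float


def estimate_ocr_cost(
    daily_images: dict[str, list[int]],
    users_in_scope: int,
    price_per_1000_images: float,
) -> CostEstimate:
    """Project OCR cost from sampled daily image counts per location."""
    if users_in_scope <= 0:
        raise ValueError("users_in_scope must be positive")
    # Sum the day-over-day counts to get the image volume per location.
    images_by_location = {loc: sum(counts) for loc, counts in daily_images.items()}
    # Assumed pricing model: a flat rate per 1,000 images scanned.
    cost_by_location = {
        loc: images / 1000 * price_per_1000_images
        for loc, images in images_by_location.items()
    }
    total_cost = sum(cost_by_location.values())
    return CostEstimate(
        images_by_location,
        cost_by_location,
        total_cost,
        total_cost / users_in_scope,
    )


# Example with made-up volumes and a made-up rate of 1.00 per 1,000 images.
sample = {"Exchange": [1200, 800], "SharePoint": [3000, 2000]}
estimate = estimate_ocr_cost(sample, users_in_scope=50, price_per_1000_images=1.00)
print(estimate.total_cost, estimate.cost_per_user)
```

Narrowing the locations or the user scope in this sketch means passing a smaller `daily_images` mapping or a different `users_in_scope`, which mirrors how admins can edit locations and scopes in the estimator.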

How to Get Started 

Learn more about Information Protection here. Try Microsoft Purview Information Protection and other Microsoft Purview solutions directly in the Microsoft Purview compliance portal with a free trial.

 

Additional resources

 1. First Annual Generative AI study: Business Rewards vs. Security Risks, Q3 2023, ISMG, N=400

Updated Mar 12, 2024
Version 1.0
  • bthomas
    Iron Contributor

    When will this OCR cost estimator be available? I don't see it in any of our tenants.