
Security, Compliance, and Identity Blog

Announcing machine learning features in Microsoft Purview Data Loss Prevention

Shilpa_Bothra
Microsoft
Jul 28, 2022

Gaining visibility into the type, volume, and location of sensitive data continues to be a challenge for most organizations, and hybrid work has amplified the complexity of protecting it. Employees create, access, and share data across a myriad of devices and networks, so organizations need a modern data protection program. The foundational element of such a program is the ability to detect and classify data accurately.

 

At Microsoft, we help customers classify data at scale and with increased accuracy through machine learning, a journey we have been on with Microsoft Purview Information Protection. Information Protection is a built-in, intelligent, unified, and extensible solution for protecting sensitive data across your digital estate: in Microsoft 365 cloud services, on-premises, in third-party SaaS applications, and more.

With Information Protection, we are building a unified set of capabilities for data classification, labeling, and protection, not only in Office apps but also in other popular productivity services such as SharePoint Online, Exchange Online, OneDrive for Business, and Microsoft Teams, as well as on endpoint devices. Currently, more than 250 built-in sensitive information types help identify and classify data. Organizations can also use our machine learning-based trainable classifiers to identify additional categories of data with increased accuracy. These classifiers use business concepts and intelligent algorithms to identify critical information across your data landscape. We currently provide more than 70 ready-to-use classifiers that can detect legal agreements, financial documents, intellectual property, and more, and you can also create custom classifiers to meet your unique needs.

 

Today, we are excited to announce new capabilities in Microsoft Purview Data Loss Prevention (DLP), now in public preview: support for trainable classifiers, credential sensitive information types (SITs), and sensitive service domains. Microsoft Purview DLP is an integral part of Information Protection. It uses built-in sensitive data detection and native integration with Information Protection sensitivity labels to create and enforce policies that help prevent sensitive data from being exfiltrated through common egress points. Microsoft Purview DLP is easy to turn on, with protection built into Microsoft 365 cloud services, Office apps, Windows endpoint devices, and Microsoft Edge. It is offered and managed as a single, integrated, and extensible solution, so organizations can manage all their DLP policies from one location. DLP controls can also be extended to macOS endpoints and the Chrome browser, and, through the integration with Microsoft Defender for Cloud Apps, to cloud apps such as Dropbox, Box, Google Drive, and others.

 

Support for trainable classifiers as a condition in your DLP policy

We are excited to share that you can now use trainable classifiers (both out-of-the-box and custom) as conditions in your DLP policies. This lets you detect sensitive business content (e.g., financial statements, contracts, and legal and HR documents) and behavioral content (e.g., discrimination and profanity), and protect that content from unauthorized use, sharing, or transfer.

 

We have also updated three enhanced policy templates to use trainable classifiers in addition to SITs, for more comprehensive data protection. The three templates are:

  1. US GLBA Enhanced: for detecting and protecting financial data and helping meet compliance with the Gramm-Leach-Bliley Act (GLBA)
  2. US Healthcare Act Enhanced: for detecting and protecting healthcare and medical data and helping meet compliance with HIPAA and other related regulations
  3. US PII Enhanced: for detecting and protecting privacy content commonly found in personnel and human resources documents and helping meet compliance with PII and other state privacy regulations

Figure 1: Configuring a DLP policy to use trainable classifiers as a condition
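To make the idea of a classifier condition more concrete, here is a minimal Python sketch of how a rule that combines trainable classifiers with SITs might be evaluated. It is not the Purview policy engine or any Purview API: `ScanResult`, `DlpRule`, `evaluate`, the "any classifier OR any SIT" logic, and the action name `block_external_sharing` are all hypothetical, and the example rule only loosely mirrors the idea behind the US PII Enhanced template.

```python
from dataclasses import dataclass, field

@dataclass
class ScanResult:
    """What a hypothetical content scanner reported for one item."""
    classifiers: set[str] = field(default_factory=set)        # trainable classifiers that matched
    sit_counts: dict[str, int] = field(default_factory=dict)  # SIT name -> instances found

@dataclass
class DlpRule:
    """Hypothetical rule: fires if ANY listed classifier matched OR ANY SIT reached its minimum count."""
    name: str
    any_classifiers: set[str]
    min_sit_counts: dict[str, int]
    action: str

    def matches(self, result: ScanResult) -> bool:
        # Classifier condition: a non-empty intersection means at least one classifier fired.
        if self.any_classifiers & result.classifiers:
            return True
        # SIT condition: any SIT found at least `minimum` times.
        return any(result.sit_counts.get(sit, 0) >= minimum
                   for sit, minimum in self.min_sit_counts.items())

def evaluate(rules: list[DlpRule], result: ScanResult) -> list[str]:
    """Return the actions of every rule that matches the scanned item."""
    return [rule.action for rule in rules if rule.matches(result)]

# Example rule pairing classifiers from the table below with a SIT;
# the real template's conditions may differ.
pii_rule = DlpRule(
    name="HR documents or SSNs",
    any_classifiers={"Business – HR", "Resume"},
    min_sit_counts={"U.S. Social Security Number (SSN)": 1},
    action="block_external_sharing",
)

print(evaluate([pii_rule], ScanResult(classifiers={"Resume"})))  # ['block_external_sharing']
```

Keeping the classifier and SIT checks as separate any-match groups makes it easy to see why an item matched; actual DLP rules can express richer condition groupings than this sketch does.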

The tables below list the business content and behavior classifiers supported in DLP policies today. We will add more classifiers to this list in the coming months.

 

| Business content classifier | Sample content detected by the classifier |
|---|---|
| Business – Finance | Budget proposals, business analyses, financial statements, proposals, and sales reports |
| Business – IT | Cybersecurity assessments, incident reports, IT admin documents, and software specifications |
| Business – Tax | Tax planning documents, tax forms, tax filing documents, and tax regulation documents |
| Business – Contract | Non-disclosure agreements, statements of work, loan agreements, lease agreements, employment contracts, and non-compete agreements |
| Business – Healthcare | Medical records, health benefits documents, insurance forms, prior authorizations, and referral forms |
| Business – Legal | Court cases, corporate bylaws, legal advice, and documents with terms and conditions |
| Business – HR | Job posts, hiring documents, onboarding and training documents, payroll documents, and employee disciplinary documents |
| Business – Procurement | Quotations, purchase orders, sales orders, delivery orders, and invoices |
| Business – IP | Patent applications and documents with non-disclosure content |
| Source Code | Items containing instructions and statements written in the 25 most-used programming languages on GitHub |
| Resume | Textual accounts of an applicant's personal, educational, and professional qualifications, work experience, and other personally identifying information |

| Behavior classifier | Sample content detected by the classifier |
|---|---|
| Harassment | Offensive language related to offensive conduct targeting one or more individuals based on race, ethnicity, religion, national origin, gender, sexual orientation, age, or disability |
| Profanity | Offensive language containing expressions that embarrass most people |
| Threat | Offensive language related to threats to commit violence or do physical harm or damage to a person or property |
| Discrimination | Explicit discriminatory language; particularly sensitive to discriminatory language against African American/Black communities compared to other communities |

 

Support for credential SITs in your DLP policies

We recently announced the public preview of 42 new credential SITs, which enable organizations to identify, classify, and protect credentials found in documents across OneDrive, SharePoint, Teams, Office Web Apps, Outlook, Exchange Online, Defender for Cloud Apps, and Windows devices. Organizations can use these SITs in Information Protection auto-labeling policies and in DLP policies to quickly discover and classify a wide range of complex digital authentication credentials, such as user credentials (usernames and passwords), default passwords, credentials for Azure cloud resources, and more.
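As a rough illustration of what a credential SIT does, the following Python sketch scans text with regular expressions. The two patterns (a username/password pair, and an Azure Storage `AccountKey` value assumed here to be 86 Base64 characters followed by `==`) are simplified assumptions, not the definitions behind the Purview credential SITs, which typically also rely on supporting evidence and confidence levels that this sketch leaves out.

```python
import re
from dataclasses import dataclass

# Hypothetical, simplified patterns for illustration only. These are NOT the
# definitions Microsoft Purview uses for its credential SITs.
CREDENTIAL_PATTERNS = {
    "user_password_pair": re.compile(
        r"(?i)\b(?:user(?:name)?|login)\s*[:=]\s*(?P<user>\S+)\s+"
        r"(?:password|pwd)\s*[:=]\s*(?P<secret>\S+)"
    ),
    # Assumed format: 86 Base64 characters followed by "==" padding.
    "storage_account_key": re.compile(r"AccountKey=(?P<secret>[A-Za-z0-9+/]{86}==)"),
}

@dataclass
class CredentialMatch:
    sit_name: str  # which sketch pattern matched
    start: int     # character offset where the match begins
    end: int       # character offset just past the match

def find_credentials(text: str) -> list[CredentialMatch]:
    """Return every span in `text` matching one of the sketch patterns, in text order."""
    matches = []
    for name, pattern in CREDENTIAL_PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append(CredentialMatch(name, m.start(), m.end()))
    return sorted(matches, key=lambda c: c.start)

sample = "username: alice password: Hunter2!"
for match in find_credentials(sample):
    print(match.sit_name, sample[match.start:match.end])
```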

 

Support for sensitive service domains in your endpoint DLP policies  

We are also excited to share the public preview of sensitive service domains in endpoint DLP. You can list a website as a sensitive service domain and then configure DLP policies that prevent users from printing, copying data, or saving website content as local files in Microsoft Edge. Sensitive service domain controls can be extended to Google Chrome through the Microsoft Purview extension.

Figure 2: Adding a sensitive service domain in endpoint DLP
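Conceptually, a sensitive service domain policy pairs a list of sites with a set of restricted browser activities. The Python sketch below models that pairing; the domain list, the activity names (`print`, `copy`, `save_as`), and the wildcard rule (`*.` matches subdomains but not the bare parent domain) are assumptions made for illustration, not Purview's actual matching semantics.

```python
from urllib.parse import urlsplit

# Hypothetical sensitive service domains. In this sketch, an entry starting
# with "*." matches any subdomain, but not the bare parent domain.
SENSITIVE_DOMAINS = ("payroll.contoso.com", "*.fabrikam.com")

# Hypothetical browser activities a policy restricts on those sites.
RESTRICTED_ACTIVITIES = {"print", "copy", "save_as"}

def is_sensitive_site(url: str, domains=SENSITIVE_DOMAINS) -> bool:
    """Return True if the URL's host matches one of the listed domains."""
    host = (urlsplit(url).hostname or "").lower()
    for entry in domains:
        entry = entry.lower()
        if entry.startswith("*."):
            # "*.fabrikam.com" -> the host must end with ".fabrikam.com"
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    return False

def is_activity_blocked(url: str, activity: str) -> bool:
    """Block a restricted activity only when it happens on a sensitive site."""
    return activity in RESTRICTED_ACTIVITIES and is_sensitive_site(url)
```

Matching on the parsed hostname, rather than on the raw URL string, keeps a site such as `evilfabrikam.com` from being treated as a subdomain of `fabrikam.com`.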

Ability to clone a DLP policy 

The ability to clone a DLP policy is now generally available. Follow the steps below to clone an existing DLP policy so you can easily edit the copy.

  1. Navigate to the DLP policies page in the Microsoft Purview compliance portal 
  2. Select the policy you want to clone
  3. Select 'Copy policy' 
  4. Edit the policy configurations as needed using the policy creation wizard
  5. Save and use the policy 

Figure 3: Cloning a DLP policy
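Conceptually, the 'Copy policy' step behaves like a deep copy: the clone starts with the same settings, and editing it leaves the source policy unchanged. The Python sketch below models that idea with hypothetical `Policy` and `Rule` classes and a hypothetical `clone_policy` helper; it does not call any Purview API, and the policy, location, and action names are illustrative only.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class Rule:
    name: str
    conditions: list[str]  # e.g. SIT or classifier names
    actions: list[str]     # hypothetical action identifiers

@dataclass
class Policy:
    name: str
    locations: list[str]
    rules: list[Rule] = field(default_factory=list)

def clone_policy(source: Policy, new_name: str) -> Policy:
    """Deep-copy a policy so edits to the clone never touch the original."""
    clone = copy.deepcopy(source)
    clone.name = new_name
    return clone

original = Policy(
    name="US PII Enhanced - Finance",
    locations=["Exchange", "SharePoint"],
    rules=[Rule("SSN in email", ["U.S. Social Security Number (SSN)"], ["notify_user"])],
)
draft = clone_policy(original, "US PII Enhanced - Finance (copy)")
draft.rules[0].actions.append("block_external_sharing")  # edit only the copy
draft.locations.append("OneDrive")
```

A deep copy (rather than `copy.copy`) matters here because rules and locations are nested lists; a shallow copy would share them between the two policies, so editing the clone would silently change the original.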

Get started with a free trial

You can try Microsoft Purview Information Protection and DLP by enabling the free Microsoft Purview trial from the Microsoft Purview compliance portal. All you need is a Microsoft 365 E3 subscription!

 

Additional resources:

  • Watch these videos to learn more about Microsoft’s approach to cloud DLP, endpoint DLP, and maximizing the value of DLP
  • Listen to this podcast on Microsoft Purview DLP.
  • Learn more about configuring DLP policies for Microsoft 365 services and endpoints
  • Learn more about using sensitivity labels as a condition for DLP policies
  • Learn more about sensitivity labels
  • Learn more about predicates for unified DLP
  • Read these blogs for the latest on Microsoft Purview Information Protection

 We look forward to hearing your feedback!

 

Thank you,

Microsoft Information Protection team

Updated Jul 28, 2022
  • Dillonwhite

    I will definitely be checking this out in my demos and possibly adding it to my Purview workshops!

  • GlenBee

    I cannot see the trainable classifiers in the drop-down when adding a rule. I have only recently upgraded to full E5. Is there anything I should check to get this working?

  • Hi GlenBee, the rollout process takes some time to complete. Please check back in a week's time, and if you still do not see it, let us know and we can help you figure out what the issue might be.