Announcing machine learning features in Microsoft Purview Data Loss Prevention

Shilpa_Bothra · ‎Jul 28 2022

Gaining visibility into the type, volume, and location of sensitive data continues to be a challenge for most organizations, and hybrid work has exacerbated the complexities of protecting sensitive data. Employees are creating, accessing, and sharing data across a myriad of devices and networks, making it imperative for organizations to establish a modern data protection program. The foundational element of such a program is being able to detect and classify data accurately.

At Microsoft, we help customers classify data at scale and with increased accuracy through machine learning, a journey we have been on with Microsoft Purview Information Protection. Information Protection is a built-in, intelligent, unified, and extensible solution to protect sensitive data across your digital estate – in Microsoft 365 cloud services, on-premises, third-party SaaS applications, and more.

With Information Protection, we are building a unified set of capabilities for data classification, labeling, and protection not only in Office apps, but also in other popular productivity services such as SharePoint Online, Exchange Online, OneDrive for Business, and Microsoft Teams, as well as on endpoint devices. Currently there are over 250 pre-built sensitive information types that help identify and classify data. Additionally, organizations can leverage the power of our machine learning-based trainable classifiers to identify additional categories of data with increased accuracy. These classifiers use business concepts and intelligent algorithms to identify critical information that lives in your data landscape. We currently provide over 70 ready-to-use classifiers that can help detect legal agreements, financial documents, intellectual property, and more. You can also create custom classifiers to meet your unique needs.

Today we are excited to announce new capabilities in Microsoft Purview Data Loss Prevention (DLP), now in public preview, including support for trainable classifiers, credential sensitive information types (SITs), and sensitive service domains. Microsoft Purview DLP is an integral part of Information Protection, and it leverages built-in sensitive data detection and native integration with sensitivity labels from Information Protection to create and enforce policies that help prevent sensitive data exfiltration through common egress points. Microsoft Purview DLP is easy to turn on, with protection built into Microsoft 365 cloud services, Office apps, and endpoint devices with Windows and Microsoft Edge. It is offered and managed as a single, integrated, and extensible offering that allows organizations to manage their DLP policies from a single location. DLP controls can also be extended to macOS endpoints and the Chrome browser, and to various cloud apps such as Dropbox, Box, Google Drive, and others through the integration with Microsoft Defender for Cloud Apps.

Support for trainable classifiers as a condition in your DLP policy

We are excited to share that you can now use trainable classifiers (out-of-the-box as well as custom) as conditions in your DLP policies to detect sensitive business content (e.g., financial statements, contracts, and legal and HR documents) as well as behavioral content (e.g., discrimination, profanity, and more), and to prevent its unauthorized use, sharing, or transfer.
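
To make the idea of a classifier-based condition concrete, here is a minimal conceptual sketch in Python. It is not the Microsoft Purview API or its evaluation logic: the ScanResult and Condition types, the OR semantics between classifiers and SITs, and the minimum-count threshold are all assumptions made for illustration. The classifier and SIT names are used only as sample labels.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ScanResult:
    """Hypothetical classification output for a single item."""
    classifiers: Set[str] = field(default_factory=set)        # trainable classifiers that matched
    sit_counts: Dict[str, int] = field(default_factory=dict)  # SIT name -> instances found


@dataclass
class Condition:
    """Hypothetical rule condition: any listed classifier, OR any listed SIT
    found at least its minimum number of times."""
    classifiers: Set[str] = field(default_factory=set)
    sits: Dict[str, int] = field(default_factory=dict)        # SIT name -> minimum instance count

    def matches(self, result: ScanResult) -> bool:
        # A single matching trainable classifier is enough to satisfy the condition.
        if self.classifiers & result.classifiers:
            return True
        # Otherwise, check whether any SIT meets its minimum instance count.
        return any(
            result.sit_counts.get(name, 0) >= minimum
            for name, minimum in self.sits.items()
        )


condition = Condition(classifiers={"Source Code"}, sits={"Credit Card Number": 2})
item = ScanResult(classifiers={"Source Code"})
print(condition.matches(item))
```

In this sketch an item satisfies the condition as soon as one classifier matches. A real policy can combine conditions in other ways, so treat the OR logic as one possible design, not a description of the product.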

We have also updated three enhanced policy templates to use trainable classifiers in addition to SITs to help you with comprehensive data protection. The three templates are:

US GLBA Enhanced: for detecting and protecting financial data and meeting compliance with the GLBA
US Healthcare Act Enhanced: for detecting and protecting healthcare and medical data and meeting compliance with HIPAA and other related regulations
US PII Enhanced: for detecting and protecting privacy content commonly found in personnel and human resource documents and meeting compliance with PII and other state privacy regulations

Figure 1: Configuring a DLP policy to use trainable classifiers as a condition

See below for a chart of business content and behavior classifiers supported in DLP policies today. We will be adding additional classifiers to this list in the coming months.

Business content classifiers	Sample content detected by classifier
Business – Finance	budget proposal, business analysis, financial statements, proposals and sales reports.
Business – IT	cybersecurity assessments, incident reports, IT admin documents and software specifications.
Business – Tax	tax planning documents, tax forms, tax filing related documents and tax regulation documents.
Business – Contract	non-disclosure agreement, statement of work, loan agreement, lease agreement, employment contract, non-compete agreement.
Business – Healthcare	medical records, health benefits documents, insurance forms, prior authorizations, and referral forms.
Business – Legal	court cases, corporate bylaws, legal advice and documents with terms and conditions.
Business – HR	job posts, hiring documents, onboarding and training documents, payroll documents and employee disciplines.
Business – Procurement	quotation, purchase order, sales order, delivery order and invoice.
Business – IP	patent applications, documents with non-disclosure content.
Source Code	detects items that contain a set of instructions and statements written in the top 25 used computer programming languages on GitHub
Resume	detects items that are textual accounts of an applicant's personal, educational, professional qualifications, work experience, and other personally identifying information

Behavior classifiers	Sample content detected by classifier
Harassment	detects a specific category of offensive language text items related to offensive conduct targeting one or multiple individuals based on the following traits: race, ethnicity, religion, national origin, gender, sexual orientation, age, disability
Profanity	detects a specific category of offensive language text items that contain expressions that embarrass most people
Threat	detects a specific category of offensive language text items related to threats to commit violence or do physical harm or damage to a person or property
Discrimination	detects explicit discriminatory language and is particularly sensitive to discriminatory language against the African American/Black communities when compared to other communities

Support for credential SITs in your DLP policies

We recently announced the public preview of 42 new SITs, enabling organizations to identify, classify, and protect credentials found in documents across OneDrive, SharePoint, Teams, Office Web Apps, Outlook, Exchange Online, Defender for Cloud Apps, and Windows devices. Organizations can leverage these SITs in Information Protection auto-labeling and DLP policies to quickly discover and classify a wide range of complex digital authentication credentials such as user credentials (usernames and passwords), default passwords, Azure cloud resources, and more.
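
As a rough illustration of what pattern-based credential detection can look like, the Python sketch below flags lines where a credential-like keyword is assigned a value. The actual definitions of the credential SITs are not described in this post; the keyword list, the regular expression, and the find_credentials helper are hypothetical and far simpler than production detection logic.

```python
import re
from typing import List

# Hypothetical, simplified pattern: a credential keyword followed by ":" or "="
# and a non-whitespace value. Real SITs are considerably more sophisticated.
CREDENTIAL_PATTERN = re.compile(
    r"(?i)\b(password|passwd|pwd|secret|api[_-]?key)\b\s*[:=]\s*(\S+)"
)


def find_credentials(text: str) -> List[str]:
    """Return the (lowercased) keyword of every credential-like assignment in text."""
    return [match.group(1).lower() for match in CREDENTIAL_PATTERN.finditer(text)]


sample = "user=alice\nPassword: hunter2\napi_key=abc123"
print(find_credentials(sample))
```

A keyword-plus-value pattern like this produces false positives and misses many credential formats, which is part of why purpose-built SITs are useful.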

Support for sensitive service domains in your endpoint DLP policies

We are also excited to share the public preview of sensitive service domains in endpoint DLP, which allows you to list a website as a sensitive domain and subsequently configure DLP policies that can prevent users from printing, copying data, or saving website contents as local files on Microsoft Edge. Sensitive service domain controls can be extended to Google Chrome through the Microsoft Purview extension. Learn more here.
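
The Python sketch below models the concept: a list of sensitive domains plus a set of restricted actions, with a check that decides whether an action on a given URL would be blocked. The domain names, the action names, and the subdomain-matching rule are all assumptions for illustration. The matching behavior that endpoint DLP actually uses is not described in this post.

```python
from urllib.parse import urlsplit

# Hypothetical configuration: example domains and action names, not product values.
SENSITIVE_DOMAINS = {"contoso-hr.example", "payroll.example"}
RESTRICTED_ACTIONS = {"print", "copy", "save_as"}


def is_sensitive(url: str) -> bool:
    """True if the URL's host is a listed domain or one of its subdomains (assumed rule)."""
    host = (urlsplit(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in SENSITIVE_DOMAINS
    )


def is_blocked(url: str, action: str) -> bool:
    """True if the action is restricted and the URL belongs to a sensitive domain."""
    return action in RESTRICTED_ACTIONS and is_sensitive(url)


print(is_blocked("https://app.payroll.example/report", "print"))
```

Matching on the parsed hostname, not on a substring of the URL, avoids treating a look-alike host such as notpayroll.example as sensitive.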

Figure 2: Adding a sensitive service domain in endpoint DLP

Ability to clone a DLP policy

We recently released the ability to clone a DLP policy in general availability. Follow the steps below to clone an existing DLP policy and then edit the copy.

Navigate to the DLP policies page in the Microsoft Purview compliance portal
Select the policy you want to clone
Select 'Copy policy'
Edit the policy configurations as needed using the policy creation wizard
Save and use the policy
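
Conceptually, cloning amounts to making an independent copy of a policy and then editing the copy. The Python sketch below shows that idea with a plain dictionary. The policy structure, the field names, and the clone_policy helper are hypothetical and do not represent how the compliance portal stores or copies policies.

```python
import copy


def clone_policy(policy: dict, new_name: str) -> dict:
    """Return an independent deep copy of a (hypothetical) policy dict under a new name."""
    clone = copy.deepcopy(policy)  # deep copy so nested rules are not shared with the original
    clone["name"] = new_name
    return clone


original = {
    "name": "Finance DLP",
    "rules": [{"condition": "Source Code", "action": "block"}],
}
draft = clone_policy(original, "Finance DLP (copy)")
draft["rules"][0]["action"] = "audit"  # editing the clone leaves the original untouched
print(original["rules"][0]["action"], draft["rules"][0]["action"])
```

A deep copy matters here because a shallow copy would share the nested rule objects, so edits to the clone would silently change the original policy as well.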

Figure 3: Cloning a DLP policy

Get started with a free trial

You can try Microsoft Purview Information Protection and DLP by enabling the free trial of Microsoft Purview from the Microsoft Purview compliance portal. All you need is a Microsoft 365 E3 subscription!

Additional resources:

Watch these videos to learn more about Microsoft’s approach to cloud DLP, endpoint DLP, and maximizing the value of DLP
Listen to this podcast on Microsoft Purview DLP.
Learn more about configuring DLP policies for Microsoft 365 services and endpoints
Learn more about using sensitivity labels as a condition for DLP policies here
Learn more about sensitivity labels here
Learn more about Predicates for unified DLP here
Read these blogs for the latest on Microsoft Purview Information Protection

We look forward to hearing your feedback!

Thank you,

Microsoft Information Protection team
