
Security, Compliance, and Identity Blog

Announcing new Microsoft Information Protection capabilities to know and protect your data.

Malli1580
Mar 02, 2021


Microsoft Information Protection (MIP) is a built-in, intelligent, unified, and extensible solution to protect sensitive data across an organization. MIP provides a unified set of capabilities to know your data, protect your data, and prevent data loss across Microsoft 365 Apps (e.g., Word, PowerPoint, Excel, Outlook), services (e.g., Microsoft Teams, SharePoint, and Exchange), on-premises, devices, and third-party apps and services.

 

Additional sensitive information types (SITs)

Foundational to Microsoft Information Protection are its classification capabilities—from out-of-the-box sensitive information types to exact data match to machine learning trainable classifiers that enable automatic detection and classification of sensitive content at scale. MIP already offers 145+ out-of-the-box sensitive information types. To further enhance coverage and accuracy, we have been rolling out more information types, in a phased manner. Today we are announcing the general availability of 49 new and 12 improved sensitive information types, covering key regulations in Europe and Asia Pacific. We are also announcing new features that improve the accuracy of sensitive information types and enable you to customize them to suit your organization’s unique needs.

 

The additional sensitive information types announced today include:

  • 49 new sensitive information types covering social security numbers, passport numbers, and driver’s license numbers in European countries, which help support compliance with the General Data Protection Regulation (GDPR). We expect these new sensitive information types to significantly reduce false positives and make it more predictable for customers to refine their policies. For GDPR compliance, we recommend using these country-specific sensitive information types instead of EU Social Security Number, EU Passport Number, and EU Driver’s License Number, which bundle the patterns of multiple countries together.
  • Improved definitions for 12 existing sensitive information types to reduce false positives and help you comply with key regulations across the US, Europe, Asia Pacific, and South America:
      ◦ ABA Routing Number
      ◦ Argentina DNI
      ◦ Chile identity card number
      ◦ Drug Enforcement Agency (DEA) Number
      ◦ India Permanent Account Number
      ◦ India Unique Identification (Aadhaar) Number
      ◦ International classification of diseases (ICD-10-CM)
      ◦ International classification of diseases (ICD-9-CM)
      ◦ New Zealand ministry of health number
      ◦ U.S. Bank Account Number
      ◦ U.S. Individual Taxpayer Identification Number (ITIN)
      ◦ UK NINO

 

 

Named Entities support

We are also announcing the public preview of our capability to detect named entities, which span person names, physical addresses, and medical terms and conditions. Alongside this, we are introducing 10 enhanced unified authoring policy templates that can be used by capabilities like data loss prevention. These templates are updates to existing unified authoring policy templates, such as U.S. Health Insurance Act (HIPAA) and GDPR. The enhanced templates include named entities in their definitions and can improve the ability to identify and protect data as required by regulations. Together, these updates improve detection of personal data and reduce false positives by looking for sensitive information types such as “US SSN” in combination with a named entity, such as a person name.

 

New features to improve accuracy, ease of use, and ability to customize sensitive information types

Five new features are now available:

  1. We are enhancing your ability to set confidence levels by replacing the number-based approach with a more intuitive choice of high, medium, or low. This change, along with modifications to the underlying algorithms, improves the usability, predictability, and accuracy of your policies for detecting and classifying sensitive information. For example, when creating a policy for “Japanese My Number Personal”, you can reduce false positives by choosing high confidence, so the system looks for both the primary element (e.g., 12-digit numbers) and supporting elements (e.g., related keywords). However, if your risk appetite for this data type is low, you can set the policy to a low confidence level so that more potential matches are detected.

Figure: Shows the option to set confidence level as High, Medium, or Low
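The confidence-level behavior described above can be sketched in Python. This is a minimal illustration under stated assumptions, not how Microsoft implements it. The 12-digit pattern, the keyword list, and the 300-character proximity window are hypothetical. The sketch also distinguishes only two of the three levels: a primary element on its own, and a primary element with a supporting keyword nearby.

```python
import re

# Hypothetical definition loosely modeled on "Japanese My Number Personal".
PRIMARY = re.compile(r"\b\d{12}\b")             # primary element: a 12-digit number
SUPPORTING = ("my number", "individual number")  # assumed supporting keywords
PROXIMITY = 300                                  # assumed window, in characters
RANK = {"low": 0, "high": 1}                     # this sketch emits only two levels


def classify(text: str) -> list[tuple[str, str]]:
    """Return (match, confidence) pairs for every primary-element match."""
    lowered = text.lower()
    results = []
    for m in PRIMARY.finditer(text):
        # Look for a supporting keyword on either side of the match.
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        level = "high" if any(k in window for k in SUPPORTING) else "low"
        results.append((m.group(), level))
    return results


def policy_matches(text: str, required: str) -> list[str]:
    """Keep only matches whose confidence meets the policy's required level."""
    return [value for value, level in classify(text) if RANK[level] >= RANK[required]]


print(policy_matches("Order reference 123456789012 shipped", "low"))   # ['123456789012']
print(policy_matches("Order reference 123456789012 shipped", "high"))  # []
print(policy_matches("My Number: 123456789012", "high"))               # ['123456789012']
```

Raising the required level from low to high drops matches that have no supporting evidence. This is the trade-off a high-confidence policy makes: fewer false positives, but fewer matches overall.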

 

  2. You can copy and edit an out-of-the-box sensitive information type to customize it. For example, suppose an organization uses ‘ss#’ for product IDs or serial numbers rather than for U.S. social security numbers. It can copy the out-of-the-box U.S. Social Security Number sensitive information type and remove ‘ss#’ from the copy’s keyword list to reduce false positives.

Figure: Out-of-the-box sensitive information type can be copied and edited for unique needs
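Copying and editing happens in the compliance center, but its effect on matching can be shown with a small Python sketch. The dictionary below is a hypothetical stand-in for a sensitive information type definition, with an assumed and abbreviated keyword list. `copy_and_edit` is a hypothetical helper. The point is that the copy drops ‘ss#’ while the out-of-the-box definition stays unchanged.

```python
import copy
import re

# Hypothetical stand-in for the out-of-the-box definition (keyword list assumed).
US_SSN_BUILTIN = {
    "name": "U.S. Social Security Number (SSN)",
    "pattern": r"\b\d{3}-\d{2}-\d{4}\b",
    "keywords": ["social security", "ssn", "ss#"],
}


def copy_and_edit(sit: dict, new_name: str, drop_keywords: set[str]) -> dict:
    """Return an edited deep copy; the built-in definition is not modified."""
    custom = copy.deepcopy(sit)
    custom["name"] = new_name
    custom["keywords"] = [k for k in custom["keywords"] if k not in drop_keywords]
    return custom


def matches_with_keyword(sit: dict, text: str) -> bool:
    """True when the pattern matches and at least one keyword appears in the text."""
    lowered = text.lower()
    return bool(re.search(sit["pattern"], text)) and any(k in lowered for k in sit["keywords"])


custom_ssn = copy_and_edit(US_SSN_BUILTIN, "Contoso U.S. SSN (no ss#)", {"ss#"})
sample = "Serial ss# 123-45-6789"
print(matches_with_keyword(US_SSN_BUILTIN, sample))  # True
print(matches_with_keyword(custom_ssn, sample))      # False
```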

 

  3. You can define your own regular expressions and pair them with regular expression validators to customize detection while ensuring matches pass the necessary validations. For example, the out-of-the-box “India Unique Identification (Aadhaar)” number sensitive information type detects preset patterns such as 5485-5000-8000 or 5485 5000 8000. You can customize it to also treat a pattern you prescribe, such as 5485/5000/8000, as a valid “India Unique Identification (Aadhaar)” number. To do this, define the regular expression and then select Func_india_aadhaar as the regular expression validator.

Figure: Define your own regular expression and use regular expression validators to pass validation checks
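A custom regular expression paired with a validator can be sketched as follows. Aadhaar numbers carry a Verhoeff check digit in their last position. We assume here that a validator like Func_india_aadhaar applies that checksum, but the post does not describe the function's internals. The regular expression is also an illustrative assumption. It accepts a space, hyphen, slash, or no separator, as long as the same separator is used in both positions.

```python
import re

# Verhoeff checksum tables: multiplication (D), permutation (P), inverse (INV).
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6], [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8], [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2], [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 2, 4, 1], [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0], [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5], [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_valid(digits: str) -> bool:
    """True if the last digit is a correct Verhoeff check digit for the rest."""
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def verhoeff_check_digit(digits: str) -> str:
    """Compute the check digit to append to `digits`."""
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return str(INV[c])


# Assumed custom pattern: first digit 2-9, groups of four, and the \2
# backreference forces the same separator (space, slash, hyphen, or none) twice.
AADHAAR = re.compile(r"\b([2-9]\d{3})([ /-]?)(\d{4})\2(\d{4})\b")


def find_aadhaar(text: str) -> list[str]:
    """Return pattern matches whose 12 digits also pass the Verhoeff check."""
    hits = []
    for m in AADHAAR.finditer(text):
        if verhoeff_valid(m.group(1) + m.group(3) + m.group(4)):
            hits.append(m.group(0))
    return hits


base = "23456789012"  # 11 arbitrary digits for illustration
number = base + verhoeff_check_digit(base)
slashed = f"{number[:4]}/{number[4:8]}/{number[8:]}"
print(find_aadhaar(f"ID: {slashed}") == [slashed])  # True
```

Keeping the regular expression and the checksum separate mirrors the split described above: the pattern decides what *looks* like a candidate, and the validator decides whether the candidate is plausible.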

 

  4. You can add checks to further improve the accuracy of your matches. Six additional checks are supported today: exclude specific matches, starts or doesn’t start with characters, ends or doesn’t end with characters, exclude duplicate characters, include or exclude prefixes, and include or exclude suffixes.

Figure: List of six additional checks available to improve the accuracy of matches

 

For example, a bank can customize the ‘credit card’ sensitive information type to include only card numbers that start with the six-digit bank identification numbers (BINs) assigned to it. It does this with the ‘starts or doesn’t start with characters’ additional check, which scopes credit card detection to cards the bank itself issued.

Figure: Include or exclude specific characters to improve the accuracy of matches
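The ‘starts or doesn’t start with characters’ and ‘exclude specific matches’ checks can be sketched as filters applied after a candidate passes validation. This Python sketch makes several simplifying assumptions. The Luhn checksum stands in for credit card validation, and the pattern only matches 16 contiguous digits. The BIN `411111` is a hypothetical value, chosen because 4111111111111111 is a widely used Luhn-valid test number.

```python
import re

CARD = re.compile(r"\b\d{16}\b")  # simplified: 16 contiguous digits only


def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_cards(text: str, starts_with: tuple[str, ...] = (), exclude: set[str] = frozenset()) -> list[str]:
    """Return Luhn-valid numbers, optionally scoped to BIN prefixes and minus excluded values."""
    hits = []
    for m in CARD.finditer(text):
        number = m.group()
        if not luhn_valid(number):
            continue
        if starts_with and not number.startswith(starts_with):  # "starts with characters" check
            continue
        if number in exclude:  # "exclude specific matches" check
            continue
        hits.append(number)
    return hits


text = "Cards: 4111111111111111 and 5555555555554444"
print(find_cards(text))                            # ['4111111111111111', '5555555555554444']
print(find_cards(text, starts_with=("411111",)))  # ['4111111111111111']
```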

 

  5. You can now specify multiple supporting elements to detect in proximity to the primary element, which improves accuracy. For example, the out-of-the-box credit card sensitive information type looks for ‘keyword_cc_name’ or ‘keyword_cc_verification’ or ‘expiration date’ in proximity to the credit card number. You can reduce potential false positives by modifying the sensitive information type to require all of them. Similarly, if the out-of-the-box credit card sensitive information type is detecting other 16-digit IDs, such as product IDs, you can choose to exclude specific keywords.

Figure: Specific keywords can be included or excluded to improve the accuracy of matches
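The following Python sketch shows the difference between requiring any supporting element and requiring all of them, along with keyword-based exclusion. The keyword groups are hypothetical stand-ins for ‘keyword_cc_name’, ‘keyword_cc_verification’, and the expiration-date element. The exclusion terms and the 300-character window are also assumptions. Plain substring matching is a simplification, and a real definition would match keywords more carefully.

```python
import re

CARD = re.compile(r"\b\d{16}\b")  # simplified primary element: 16 contiguous digits
PROXIMITY = 300                   # assumed window, in characters

# Hypothetical keyword groups standing in for the supporting elements in the post.
SUPPORTING = {
    "keyword_cc_name": ("visa", "mastercard", "american express"),
    "keyword_cc_verification": ("cvv", "cvc", "security code"),
    "expiration date": ("expiration", "exp date", "valid thru"),
}
EXCLUDED = ("product id", "serial number")  # hypothetical exclusion keywords


def detect(text: str, mode: str = "any") -> list[str]:
    """Return card-like numbers with 'any' or 'all' supporting groups nearby."""
    lowered = text.lower()
    hits = []
    for m in CARD.finditer(text):
        window = lowered[max(0, m.start() - PROXIMITY): m.end() + PROXIMITY]
        if any(k in window for k in EXCLUDED):
            continue  # excluded keyword nearby: treat as a non-card ID
        found = [any(k in window for k in group) for group in SUPPORTING.values()]
        if (all(found) if mode == "all" else any(found)):
            hits.append(m.group())
    return hits


print(detect("Visa 4111111111111111"))              # ['4111111111111111']
print(detect("Visa 4111111111111111", mode="all"))  # []
```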

 

Ability to create larger keyword dictionaries

Detecting sensitive information sometimes requires looking for a large set of keywords, for example when identifying inappropriate language. Keyword dictionaries simplify managing keywords at a large scale. We have increased the size limit for keyword dictionaries tenfold, from the previous 100 KB (100 thousand characters) to 1 MB (over 1 million characters), so you can create larger dictionaries.
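A keyword dictionary is essentially a large term list checked against content. The Python sketch below normalizes terms into a set for constant-time lookups and enforces a size budget. It treats the 1 MB limit as 1,000,000 characters of term text, which is an assumption: the post does not say how the limit is measured (for example, whether delimiters count). The sketch also handles single-word terms only.

```python
import re

MAX_DICTIONARY_CHARS = 1_000_000  # assumed reading of the 1 MB limit in the post


def build_dictionary(terms: list[str]) -> frozenset[str]:
    """Normalize terms to lowercase and reject dictionaries over the size budget."""
    normalized = {t.strip().lower() for t in terms if t.strip()}
    size = sum(len(t) for t in normalized)
    if size > MAX_DICTIONARY_CHARS:
        raise ValueError(f"{size} characters exceeds the {MAX_DICTIONARY_CHARS} limit")
    return frozenset(normalized)


def matched_terms(text: str, dictionary: frozenset[str]) -> list[str]:
    """Return dictionary terms found as whole words; each lookup is O(1) in a set."""
    tokens = set(re.findall(r"[\w']+", text.lower()))
    return sorted(tokens & dictionary)


terms = build_dictionary(["Foo", " bar ", ""])
print(matched_terms("FOO and baz", terms))  # ['foo']
```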

 

In addition to the capabilities above, we are adding support for label inheritance to help you secure your data’s full journey, from Azure to Power BI to Office, so that it remains classified and protected throughout. Microsoft Azure Synapse Analytics will inherit MIP sensitivity labels applied by Azure Purview. Data remains classified and secure when brought into Power BI and onward when exported to Office. The result is secure, end-to-end inheritance and protection of your business data, from source to point of consumption. The preview of this feature will start rolling out over the next few days.

 

Figure: Animation showing Power BI datasets inheriting MIP sensitivity labels
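Label inheritance along a data journey can be modeled as propagation over a lineage graph. This Python sketch assumes that each downstream asset takes the most sensitive label among its upstream sources and any label applied to it directly. The post does not specify how conflicts are resolved, so that rule is an assumption. The label names, their order, and the asset names are also hypothetical.

```python
from graphlib import TopologicalSorter

# Assumed label order, least to most sensitive.
PRIORITY = ["Public", "General", "Confidential", "Highly Confidential"]


def propagate(upstream: dict[str, set[str]], explicit: dict[str, str]) -> dict[str, str]:
    """Assign each asset the most sensitive label among its sources and its own label.

    upstream maps an asset to the assets it reads from; explicit holds labels
    applied directly (for example, by a scan). Unlabeled assets are omitted.
    """
    labels: dict[str, str] = {}
    # static_order() yields every source before the assets that read from it.
    for asset in TopologicalSorter(upstream).static_order():
        candidates = [labels[src] for src in upstream.get(asset, ()) if src in labels]
        if asset in explicit:
            candidates.append(explicit[asset])
        if candidates:
            labels[asset] = max(candidates, key=PRIORITY.index)
    return labels


# Hypothetical lineage: two scanned tables -> Synapse view -> Power BI dataset -> Excel export.
lineage = {
    "synapse_view": {"customers_table", "orders_table"},
    "pbi_dataset": {"synapse_view"},
    "excel_export": {"pbi_dataset"},
}
scanned = {"customers_table": "Confidential", "orders_table": "General"}
print(propagate(lineage, scanned)["excel_export"])  # Confidential
```

Because the sketch takes the maximum, a directly applied label can raise but never lower an inherited one. A real system might allow justified downgrades.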

 

Getting started

Here’s information on licensing and on how to get started with the capabilities announced today:

  • To learn more about the new sensitive information types announced today, click here. Note that the additional SITs announced today are available for immediate use within Data Loss Prevention for Microsoft 365 services, Microsoft Information Protection for Microsoft 365 services, Communication Compliance, Information Governance, Records Management, and Microsoft Cloud App Security. Additional new sensitive information types will soon become available on Azure Information Protection unified labeling client and on-premises scanner, Endpoint Data Loss Prevention & Microsoft 365 Apps.
  • Improved confidence levels and related accuracy improvements are available for immediate use within Data Loss Prevention for Microsoft 365 services, Communication Compliance, Information Governance, and Records Management. However, some of the confidence-level changes that affect the accuracy of sensitive information types will become available soon on Microsoft Information Protection for Office clients, Azure Information Protection unified labeling client and on-premises scanner, Endpoint Data Loss Prevention & Microsoft 365 Apps, and Microsoft Cloud App Security.
  • Sensitive information types are available as part of Microsoft 365 E3 while an E5 license is required to access the enhanced templates and named entity sensitive information types.
  • To learn about additional capabilities we announced for Microsoft 365 apps and MIP, please click here.
  • To learn more about the automatic labeling of your data in Azure Purview, click here.

Ayush Rastogi and Prateek Jain, Program Management

Updated May 11, 2021