Announcing new Microsoft Information Protection capabilities to know and protect your data.
Microsoft Information Protection (MIP) is a built-in, intelligent, unified, and extensible solution to protect sensitive data across an organization. MIP provides a unified set of capabilities to know your data, protect your data, and prevent data loss across Microsoft 365 Apps (e.g., Word, PowerPoint, Excel, Outlook), services (e.g., Microsoft Teams, SharePoint, and Exchange), on-premises locations, devices, and third-party apps and services.
Additional sensitive information types (SITs)
Foundational to Microsoft Information Protection are its classification capabilities, which range from out-of-the-box sensitive information types to exact data match to machine learning trainable classifiers, and which enable automatic detection and classification of sensitive content at scale. MIP already offers 145+ out-of-the-box sensitive information types. To further enhance coverage and accuracy, we have been rolling out more information types in phases. Today we are announcing the general availability of 49 new and 12 improved sensitive information types, covering key regulations in Europe and Asia Pacific. We are also announcing new features that improve the accuracy of sensitive information types and enable you to customize them to suit your organization’s unique needs.
The additional sensitive information types announced today include:
ABA Routing Number
Argentina DNI
Chile identity card number
Drug Enforcement Agency (DEA) Number
India Permanent Account Number
India Unique Identification (Aadhaar) Number
International classification of diseases (ICD-10-CM)
International classification of diseases (ICD-9-CM)
New Zealand ministry of health number
U.S. Bank Account Number
U.S. Individual Taxpayer Identification Number (ITIN)
UK NINO
Named Entities support
We are also announcing the public preview of our capability to detect named entities, which span person names, physical addresses, and medical terms and conditions. In conjunction, we are introducing 10 enhanced unified authoring policy templates, which can be used by capabilities like data loss prevention. These are updates to existing unified authoring policy templates, such as the U.S. Health Insurance Act (HIPAA) and GDPR. The enhanced policy templates include named entities in their definitions and can improve the ability to identify and protect data as required by regulations. Together, these updates improve detection of personal data and reduce false positives by looking for sensitive information types such as “US SSN” in combination with a named entity, such as a person name.
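To illustrate why pairing a pattern match with a named entity can cut false positives, here is a minimal Python sketch. It is not how MIP implements detection. The simplified SSN-like regex, the 50-character proximity window, the small KNOWN_NAMES tuple standing in for a real person-name detector, and the "high" and "low" labels are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for a named-entity detector; a real one would be a trained model.
KNOWN_NAMES = ("Jane Doe", "John Smith")

# Simplified SSN-like pattern (assumption): 3, 2, and 4 digits separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Characters on either side of a match to search for a name (assumed value).
WINDOW = 50


def classify_matches(text):
    """Return (matched_text, confidence) pairs for SSN-like strings in text."""
    results = []
    for match in SSN_PATTERN.finditer(text):
        start = max(0, match.start() - WINDOW)
        end = min(len(text), match.end() + WINDOW)
        nearby = text[start:end]
        # A nearby person name is treated as corroborating evidence.
        has_name = any(name in nearby for name in KNOWN_NAMES)
        results.append((match.group(), "high" if has_name else "low"))
    return results


print(classify_matches("Employee Jane Doe, SSN 123-45-6789."))
print(classify_matches("Order reference 987-65-4321 shipped on Monday."))
```

The second string has the same digit shape as the first but no person name nearby, so the sketch rates it lower. That is the kind of match a policy could choose to ignore.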
New features to improve accuracy, ease of use, and ability to customize sensitive information types
Five new features are now available:
Figure: Shows the option to set confidence level as High, Medium, or Low
Figure: Out-of-the-box sensitive information type can be copied and edited for unique needs
detection while ensuring it passes the necessary validations. For example, you can customize the “India Unique Identification (Aadhaar) Number” sensitive information type to detect not only preset patterns such as 5485-5000-8000 or 5485 5000 8000, but also additional patterns you prescribe, such as 5485/5000/8000. To do so, define the regular expression and then select Func_india_aadhaar as the regular expression validator.
Figure: Define your own regular expression and use regular expression validators to pass validation checks
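The idea of a custom regular expression backed by a validator can be sketched in Python. Aadhaar numbers carry a Verhoeff check digit, so the sketch uses a Verhoeff check as the validation step. Whether Func_india_aadhaar performs exactly this check is an assumption, and the regular expression and function names below are illustrative rather than part of MIP.

```python
import re

# Verhoeff lookup tables: multiplication (D), permutation (P), and inverse (INV).
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]

# Twelve digits in three groups of four. The same optional separator
# (hyphen, space, or slash) must appear between both pairs of groups.
AADHAAR_PATTERN = re.compile(r"(?<!\d)\d{4}([-/ ]?)\d{4}\1\d{4}(?!\d)")


def verhoeff_valid(digits):
    """Return True if the digit string, check digit included, passes Verhoeff."""
    c = 0
    for i, ch in enumerate(reversed(digits)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def verhoeff_check_digit(payload):
    """Return the Verhoeff check digit for a digit string that lacks one."""
    c = 0
    for i, ch in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return str(INV[c])


def find_valid_ids(text):
    """Return pattern matches in text whose digits also pass the checksum."""
    hits = []
    for match in AADHAAR_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if verhoeff_valid(digits):
            hits.append(match.group())
    return hits


# Build a checksum-valid sample from an arbitrary 11-digit payload.
payload = "54855000800"
number = payload + verhoeff_check_digit(payload)
slashed = "/".join((number[:4], number[4:8], number[8:]))
print(find_valid_ids(f"ID on file: {slashed}"))
```

Separating the pattern from the validator means a new format, such as the slash-separated one, only needs a new regular expression. The checksum logic is reused unchanged.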
Figure: List of six additional checks available to improve the accuracy of matches
For example, a bank can customize the ‘credit card’ sensitive information type to match only card numbers that start with the six-digit bank identification numbers (BINs) assigned to it, by using the ‘starts or doesn’t start with characters’ additional check. This helps banks scope credit card data detection to only the cards they issue.
Figure: Include or exclude specific characters to improve the accuracy of matches
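The effect of a "starts with" check on credit card detection can be sketched as follows. This is an illustration under stated assumptions, not MIP's implementation: the 16-digit pattern, the Luhn checksum as the validation step, and the ISSUER_BINS value (taken from a widely used test card number) are all chosen for the example.

```python
import re

# Simplified pattern (assumption): exactly 16 consecutive digits, no separators.
CARD_PATTERN = re.compile(r"(?<!\d)\d{16}(?!\d)")

# Hypothetical six-digit BINs assigned to the bank; must be a tuple for str.startswith.
ISSUER_BINS = ("411111",)


def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_issuer_cards(text, bins=ISSUER_BINS):
    """Return Luhn-valid card numbers in text that start with one of the given BINs."""
    return [
        match.group()
        for match in CARD_PATTERN.finditer(text)
        if luhn_valid(match.group()) and match.group().startswith(bins)
    ]


sample = "Cards on file: 4111111111111111 and 5555555555554444"
print(find_issuer_cards(sample))  # ['4111111111111111']
```

Both numbers in the sample pass the checksum, but only the first starts with the configured BIN, so the second is left out of the results.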
Figure: Specific keywords can be included or excluded to improve the accuracy of matches
Ability to create larger keyword dictionaries
Detecting sensitive information sometimes requires looking for a large set of keywords, for example when identifying inappropriate language. Keyword dictionaries make large sets of keywords simpler to manage. We have increased the keyword dictionary size limit tenfold, from 100 KB (100 thousand characters) to 1 MB (over 1 million characters), so you can create larger dictionaries.
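Before uploading a large dictionary, it can help to estimate whether it fits under the limit. The sketch below assumes the dictionary is measured as newline-separated UTF-8 text and reads 1 MB as 1,048,576 bytes. How the service actually measures size is not stated here, so treat both as assumptions.

```python
# Assumed reading of the 1 MB limit, in bytes.
MAX_DICTIONARY_BYTES = 1024 * 1024


def dictionary_size(keywords):
    """Return the size in bytes of the keywords joined by newlines, encoded as UTF-8."""
    return len("\n".join(keywords).encode("utf-8"))


def fits_limit(keywords, limit=MAX_DICTIONARY_BYTES):
    """Return True if the keyword list fits within the assumed size limit."""
    return dictionary_size(keywords) <= limit


keywords = ["confidential", "internal only", "do not distribute"]
print(dictionary_size(keywords), fits_limit(keywords))
```

Non-ASCII keywords can take more than one byte per character in UTF-8, so a dictionary with fewer than a million characters may still exceed a byte-based limit.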
In addition to the capabilities above, we are adding support for label inheritance so that your data remains classified and secured across its full journey, from Azure to Power BI to Office. Microsoft Azure Synapse Analytics will inherit MIP sensitivity labels applied by Azure Purview. Data remains classified and secure when brought into Power BI, and onward when exported to Office. The result is secure, end-to-end inheritance and protection of your business data, from source to point of consumption. The preview of this feature will start rolling out over the next few days.
Figure: Animation showing Power BI datasets inheriting MIP sensitivity labels
Getting started
Here’s information on licensing and on how to get started with the capabilities announced today:
Ayush Rastogi and Prateek Jain, Program Management