The Azure Purview Data Map is an intelligent graph that describes all the data across your data estate. You can start building this graph by extracting metadata from hybrid data stores. Typically, though, metadata discovered from individual data stores is defined in isolation, so it is inconsistent and incomplete. This is where classifications and classification rules within the Azure Purview Data Map come in.
Azure Purview enables you to automatically classify data at scale by defining rules. Classifying data in a unified way enables both data discovery and compliance use cases.
Here are a few key concepts to keep in mind as you get started with classifications:
Classifications can be used to describe the type of data that exists in a data asset or schema. In other words, customers can identify the content of a data asset or schema using classifications.
Classifications can be used to describe the phases in your data prep processes (raw zone, landing zone, etc.) and can be assigned to specific assets to mark where they are in the process.
Classifications can also be used to set priorities and to plan how budget and resources are allocated to meet an organization's security and compliance needs.
Now that you have a frame for the kinds of classifications that you want to apply to your data, let's take a quick tour of the capabilities themselves:
Azure Purview supports 100+ built-in classifications, ranging from credit card and account numbers to government IDs, location data, and more. Customers can also create custom classifications. Then, using system classification rules and custom classification rules, customers can apply these classifications at scale.
Azure Purview uses regular expression (regex) patterns and Bloom filters to classify data. These classifications are then associated with the metadata discovered in the Azure Purview Data Catalog.
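To make the Bloom filter idea concrete, here is a minimal, generic sketch in Python. It is not Azure Purview's implementation, which this post does not describe; the class name, sizes, hashing scheme, and the list of cities are all assumptions chosen for illustration. It shows why a Bloom filter suits dictionary-style lookups: a value that was added always matches, while a value that was never added is rejected in most cases, with a small chance of a false positive.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch (illustrative only, not Purview's code)."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, value: str):
        # Derive num_hashes positions by salting a SHA-256 digest with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(value))


# Hypothetical dictionary of known values for a city-style classification.
known_cities = BloomFilter()
for city in ["seattle", "london", "mumbai"]:
    known_cities.add(city)

print(known_cities.might_contain("london"))  # True: added values always match
```

A filter like this trades a small false-positive rate for a compact, fixed-size representation of a large value list, which is one reason a match threshold across many rows is useful alongside it.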
You can apply system classifications on a scan either manually or via the system classification rules. Similarly, you can apply custom classifications either manually or via custom classification rules. In addition, system classifications can be applied using custom classification rules. Note that manually applied classifications are not overridden by subsequent scans.
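The rule that manual classifications survive later scans can be pictured with a small Python sketch. This is a hypothetical model, not Purview code: the function name, the "manual" and "scan" source labels, and the sample classification names are invented for illustration. It also assumes that labels applied by an earlier scan are replaced by the latest scan's results, which this post does not state.

```python
from typing import Dict, Set


def apply_scan(current: Dict[str, str], detected: Set[str]) -> Dict[str, str]:
    """Return a column's classifications after a scan.

    Maps classification name -> source ("manual" or "scan").
    """
    # Manually applied classifications are carried over untouched.
    result = {name: src for name, src in current.items() if src == "manual"}
    # Assumption: older scan-applied labels are replaced by this scan's output.
    for name in detected:
        # setdefault keeps the manual entry if the scan finds the same name.
        result.setdefault(name, "scan")
    return result


# Hypothetical column state before a rescan.
before = {"Credit Card Number": "manual", "Email": "scan"}
after = apply_scan(before, {"Phone Number"})
print(after)
```

In this model the manually applied "Credit Card Number" label stays in place even though the rescan did not detect it.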
Azure Purview provides a set of default classification rules, which are used by the scanning processes to automatically detect certain data types. The default classification rules are non-editable. However, you can define your own custom classification rules. Every classification rule is tied to a classification.
For every classification rule, you can specify a data pattern: a regular expression representing the data stored in the asset field. Thresholds can also be set to reduce false positives. In addition, when creating a classification rule, you can specify a column pattern: a regular expression representing the column names that should be matched.
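The pieces of a classification rule described above can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the Purview API: the ClassificationRule class, the min_match_ratio field, the "Employee ID" classification, and both patterns are hypothetical. The sketch also assumes that when both a data pattern and a column pattern are given, both must match, which this post does not specify.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClassificationRule:
    classification: str                    # every rule is tied to a classification
    data_pattern: Optional[str] = None     # regex for the values stored in the field
    column_pattern: Optional[str] = None   # regex for the column name
    min_match_ratio: float = 0.6           # hypothetical threshold against false positives

    def matches(self, column_name: str, values: List[str]) -> bool:
        if self.data_pattern is None and self.column_pattern is None:
            return False  # a rule with no pattern never matches
        if self.column_pattern is not None:
            if not re.fullmatch(self.column_pattern, column_name, re.IGNORECASE):
                return False
        if self.data_pattern is not None:
            if not values:
                return False
            hits = sum(1 for v in values if re.fullmatch(self.data_pattern, v))
            # Require a minimum share of sampled values to match the data pattern.
            if hits / len(values) < self.min_match_ratio:
                return False
        return True


rule = ClassificationRule(
    classification="Employee ID",
    data_pattern=r"EMP-\d{6}",
    column_pattern=r".*emp.*id.*",
)

sample = ["EMP-000123", "EMP-004567", "n/a", "EMP-999999"]
print(rule.matches("emp_id", sample))  # True: name matches and 3 of 4 values match
print(rule.matches("notes", sample))   # False: column name does not match
```

The threshold is what keeps a column with one or two stray matching values from being classified: with the default ratio of 0.6 in this sketch, a column where only one of four sampled values matches is rejected.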
Who can view and manage classifications and classification rules?
Purview Data Readers can view all classifiers and classification rules.
Purview Data Curators can create, update, and delete custom classifiers and classification rules.
Create an Azure Purview account today and start understanding your data supply chain, from raw data to business insights, with free scanning for all your SQL Server on-premises and Power BI online.
Azure Purview classifications provide users with an excellent mechanism to understand the data estate by tagging assets based on the type of information they represent. We will soon have the capability to automatically deduce the regular expressions for custom classification rules. Stay tuned!