Map your data estate with Azure Purview
Published Dec 09 2020 02:01 PM
Microsoft

Azure Purview enables organizations of all sizes to manage and govern their hybrid data estate. The Azure Purview Data Map gives customers the foundation for effective data governance. Customers build a knowledge graph of data coming in from a range of sources, such as Azure SQL Database, Amazon S3 buckets, or on-premises SQL Server. Purview makes it easy to register data sources and to automatically scan and classify data at scale.

As you get started with setting up Azure Purview for your organization, here is a quick list of capabilities you should know about:

1. Collections

One way to manage your data sources more efficiently is to group and arrange them by function or region. Azure Purview lets you do that with collections, which are hierarchical: a collection can contain other collections. You can set up scans on individual data sources or on collections of data sources.
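
To make the hierarchy concrete, here is a minimal Python sketch. It assumes that a scan set up on a collection covers the sources registered in that collection and in the collections nested beneath it. The `Collection` class, the `sources_in_scope` function, and all collection and source names are hypothetical illustrations, not part of any Purview SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical model of a collection: a named node that holds
    registered data sources and may contain child collections."""
    name: str
    sources: list[str] = field(default_factory=list)
    children: list["Collection"] = field(default_factory=list)


def sources_in_scope(collection: Collection) -> list[str]:
    """Return every source registered in this collection or in any
    collection nested beneath it, in depth-first order."""
    found = list(collection.sources)
    for child in collection.children:
        found.extend(sources_in_scope(child))
    return found


# Example hierarchy grouped by region, then by function.
root = Collection("Contoso", children=[
    Collection("Europe", sources=["eu-sql-sales"], children=[
        Collection("Finance", sources=["eu-adls-finance"]),
    ]),
    Collection("North America", sources=["na-s3-telemetry"]),
])

print(sources_in_scope(root))
# ['eu-sql-sales', 'eu-adls-finance', 'na-s3-telemetry']
```

Scoping a scan to the "Europe" node in this model would pick up `eu-sql-sales` and `eu-adls-finance` but not the North American source.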

2. Security and Credentials

Before you set up scans, you need to set up your credentials. Azure Purview is designed to be secure by default: all your credentials are created through an Azure Key Vault connection, where they are securely stored and managed. Your scans then use these credentials to connect to your data sources.
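
The key idea is indirection: a credential points at a secret held in Key Vault rather than carrying the secret itself, and a scan refers only to the credential. The Python sketch below illustrates that pattern under stated assumptions. The `Credential` fields, the in-memory `FAKE_KEY_VAULTS` dictionary (a stand-in for Azure Key Vault), and `resolve_password` are all hypothetical and do not reflect Purview's actual data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """Hypothetical credential record: it references a secret in a Key Vault
    connection instead of storing the secret value itself."""
    name: str
    user_name: str
    key_vault_connection: str
    secret_name: str


# Stand-in for Azure Key Vault: connection name -> {secret name: value}.
# A real setup would fetch the secret from Key Vault when the scan runs.
FAKE_KEY_VAULTS = {
    "contoso-kv": {"sql-scan-password": "s3cr3t!"},
}


def resolve_password(cred: Credential) -> str:
    """Look up the secret value that the credential references."""
    vault = FAKE_KEY_VAULTS[cred.key_vault_connection]
    return vault[cred.secret_name]


cred = Credential(
    name="sql-scan-cred",
    user_name="purview_reader",
    key_vault_connection="contoso-kv",
    secret_name="sql-scan-password",
)

# The scan definition stores only the credential name; the secret value is
# resolved at connection time and never appears in the scan definition.
scan = {"source": "eu-sql-sales", "credential": cred.name}
```

Because neither `scan` nor `cred` contains the secret value, rotating the password in Key Vault requires no change to the scan definition.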

3. Scan-rule sets

Through scan rule sets, Azure Purview scans strike a balance between customizability and reusability.

  • A scan rule set lets you select file types for schema extraction and classification, and it also lets you define new custom file types.
  • You can select which system and custom classification rules you want to run. The system classification rules are the same as the sensitive information types in Microsoft 365, so you can extend your sensitivity labeling policies in the Microsoft 365 Compliance Center to Azure Purview supported stores. Learn how here.
  • These scan rule sets can be used across multiple data sources and scans.
  • Azure Purview also provides system default scan rule sets that further simplify scan setup.
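
The bullets above describe a scan rule set as a reusable bundle of file-type and classification choices. The following Python sketch models that idea in a simplified way. The `ScanRuleSet` class, its fields, and the rule names are hypothetical. Modeling a custom classification rule as a single regular expression over a value is also a simplifying assumption, and the sketch does not represent the Microsoft 365 sensitive information types at all.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class ScanRuleSet:
    """Hypothetical, simplified scan rule set: which file types to process
    and which (custom) classification rules to apply."""
    name: str
    file_types: set[str]
    custom_file_types: set[str] = field(default_factory=set)
    # Custom classification rules modeled as rule name -> regex over a value.
    classification_rules: dict[str, str] = field(default_factory=dict)

    def includes_file(self, path: str) -> bool:
        """True if the file's extension is a selected or custom file type."""
        ext = PurePosixPath(path).suffix.lower().lstrip(".")
        return ext in self.file_types or ext in self.custom_file_types

    def classify(self, value: str) -> list[str]:
        """Names of every rule whose pattern matches the whole value."""
        return [rule for rule, pattern in self.classification_rules.items()
                if re.fullmatch(pattern, value)]


rules = ScanRuleSet(
    name="lake-default",
    file_types={"csv", "json", "parquet"},
    custom_file_types={"log"},
    classification_rules={"Contoso Employee ID": r"EMP-\d{6}"},
)

# One rule set reused by two scans on different sources.
scans = {"adls-raw-scan": rules, "s3-telemetry-scan": rules}

print(rules.includes_file("raw/2020/app.log"))  # True
print(rules.includes_file("raw/readme.txt"))    # False
print(rules.classify("EMP-123456"))             # ['Contoso Employee ID']
```

Keeping the rule set as a separate object that scans reference, rather than copying its settings into each scan, is what makes a change to the rule set apply everywhere it is used.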

4. Scan schedules

Complying with industry regulations requires you to know where personally identifiable information (PII) exists. Azure Purview lets you scan your data sources on a regular cadence, with flexible scan schedules that let you define exactly when and how often scans run.
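
Schedules are configured in the Purview portal. As a hedged illustration of what a recurring schedule computes, here is a Python sketch of a weekly recurrence. The `next_runs` helper is hypothetical, ignores time zones, and is not how Purview evaluates schedules internally.

```python
from datetime import datetime, time, timedelta


def next_runs(start: datetime, weekdays: set[int], at: time,
              count: int) -> list[datetime]:
    """Return the next `count` run times on or after `start` for a weekly
    recurrence. `weekdays` uses Python's convention: Monday=0 ... Sunday=6."""
    if not weekdays:
        raise ValueError("at least one weekday is required")
    runs = []
    day = start.date()
    while len(runs) < count:
        candidate = datetime.combine(day, at)
        if day.weekday() in weekdays and candidate >= start:
            runs.append(candidate)
        day += timedelta(days=1)
    return runs


# Scan every Monday and Thursday at 02:00, starting Wed 9 Dec 2020 at 14:00.
for run in next_runs(datetime(2020, 12, 9, 14, 0), {0, 3}, time(2, 0), 3):
    print(run.strftime("%Y-%m-%d %H:%M"))
# 2020-12-10 02:00
# 2020-12-14 02:00
# 2020-12-17 02:00
```

Running heavy scans in off-peak windows like this also helps limit the load that scanning places on the data source.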

5. Scoped scans

Often, your data lakes and databases contain folders, files, or tables that are either confidential or hold temporary data that you know will be created and deleted frequently. In both scenarios, Purview lets you scope your scans to include only the assets you want scanned, so that only their metadata is ingested into the catalog.
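
Conceptually, a scoped scan keeps only the assets that fall under the folders or tables you selected. The Python sketch below shows that filtering under simple assumptions: assets are identified by slash-separated paths, and the `in_scope` helper and all paths are hypothetical.

```python
from pathlib import PurePosixPath


def in_scope(path: str, included: list[str]) -> bool:
    """True if `path` is one of the included folders/files or sits beneath one."""
    p = PurePosixPath(path)
    for root in included:
        r = PurePosixPath(root)
        if p == r or r in p.parents:
            return True
    return False


assets = [
    "sales/2020/orders.csv",
    "sales/2020/returns.csv",
    "sales_archive/2019/orders.csv",
    "hr/confidential/salaries.csv",
    "tmp/job-1234/part-0001.csv",
]
scope = ["sales"]  # leave out the confidential and temporary folders

print([a for a in assets if in_scope(a, scope)])
# ['sales/2020/orders.csv', 'sales/2020/returns.csv']
```

Comparing path components instead of raw string prefixes keeps `sales_archive/...` out of a scope that names only `sales`.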

6. Scanning on-premises sources

Scan your on-premises sources using the Microsoft self-hosted integration runtime (IR). With a few clicks, you can set up your own IR infrastructure on-premises and start scanning your data sources. The same infrastructure is also used to scan Azure data sources behind virtual networks.
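
The paragraph above amounts to a simple routing rule: sources the cloud cannot reach directly go through the self-hosted IR. Here is a minimal Python sketch of that rule, assuming that every other source is scanned by an Azure-managed integration runtime. The `DataSource` fields and the `runtime_for` function are hypothetical simplifications, not Purview configuration.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical description of where a data source lives."""
    name: str
    on_premises: bool = False
    behind_vnet: bool = False


def runtime_for(source: DataSource) -> str:
    """Pick the integration runtime a scan would use (simplified rule):
    on-premises and VNet-protected sources need the self-hosted IR."""
    if source.on_premises or source.behind_vnet:
        return "self-hosted IR"
    return "Azure IR"


sources = [
    DataSource("onprem-sql", on_premises=True),
    DataSource("vnet-adls", behind_vnet=True),
    DataSource("public-blob"),
]
for s in sources:
    print(s.name, "->", runtime_for(s))
# onprem-sql -> self-hosted IR
# vnet-adls -> self-hosted IR
# public-blob -> Azure IR
```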

7. Scale and performance

Purview is built to scan at a scale and speed that set it apart in the market. In our benchmarks, Purview scanned a data lake with a million files, extracting rich metadata and classifications, in less than 30 minutes. (Results may vary, and depend on the data source type, the load on the data source, and the content type within files.)
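
To put the benchmark in perspective, the quoted figures imply an average throughput of roughly 556 files per second or more. This is only arithmetic on the numbers above, not an additional measurement:

```latex
\frac{1\,000\,000\ \text{files}}{30\ \text{min} \times 60\ \text{s/min}}
  = \frac{10^{6}}{1800}\ \text{files/s}
  \approx 556\ \text{files/s}
```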

8. Resource sets

Many of our customers collect IoT or telemetry data to drive their businesses; in most cases this data arrives as files collected hourly or daily from source systems. These partition files share the same schema and classifications. Purview supports a novel concept called resource sets, which logically aggregate files that share the same schema and classifications so that their metadata is represented concisely in your catalog. When a scan runs, it uses folder paths (e.g. year\month\day) and file nomenclature (e.g. foo1.csv, foo2.csv) to determine whether a collection of files can be grouped into a resource set. Only the resource set is ingested into the catalog, not the hundreds or thousands of partition files. Some customers have used Purview scans to compress 150,000 partition files into a single resource set! For them, this greatly improved the discoverability of assets for their end users in the Purview Data Catalog.
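
The path-based grouping described above can be illustrated with a deliberately naive heuristic: replace every run of digits in a path with a placeholder, and treat files that end up with the same pattern as one group. This Python sketch is not Purview's actual resource set logic. It ignores schemas and classifications, uses forward slashes instead of the `year\month\day` form in the text, and the `pattern_for` and `group_resource_sets` helpers are hypothetical.

```python
import re
from collections import defaultdict


def pattern_for(path: str) -> str:
    """Collapse every run of digits into '{N}' so that partitions such as
    2020/12/01/foo1.csv and 2020/12/02/foo2.csv share one pattern."""
    return re.sub(r"\d+", "{N}", path)


def group_resource_sets(paths: list[str]) -> dict[str, list[str]]:
    """Group paths by their digit-normalized pattern."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        groups[pattern_for(path)].append(path)
    return dict(groups)


files = [
    "telemetry/2020/12/01/foo1.csv",
    "telemetry/2020/12/01/foo2.csv",
    "telemetry/2020/12/02/foo1.csv",
    "reference/countries.csv",
]

for pattern, members in group_resource_sets(files).items():
    print(f"{pattern}: {len(members)} file(s)")
# telemetry/{N}/{N}/{N}/foo{N}.csv: 3 file(s)
# reference/countries.csv: 1 file(s)
```

With this kind of grouping, the catalog stores one entry per pattern no matter how many partitions exist, which is the effect the 150,000-file example describes.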

9. Azure Purview Data Map

The Purview Data Map is a unified map of your data assets and their relationships that enables more effective governance of your data estate. It is a knowledge graph that underpins the Purview Data Catalog and all the features it offers. It is scalable and robust enough to meet your enterprise compliance requirements.
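
Because the Data Map is a graph of assets and their relationships, questions such as "what is derived from this raw source?" become graph traversals. The Python sketch below shows a breadth-first walk over a toy lineage graph. The asset identifiers and the `downstream` helper are hypothetical, and the sketch does not model the Data Map's actual storage or APIs.

```python
from collections import deque

# Hypothetical lineage edges: asset -> assets derived from it.
lineage = {
    "s3://raw/telemetry": ["adls://curated/telemetry"],
    "adls://curated/telemetry": ["sql://dw/fact_usage"],
    "sql://dw/fact_usage": ["powerbi://Usage Report"],
    "sql://onprem/crm": ["sql://dw/fact_usage"],
}


def downstream(graph: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk returning every asset reachable from `start`,
    each listed once, in the order it is first discovered."""
    seen, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(downstream(lineage, "s3://raw/telemetry"))
# ['adls://curated/telemetry', 'sql://dw/fact_usage', 'powerbi://Usage Report']
```

The `seen` set keeps shared assets such as `sql://dw/fact_usage`, which has two upstream sources, from being visited twice.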

Get started with Azure Purview today 

Create an Azure Purview account today and start understanding your data supply chain, from raw data to business insights, with free scanning for all your on-premises SQL Server and Power BI Online sources.

For more information, check out a demo of Azure Purview or start a conversation in the Azure Purview tech community.

Last update: Sep 21 2022 03:21 PM