Azure Purview enables organizations of all sizes to manage and govern their hybrid data estate. The Azure Purview Data Map lets customers establish the foundation for effective data governance by creating a knowledge graph of data from a range of sources, including Azure SQL Database, Amazon S3 buckets, and on-premises SQL Server. Purview makes it easy to register data sources and to automatically scan and classify data at scale.
As you get started setting up Azure Purview for your organization, here is a list of capabilities you should know:
One way to manage your data sources more efficiently is to group and arrange them by function or region. Azure Purview lets you do that with collections, which are hierarchical. You can set up scans on individual data sources or on collections of data sources.
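To make the hierarchy concrete, here is a minimal Python sketch of how nested collections can roll up the data sources beneath them. The `Collection` class, the collection names, and the source names are all hypothetical and exist only for illustration; they are not part of any Purview API.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A node in a hypothetical collection hierarchy (not a Purview API type)."""
    name: str
    sources: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def all_sources(self):
        """Return the sources registered here and in every nested collection."""
        found = list(self.sources)
        for child in self.children:
            found.extend(child.all_sources())
        return found


# Illustrative hierarchy: one root collection with two regional children.
emea = Collection("EMEA", sources=["sql-emea-sales"])
amer = Collection("Americas", sources=["s3-amer-logs", "sql-amer-hr"])
root = Collection("Contoso", children=[emea, amer])

print(root.all_sources())
```

Walking the tree from a parent collection reaches every source registered under it, which is the property that makes grouping by function or region convenient when deciding what to scan.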
Before setting up scans, you need to set up your credentials. Azure Purview is designed to be secure by default: all credentials are created with an Azure Key Vault connection, where they are securely stored and managed. Your scans then use these credentials to connect to your data sources.
Azure Purview scans give users a balance between customizability and reusability through scan rule sets.
Complying with industry regulations requires you to know where personally identifiable information (PII) exists. Azure Purview allows you to scan your data sources on a regular cadence using flexible scan schedules that let you define exactly when and how often scans run.
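As a rough illustration of a recurring schedule, the sketch below computes the first few run times for a weekly cadence. The `next_runs` helper, the start date, and the weekly interval are assumptions made for this example; they do not reflect how Purview stores or evaluates its schedules.

```python
from datetime import datetime, timedelta


def next_runs(start, every, count):
    """Return `count` run times beginning at `start`, spaced `every` apart."""
    return [start + every * i for i in range(count)]


# Hypothetical schedule: every Monday at 02:00, starting 4 January 2021.
runs = next_runs(datetime(2021, 1, 4, 2, 0), timedelta(weeks=1), 3)
for run in runs:
    print(run.isoformat())
```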
Often, your data lakes and databases contain folders, files, or tables that are either considered confidential or hold temporary data that you know will be created and deleted frequently. In both scenarios, Purview lets you scope your scans so that only the assets you want scanned have their metadata ingested into the catalog.
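The idea of scoping can be sketched as an include/exclude filter over asset paths. This is only an illustration: the glob patterns, the `in_scope` function, and the sample paths are hypothetical, and the sketch makes no claim about how scope selections are represented inside Purview.

```python
from fnmatch import fnmatchcase


def in_scope(path, include, exclude):
    """True if path matches an include pattern and no exclude pattern."""
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


# Hypothetical scope: scan the sales area, but skip temporary and
# confidential folders inside it.
include = ["sales/*"]
exclude = ["sales/tmp/*", "sales/confidential/*"]

paths = [
    "sales/2021/orders.csv",
    "sales/tmp/stage.csv",
    "sales/confidential/payroll.csv",
    "hr/staff.csv",
]
scanned = [p for p in paths if in_scope(p, include, exclude)]
print(scanned)
```

Checking the exclusions first means a confidential or temporary folder stays out of the scan even when a broader include pattern would otherwise match it.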
Scan your on-premises sources using the self-hosted Microsoft integration runtime (IR). With a few clicks, you can set up your own IR infrastructure on-premises and start scanning your data sources. The same infrastructure is also used to scan your Azure data sources behind virtual networks.
Purview provides scale and performance for scanning that differentiate it in the market. Our benchmarks show a data lake with a million files being scanned for rich metadata and classification in less than 30 minutes. (These values may vary depending on the data source type, the load on the data source, and the content types within the files.)
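For a sense of what that benchmark implies, the short calculation below converts "a million files in under 30 minutes" into a minimum sustained throughput. It is plain arithmetic on the figures quoted above, not an additional measurement.

```python
files = 1_000_000
seconds = 30 * 60  # 30 minutes expressed in seconds

# Finishing in under 30 minutes means the average rate is at least this value.
min_files_per_second = files / seconds
print(round(min_files_per_second))  # 556
```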
Many of our customers collect IoT or telemetry data to drive their businesses, in most cases as files collected on an hourly or daily basis from their source systems. These partition files share the same schema and classifications. Purview supports a novel concept called resource sets, which logically aggregate files that share the same schema and classifications so that the metadata is represented concisely in your catalog. When a scan runs, it uses folder paths (e.g. year\month\day) and file nomenclature (e.g. foo1.csv, foo2.csv) to determine whether a collection of files can be grouped into a resource set. Only the resource set is ingested into the catalog, not the hundreds or thousands of partition files. We have had customers whose Purview scans compressed 150,000 partition files into a single resource set. For them, this greatly improved the discoverability of assets for their end users in the Purview Data Catalog.
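The grouping idea can be sketched in a few lines of Python. This is a simplified illustration, not the scanner's actual logic: it assumes that replacing every run of digits in a path with a placeholder is enough to recognize partition files, and it ignores the schema and classification checks described above. The `pattern_for` and `group_into_resource_sets` functions and the sample paths are hypothetical.

```python
import re
from collections import defaultdict

# Assumption for this sketch: any run of digits marks a varying partition value.
_DIGITS = re.compile(r"\d+")


def pattern_for(path):
    """Replace every run of digits in a path with the placeholder {N}."""
    return _DIGITS.sub("{N}", path)


def group_into_resource_sets(paths):
    """Group paths that share the same digit-normalized pattern."""
    groups = defaultdict(list)
    for path in paths:
        groups[pattern_for(path)].append(path)
    return dict(groups)


files = [
    "telemetry/2021/01/01/foo1.csv",
    "telemetry/2021/01/01/foo2.csv",
    "telemetry/2021/01/02/foo1.csv",
    "reference/readme.txt",
]
resource_sets = group_into_resource_sets(files)
for pattern, members in resource_sets.items():
    print(pattern, len(members))
```

Here the three telemetry files collapse into a single pattern, `telemetry/{N}/{N}/{N}/foo{N}.csv`, while the unrelated file remains on its own, so a catalog built from the patterns would list two entries instead of four.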
The Purview Data Map is a unified map of your data assets and their relationships that enables more effective governance of your data estate. It is the knowledge graph that underpins the Purview Data Catalog and all the features it has to offer. It is scalable and robust enough to meet your enterprise compliance requirements.
Create an Azure Purview account today and start understanding your data supply chain, from raw data to business insights, with free scanning for all your on-premises SQL Server and Power BI online sources.