Prepare your Azure data sources at scale to onboard into Azure Purview for registration and scanning
Published Jun 02 2021 03:16 PM

If you plan to register and scan Azure data sources in Azure Purview, there are several prerequisites to review first. The Azure Purview account requires both network access and identity-based access to your Azure data sources before it can register and scan them.

 

Running the readiness check script on your data sources' Azure subscriptions

An Azure Purview account needs a credential to connect to data sources and scan them. Several credential types are available, such as the Azure Purview managed identity (MSI), a service principal, or a SQL credential. For example, if your data source is an Azure SQL Database, you could save the SQL admin credentials in Azure Key Vault as a secret and create a new credential in Azure Purview that uses that secret. When you set up a new scan, you can use this credential to connect to the data source and retrieve its metadata. We recommend using the Azure Purview managed identity whenever possible, because it reduces the complexity of setting up additional resources and credentials. In this case, the managed identity needs access to each data source through both the Azure RBAC control plane and the data plane.

 

Use the following decision tree if you are unsure which credential type is most suitable for your data sources:

Create and manage credentials for scans - Azure Purview | Microsoft Docs

 

If you protect your data sources with Azure service endpoints, you need to allow AzureServices to connect to your data sources.

 

Validating that these requirements are in place, and setting them up, can be time consuming if you have hundreds of Azure resources and subscriptions. We have therefore recently added a series of tools to the Azure Purview documentation so you can validate the readiness of your Azure data sources and configure the required RBAC, SQL authentication, and network access.
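
To make the prerequisites concrete, here is a minimal Python sketch of the kind of checklist a readiness check evaluates per data source: control plane RBAC, data plane access, and network access when a service endpoint is in use. It is not the tool from the documentation. The class, field names, and sample resources are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSourceState:
    """Hypothetical snapshot of one data source's configuration."""
    name: str
    has_control_plane_rbac: bool  # managed identity holds an Azure RBAC role on the resource
    has_data_plane_access: bool   # managed identity can read the data itself
    uses_service_endpoint: bool   # resource is protected by a service endpoint
    allows_azure_services: bool   # Azure services are allowed to connect


def missing_requirements(source: DataSourceState) -> List[str]:
    """Return the prerequisites that this data source does not yet meet."""
    missing = []
    if not source.has_control_plane_rbac:
        missing.append("control plane RBAC")
    if not source.has_data_plane_access:
        missing.append("data plane access")
    # The network setting only matters when a service endpoint is in use.
    if source.uses_service_endpoint and not source.allows_azure_services:
        missing.append("allow Azure services")
    return missing


# Made-up sample inventory, not real resources.
sources = [
    DataSourceState("sales-sql", True, False, True, False),
    DataSourceState("raw-lake", True, True, False, False),
]
report = {s.name: missing_requirements(s) for s in sources}
for name, gaps in report.items():
    print(name, "->", gaps or "ready")
```

In this sketch, a data source with an empty list is ready for registration and scanning. Any other entry names the setting that still has to be applied.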

 

Currently, the following data sources are supported in the scripts:

  • Azure Blob Storage (BlobStorage)
  • Azure Data Lake Storage Gen 2 (ADLSGen2)
  • Azure Data Lake Storage Gen 1 (ADLSGen1)
  • Azure SQL Database (AzureSQLDB)
  • Azure SQL Managed Instance (AzureSQLMI)
  • Azure Synapse (Synapse)
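
If you keep your own inventory of Azure resources, a small Python sketch like the one below can separate the resources the scripts cover from those they do not. The short identifiers are copied from the list above. Treating them as exact, case-sensitive keys is an assumption, and the function name, inventory format, and the "CosmosDB" entry are hypothetical.

```python
# Short identifiers and display names taken from the list above.
SUPPORTED_TYPES = {
    "BlobStorage": "Azure Blob Storage",
    "ADLSGen2": "Azure Data Lake Storage Gen 2",
    "ADLSGen1": "Azure Data Lake Storage Gen 1",
    "AzureSQLDB": "Azure SQL Database",
    "AzureSQLMI": "Azure SQL Managed Instance",
    "Synapse": "Azure Synapse",
}


def split_by_support(inventory):
    """Partition (resource name, type identifier) pairs into supported and unsupported names."""
    supported, unsupported = [], []
    for name, source_type in inventory:
        if source_type in SUPPORTED_TYPES:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported


# Made-up inventory; "CosmosDB" stands in for any type not on the list.
inventory = [("stg01", "BlobStorage"), ("dw01", "Synapse"), ("cosmos01", "CosmosDB")]
supported, unsupported = split_by_support(inventory)
print("Covered by the scripts:", supported)
print("Needs manual review:", unsupported)
```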

 

Use the following guide if you first want to validate the readiness of your Azure resources, such as Azure SQL Database, Synapse, Azure Blob Storage, or Azure Data Lake:

Tutorial: Check data sources readiness at scale (preview) - Azure Purview | Microsoft Docs

 

The guide walks you through the steps and gives you access to a PowerShell script that automates the readiness check. Once you run the tool, the output report helps you discover the current state of your Azure data sources and highlights the missing configurations that are needed to register and scan them in Azure Purview.

Whether you own the Azure subscription or data services resources yourself, or need to reach out to those who have access to them, the output report gives you a clear list of the required settings to apply.

 

Configure Azure Purview MSI settings at scale 

We have also developed a tool that helps you automate configuring the required network access, SQL authentication, and Azure RBAC control plane and data plane assignments on your data sources at scale.

 

Follow the guide provided in the following link for more information about the tool and how to use it: 

Tutorial: Configure access to data sources for Azure Purview MSI at scale (preview) - Azure Purview ...

 

 

Version history
Last update: Sep 21 2022 03:23 PM