Track the lineage of your organization’s data with Azure Purview
Published Dec 11 2020 01:38 PM
Microsoft
Trusted data leads to trusted business insights. Ensuring trust in data goes hand-in-hand with making data easily discoverable. One way to do this is to give data consumers insight into the data's lineage: where the data came from and what transformations it has undergone.

Data lineage in Azure Purview helps organizations understand the data supply chain, from raw data in hybrid data stores to business insights in Power BI. Azure Purview's turnkey integrations with Azure Data Factory, Power BI, Azure Data Share, and other Azure data services automatically push lineage to the Purview Data Map.

Azure Purview also supports the Apache Atlas lineage APIs, which can be used to access and update custom lineage in the Purview Data Map. Apache Atlas hooks and bridges can also be used to push lineage from the Hadoop ecosystem.

Figure 1: Data lineage can be collected from various data systems
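
For the Apache Atlas lineage APIs mentioned earlier, here is a minimal Python sketch of registering one custom lineage step. It builds an Atlas v2 payload whose `Process` entity lists existing assets as `inputs` and `outputs`, referenced by type name and `qualifiedName`, and posts it to the Atlas `entity/bulk` endpoint. Treat it as an illustration under assumptions rather than a reference implementation: the choice of the base `Process` type, the `azure_blob_path` and `azure_sql_table` type names, every qualified name, the base URL of your account's Atlas-compatible endpoint, and how you obtain the Azure AD bearer token are all assumptions to check against the Purview documentation.

```python
import json
import urllib.request


def atlas_ref(type_name, qualified_name):
    """Reference an existing Atlas entity by type and its unique qualifiedName."""
    return {"typeName": type_name, "uniqueAttributes": {"qualifiedName": qualified_name}}


def build_lineage_payload(name, qualified_name, inputs, outputs, process_type="Process"):
    """Build an Atlas v2 bulk-entity payload describing one lineage step.

    `inputs` and `outputs` are lists of (type_name, qualified_name) tuples.
    `process_type` defaults to the Atlas base "Process" type (an assumption;
    you may prefer a custom type derived from it).
    """
    process = {
        "typeName": process_type,
        "attributes": {
            "name": name,
            "qualifiedName": qualified_name,
            "inputs": [atlas_ref(t, qn) for t, qn in inputs],
            "outputs": [atlas_ref(t, qn) for t, qn in outputs],
        },
    }
    return {"entities": [process]}


def push_lineage(base_url, token, payload):
    """POST the payload to <base_url>/api/atlas/v2/entity/bulk and return the parsed JSON reply.

    `base_url` and `token` (an Azure AD access token) are assumptions supplied by the caller.
    """
    request = urllib.request.Request(
        url=f"{base_url}/api/atlas/v2/entity/bulk",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical job and asset names; the type names are assumed Purview built-ins.
    payload = build_lineage_payload(
        name="nightly_sales_load",
        qualified_name="custom://etl/nightly_sales_load",
        inputs=[("azure_blob_path", "https://example.blob.core.windows.net/raw/sales.csv")],
        outputs=[("azure_sql_table", "mssql://example.database.windows.net/salesdb/dbo/Sales")],
    )
    print(json.dumps(payload, indent=2))
    # Sending requires a real endpoint and token, so it is left commented out:
    # push_lineage("<purview-atlas-base-url>", "<azure-ad-token>", payload)
```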

Azure Purview can stitch lineage across on-prem, multi-cloud and other platforms

An enterprise data estate contains systems that perform extraction, transformation and loading, reporting, machine learning (ML), and so on. The goal of the lineage feature in Purview is to capture how data is linked at each transformation step, so that it can help answer both technical and business questions.

For instance, Purview's lineage functionality helps capture data movement and transformation stages such as the ones described below.

  1. Data Factory copies data from an on-premises or raw zone to a landing zone in the cloud.
  2. Data processing systems such as Synapse and Databricks process and transform data from the landing zone to the curated (staging) zone using notebooks or job definitions.
  3. Data warehouse systems then process the data from staging into dimensional models for optimal query performance and aggregation.

Finally, data analytics and reporting systems consume these datasets and process them through their own meta models to create BI (business intelligence) dashboards, ML experiments, and so on.
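
The stages above can be pictured as a small lineage graph. The Python sketch below is illustrative only and is not Purview's internal design: it assumes each system reports its own hypothetical lineage events, and it shows why stitching works. When every system refers to a shared asset by the same identifier (modeled loosely here on Atlas-style qualified names, all made up), the separate events join into one end-to-end path from the raw source to the BI dashboard.

```python
from collections import defaultdict

# Hypothetical lineage events, one per processing step:
# (reporting system, process name, input assets, output assets).
# Asset IDs are made-up qualified names; stitching relies on every system
# naming a shared asset with the same ID.
EVENTS = [
    ("Data Factory", "copy_raw_to_landing", ["onprem://sql/sales"], ["adls://landing/sales"]),
    ("Databricks", "curate_sales", ["adls://landing/sales"], ["adls://curated/sales"]),
    ("Synapse", "build_dim_model", ["adls://curated/sales"], ["synapse://dw/fact_sales"]),
    ("Power BI", "sales_dataset", ["synapse://dw/fact_sales"], ["powerbi://reports/sales_dashboard"]),
]


def stitch(events):
    """Merge per-system events into one graph: asset -> [(process label, downstream asset)]."""
    graph = defaultdict(list)
    for system, process, inputs, outputs in events:
        for src in inputs:
            for dst in outputs:
                graph[src].append((f"{system}:{process}", dst))
    return graph


def end_to_end_paths(graph, start):
    """Depth-first: yield each path from `start` to an asset with no downstream edges."""
    stack = [(start, [start])]
    while stack:
        asset, path = stack.pop()
        downstream = graph.get(asset, [])
        if not downstream:
            yield path
            continue
        for process, nxt in downstream:
            if nxt not in path:  # skip edges that would revisit an asset (guards against cycles)
                stack.append((nxt, path + [process, nxt]))


for path in end_to_end_paths(stitch(EVENTS), "onprem://sql/sales"):
    print(" -> ".join(path))
```

No single system in this example sees the whole chain; the full path only appears once the four events are merged on their shared asset IDs.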

Root cause analysis scenarios

Azure Purview can help data asset owners troubleshoot a dataset or report that contains incorrect data because of upstream issues. Data owners can use Azure Purview lineage as a central tool to understand upstream process failures and learn the reasons for discrepancies in their data sources.

Figure 2: Azure Purview lineage capability showing troubleshooting steps for a possible issue with a Power BI report
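
Conceptually, root cause analysis walks the lineage graph upstream from the affected report. The Python sketch below is a minimal, self-contained illustration of that idea, not how Purview implements it: the process names, asset names, and the per-process "last run status" field are all hypothetical. It walks upstream breadth-first from a report and returns the processes that did not succeed.

```python
from collections import deque

# Hypothetical lineage: process -> (input assets, output assets, last run status).
PROCESSES = {
    "adf_copy_sales": (["onprem.sales"], ["landing.sales"], "Succeeded"),
    "databricks_curate": (["landing.sales"], ["curated.sales"], "Failed"),
    "adf_copy_fx_rates": (["api.fx_rates"], ["landing.fx_rates"], "Succeeded"),
    "synapse_build_fact": (["curated.sales", "landing.fx_rates"], ["dw.fact_sales"], "Succeeded"),
    "powerbi_refresh": (["dw.fact_sales"], ["report.sales_dashboard"], "Succeeded"),
}


def producers_of(asset):
    """Return the processes that write `asset`."""
    return [p for p, (_, outputs, _) in PROCESSES.items() if asset in outputs]


def upstream_failures(asset):
    """Walk upstream from `asset` breadth-first; return non-succeeded processes in the order found."""
    failures, seen = [], set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for proc in producers_of(current):
            if proc in seen:
                continue
            seen.add(proc)
            inputs, _, status = PROCESSES[proc]
            if status != "Succeeded":
                failures.append(proc)
            queue.extend(inputs)  # keep walking toward the raw sources
    return failures


print(upstream_failures("report.sales_dashboard"))  # ['databricks_curate']
```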

Impact analysis scenarios

Data producers can use Azure Purview lineage to evaluate the downstream impact of changes made to their datasets. Lineage can serve as a central platform for identifying all the consumers of a dataset and understanding how a change affects dependent datasets and reports. For instance, data engineers can evaluate the downstream impact of deprecating a column in a table or changing a column's data type. They can use Purview lineage to see the number of data assets potentially affected by schema changes to an upstream table, and column-level lineage points precisely to the specific data assets that are impacted.

Figure 3: Azure Purview lineage capability showing the impact analysis for an upstream change
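
Impact analysis is the same walk in the opposite direction. The minimal Python sketch below assumes a hypothetical column-level lineage map (all asset and column names are made up) and lists every downstream column, plus the assets that own them, reachable from a column someone plans to deprecate or retype. It illustrates the idea only; it is not Purview's implementation or API.

```python
from collections import deque

# Hypothetical column-level lineage: "asset.column" -> downstream "asset.column" entries.
COLUMN_LINEAGE = {
    "staging.orders.amount": ["dw.fact_sales.amount"],
    "staging.orders.customer_id": ["dw.fact_sales.customer_id", "dw.dim_customer.customer_id"],
    "dw.fact_sales.amount": ["report.revenue.total_revenue"],
    "dw.dim_customer.customer_id": ["report.customers.customer_count"],
}


def impacted_columns(column):
    """Return every column downstream of `column`, in breadth-first order."""
    seen, order = set(), []
    queue = deque([column])
    while queue:
        for child in COLUMN_LINEAGE.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def impacted_assets(column):
    """Collapse impacted columns to their owning assets (everything before the last '.')."""
    return sorted({c.rsplit(".", 1)[0] for c in impacted_columns(column)})


# Assets affected by deprecating staging.orders.customer_id:
print(impacted_assets("staging.orders.customer_id"))
```

In this example `report.revenue` is not flagged, because the column-level map shows it depends only on `amount`; a table-level view of `staging.orders` would have reported it as impacted too.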

Lineage sources

Currently, Azure Purview can connect with Azure Data Factory, Azure Data Share, and Power BI to collect lineage. In the coming months, many more data systems, such as Synapse Analytics, Teradata, and SQL Server, will be able to connect with Azure Purview for lineage collection.

Call to Action

We look forward to hearing how Azure Purview's native lineage experiences help you troubleshoot and perform impact analysis on your data pipelines.

  1. Create an Azure Purview account now and start understanding your data supply chain, from raw data to business insights, with free scanning for all your on-premises SQL Server and Power BI online assets.
  2. Start by connecting a Data Factory or Data Share account to push lineage.
  3. Scan a Power BI tenant to see lineage in Purview, using managed identity (MSI) authentication to set up the scan.
  4. Learn more in the lineage user guide.
Last update: Sep 21 2022 03:21 PM