Data lineage in Azure Purview helps organizations to understand the data supply chain, from raw data in hybrid data stores, to business insights in Power BI. Azure Purview's turnkey integrations with Azure Data Factory, Power BI, Azure Data Share and other Azure Data Services automatically push lineage to Purview Data Map.
Azure Purview also supports Apache Atlas Lineage APIs that can be used to access and update custom lineage in Purview Data Map. Hook & Bridge support from Apache Atlas can also be used to easily push lineage from the Hadoop ecosystem.
Figure 1: Data lineage can be collected from various data systems
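To make the custom lineage path more concrete, here is a minimal sketch of the kind of payload the Apache Atlas v2 entity API accepts for a lineage-producing process. It only builds and prints the JSON. The asset names, the qualified names, and the use of the generic Atlas `DataSet` and `Process` types are assumptions for illustration; a real catalog would use the type names and qualified names of assets that already exist in it. Sending the payload (endpoint URL, authentication) is deliberately left out.

```python
import json


def build_process_entity(name, qualified_name, inputs, outputs):
    """Build an Atlas v2 entity payload that links input datasets to outputs.

    `inputs` and `outputs` are lists of (type_name, qualified_name) tuples
    that refer to assets assumed to already exist in the catalog.
    """
    def ref(type_name, qn):
        # Reference an existing entity by its unique qualifiedName.
        return {"typeName": type_name, "uniqueAttributes": {"qualifiedName": qn}}

    return {
        "entity": {
            "typeName": "Process",  # generic Atlas process type (assumption)
            "attributes": {
                "name": name,
                "qualifiedName": qualified_name,
                "inputs": [ref(t, q) for t, q in inputs],
                "outputs": [ref(t, q) for t, q in outputs],
            },
        }
    }


# Hypothetical assets: a raw table copied into a curated table by a custom job.
payload = build_process_entity(
    name="nightly_sales_copy",
    qualified_name="custom://jobs/nightly_sales_copy",
    inputs=[("DataSet", "custom://tables/sales_raw")],
    outputs=[("DataSet", "custom://tables/sales_curated")],
)

print(json.dumps(payload, indent=2))
```

A payload of this shape is what a client would typically submit to the Atlas v2 entity endpoint; the process entity then appears as the edge between the input and output assets in the lineage view.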
Azure Purview can stitch lineage across on-prem, multi-cloud and other platforms
An enterprise data estate contains data systems that perform extraction, transformation and load, reporting, machine learning (ML), and so on. The goal of the lineage feature in Purview is to capture the data linkage at each data transformation to help answer technical and business questions.
For instance, Purview's lineage functionality helps capture data movement and transformation stages such as the one described below.
Data analytics and reporting systems consume the datasets and process them through their meta models to create BI (business intelligence) dashboards, ML experiments, and so on.
Root cause analysis scenarios
Azure Purview can help data asset owners troubleshoot a dataset or report containing incorrect data because of upstream issues. Data owners can use Azure Purview lineage as a central tool to understand upstream process failures and be informed about the reasons for discrepancies in their data sources.
Figure 2: Azure Purview lineage capability showing troubleshooting steps for a possible issue with Power BI report
Impact analysis scenarios
Data producers can use Azure Purview lineage to evaluate the downstream impact of changes made to their datasets. Lineage can be used as a central platform to identify all the consumers of their datasets and understand the impact of any changes on dependent datasets and reports. For instance, data engineers can evaluate the downstream impact of deprecating a column in a table or changing the data type of a column. They can use Purview lineage to understand the number of data assets potentially impacted by the schema changes of an upstream table. Column-level lineage points precisely to the specific data assets that are impacted.
Figure 3: Azure Purview lineage capability showing the impact analysis for an upstream change
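Conceptually, impact analysis is a downstream traversal of the lineage graph. The sketch below is not how Purview implements it; it is a small illustration on a hand-written graph, and all asset names and the `LINEAGE` and `downstream` identifiers are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets in its list.
LINEAGE = {
    "sql.sales_raw": ["adf.clean_sales"],
    "adf.clean_sales": ["sql.sales_curated"],
    "sql.sales_curated": ["pbi.sales_dataset", "ml.churn_features"],
    "pbi.sales_dataset": ["pbi.sales_report"],
}


def downstream(graph, start):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen = {start}          # guards against cycles and duplicate visits
    queue = deque([start])
    impacted = []
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


# Which assets could be affected by a schema change to the curated table?
impacted = downstream(LINEAGE, "sql.sales_curated")
print(impacted)
# ['pbi.sales_dataset', 'ml.churn_features', 'pbi.sales_report']
```

Root cause analysis is the same walk in the opposite direction: reversing the edges and starting from the faulty report lists the upstream assets and processes worth inspecting.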
Azure Purview can currently connect with Azure Data Factory, Azure Data Share, and Power BI to collect lineage. In the coming months, many more data systems, such as Synapse Analytics, Teradata, and SQL Server, will be able to connect with Azure Purview for lineage collection.
Call to Action
We look forward to hearing how Azure Purview has helped you troubleshoot and perform impact analysis on your data pipelines with the native lineage experiences.