Track the lineage of your organization’s data with Azure Purview

Published Dec 11 2020 01:38 PM 5,237 Views
Microsoft
Trusted data leads to trusted business insights. Ensuring trust in data goes hand-in-hand with making data easily discoverable. One of the ways to do this is by providing data consumers insight into the data's lineage - where data came from and what transformations it has undergone.

 

Data lineage in Azure Purview helps organizations to understand the data supply chain, from raw data in hybrid data stores, to business insights in Power BI. Azure Purview's turnkey integrations with Azure Data Factory, Power BI, Azure Data Share and other Azure Data Services automatically push lineage to Purview Data Map. 

Azure Purview also supports Apache Atlas Lineage APIs that can be used to access and update custom lineage in Purview Data Map. Hook & Bridge support from Apache Atlas can also be used to easily push lineage from the Hadoop ecosystem.
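As a rough illustration of what custom lineage looks like in Apache Atlas terms, the sketch below builds the JSON body for a single `Process` entity that links input datasets to output datasets. The asset names and qualified names are hypothetical, and the code only constructs the payload. In Apache Atlas such a body is typically posted to the v2 bulk entity endpoint (`/api/atlas/v2/entity/bulk`); how your Purview account exposes and authenticates that endpoint is not covered here and should be checked against the Purview documentation.

```python
import json


def build_lineage_payload(process_name, process_qualified_name, inputs, outputs):
    """Build an Atlas-style bulk entity payload containing one Process entity.

    `inputs` and `outputs` are lists of (type_name, qualified_name) tuples that
    reference data assets assumed to already exist in the catalog.
    """

    def ref(type_name, qualified_name):
        # Reference an existing entity by its unique attribute instead of its GUID.
        return {
            "typeName": type_name,
            "uniqueAttributes": {"qualifiedName": qualified_name},
        }

    process = {
        "typeName": "Process",
        "guid": "-1",  # negative GUID marks a new entity to be created
        "attributes": {
            "name": process_name,
            "qualifiedName": process_qualified_name,
            "inputs": [ref(t, q) for t, q in inputs],
            "outputs": [ref(t, q) for t, q in outputs],
        },
    }
    return {"entities": [process]}


# Hypothetical assets, used for illustration only.
payload = build_lineage_payload(
    process_name="nightly_sales_transform",
    process_qualified_name="custom://jobs/nightly_sales_transform",
    inputs=[("DataSet", "custom://landing/sales_raw")],
    outputs=[("DataSet", "custom://curated/sales_clean")],
)
print(json.dumps(payload, indent=2))
```

Once a process entity like this is stored, the lineage view can draw an edge from each input, through the process, to each output.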

 

 


 

Figure 1: Data lineage can be collected from various data systems

 

Azure Purview can stitch lineage across on-prem, multi-cloud and other platforms

An enterprise data estate contains data systems that perform extraction, transformation and load, reporting, machine learning (ML), and so on. The goal of the lineage feature in Purview is to capture the data linkage at each data transformation to help answer technical and business questions.

For instance, Purview’s lineage functionality helps capture data movement and transformation stages such as those described below.

  1. Data Factory copies data from the on-premises or raw zone to a landing zone in the cloud.
  2. Data processing systems such as Synapse and Databricks process and transform data from the landing zone to the curated zone (staging) using notebooks or job definitions.
  3. Data warehouse systems then process the data from staging into dimensional models for optimal query performance and aggregation.

Data analytics and reporting systems then consume the datasets and process them through their metamodels to create BI (business intelligence) dashboards, ML experiments, and so on.

 

Root cause analysis scenarios

Azure Purview can help data asset owners troubleshoot a dataset or report containing incorrect data because of upstream issues. Data owners can use Azure Purview lineage as a central tool to understand upstream process failures and be informed about the reasons for discrepancies in their data sources.
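Conceptually, root cause analysis is an upstream walk over the lineage graph, starting from the report that looks wrong. The sketch below is not Purview's implementation; it is a minimal illustration using a hypothetical, hand-written edge list in which each node maps to the assets or processes that feed it.

```python
from collections import deque

# Hypothetical lineage: each key is fed by the nodes listed in its value.
UPSTREAM = {
    "powerbi_sales_report": ["dw_sales_fact"],
    "dw_sales_fact": ["load_dimensional_model"],
    "load_dimensional_model": ["staging_sales"],
    "staging_sales": ["transform_notebook"],
    "transform_notebook": ["landing_sales"],
    "landing_sales": ["adf_copy_activity"],
    "adf_copy_activity": ["onprem_sales_table"],
}


def upstream_of(node, upstream=UPSTREAM):
    """Return every upstream node of `node`, nearest first (breadth-first)."""
    seen = {node}
    order = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


# Candidates to inspect when the report shows incorrect data.
for candidate in upstream_of("powerbi_sales_report"):
    print(candidate)
```

Listing the nearest nodes first mirrors how an owner would usually troubleshoot: check the immediate source, then work back toward the raw data.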

 


 

Figure 2: Azure Purview lineage capability showing troubleshooting steps for a possible issue with Power BI report

 

Impact analysis scenarios

Data producers can use Azure Purview lineage to evaluate the downstream impact of changes made to their datasets. Lineage serves as a central place to find all the consumers of a dataset and to understand how a change affects dependent datasets and reports. For instance, data engineers can evaluate the downstream impact of deprecating a column in a table or changing a column's data type. They can use Purview lineage to see how many data assets are potentially affected by a schema change in an upstream table. Column-level lineage points precisely to the specific data assets that are impacted.
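Impact analysis is the mirror image of root cause analysis: a downstream walk from the column being changed. The sketch below assumes a hypothetical column-level lineage map, keyed by (asset, column) pairs, and returns the assets that depend on a given column directly or indirectly. It illustrates the idea only and does not call any Purview API.

```python
from collections import deque

# Hypothetical column-level lineage: (asset, column) -> consumers of that column.
COLUMN_DOWNSTREAM = {
    ("staging_sales", "amount"): [("dw_sales_fact", "net_amount")],
    ("staging_sales", "region"): [("dw_region_dim", "region_name")],
    ("dw_sales_fact", "net_amount"): [("powerbi_sales_report", "Total Sales")],
    ("dw_region_dim", "region_name"): [("powerbi_sales_report", "Region")],
}


def impacted_assets(asset, column, downstream=COLUMN_DOWNSTREAM):
    """Return the sorted names of assets that depend on `asset.column`."""
    start = (asset, column)
    seen = {start}
    queue = deque([start])
    while queue:
        for child in downstream.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(start)  # the changed column itself is not a downstream asset
    return sorted({name for name, _ in seen})


# Assets to review before deprecating or retyping staging_sales.amount.
print(impacted_assets("staging_sales", "amount"))
```

Because the walk follows column-level edges, a change to `amount` does not flag `dw_region_dim`, which only consumes `region`.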

 


 

Figure 3: Azure Purview lineage capability showing the impact analysis for an upstream change

 

Lineage sources

Azure Purview currently connects with Azure Data Factory, Azure Data Share, and Power BI to collect lineage. In the coming months, many more data systems, such as Synapse Analytics, Teradata, and SQL Server, will be able to connect with Azure Purview for lineage collection.

 

Call to Action

We look forward to hearing how Azure Purview has helped you troubleshoot and perform impact analysis on your data pipelines with the native lineage experiences.

  1. Create an Azure Purview account now and start understanding your data supply chain, from raw data to business insights, with free scanning for all your on-premises SQL Server and Power BI online assets.
  2. Start by connecting a Data Factory or Data Share account to push lineage.
  3. Scan a Power BI tenant to see lineage in Purview. Use managed identity (MSI) authentication to set up the scan.
  4. Learn more in the lineage user guide.
9 Comments
Contributor

Nice blog, @ChandruS. Really useful insights.

Established Member

@ChandruS - do you have a tentative date for lineage support for on-prem SQL Server and Azure SQL sources?

Occasional Visitor

Can you manually create lineage? This would be another use case where we cannot import lineage, especially when you use metadata-driven ADF pipelines.

Microsoft

@RambabuP We plan to support Azure SQL DB as soon as H1 2021. If you are interested in trying out early bits, let us know.

Microsoft

@shrivam You can of course report manual lineage. Check out a utility created by the Microsoft field team: Azure Purview Tips · wjohnson/pyapacheatlas Wiki (github.com)

Established Member

Hello @ChandruS - We (with the help of another MS team) plan to do a PoC for our client. The client has on-prem SQL Servers, among many other sources. Do you have any update on lineage support for on-prem SQL Server?

Microsoft

@RambabuP On-Prem SQL is in the roadmap for the second half of 2021.

Occasional Visitor

@ChandruS Thanks. There is no out-of-the-box ability to create lineage, such as drag and drop (similar to other tools), instead of writing some code.

Senior Member

Hello @ChandruS, we are exploring Purview for data discovery and classification. One of the roadblocks we hit is that Purview doesn't provide table-level relationships, e.g. foreign or primary keys. It is quite surprising that Azure Data Catalog used to support this but Purview doesn't. Do you have plans to support this in 2021?

Thank you.
