Written by Raghu Ramakrishnan, Chief Technology Officer, Azure Data
Data citizens including both technical and business users rely on data lineage for root cause analysis, impact analysis, data quality tracing, and other data governance applications. In the current data landscape, where data is fluidly moving across locations (on-premises to and across clouds) and across data platforms and applications, it is increasingly important to map the lineage of data. That’s why we’re introducing dynamic lineage extraction currently in preview.
Conventional systems map lineage by parsing data transformation scripts, otherwise called static code analysis. This works well in simple scenarios. For example, when a SQL script is used to produce a target table Customer_Sales by joining two tables called Customer and Sales, static code analysis can map data lineage. However, in many real use cases, the data processing workloads are quite complicated. The scripts could be wrapped in a stored procedure that is parametrized and uses dynamic SQL. There could be a decision tree with an if then else statement executing different scripts at runtime. Or simply, data transactions could have failed to commit at runtime.
In all these examples, dynamic analysis is required to track lineage effectively. Even more importantly, static lineage analysis does not associate data and processes with runtime metadata, limiting customer applications significantly. For instance, dynamic lineage encoding by whom and when a stored procedure was run, and from what application and which server, will enable customers to govern privacy, comply with regulations, increase time-to-insight, and better understand their overall data and processes.
Dynamic data lineage—Azure SQL Databases
Today, we are announcing the preview release of dynamic lineage extraction from Azure SQL Databases in Azure Purview. Azure SQL Database is one of the most widely used relational database systems in enterprises. Stored procedures are commonly used to perform data transformations and aggregations on SQL tables for downstream applications. With this release, the Azure Purview Data Map can be further enriched with dynamic lineage metadata such as run status, impacted number of rows, the client from which the stored procedure is run, user info, and other operational details from actual runs of SQL stored procedures in Azure SQL Databases.
Read the full article