Security, Compliance, and Identity Blog
TomPo - Lineage of Data Model and Reports

Jan 24, 2023

Tushar_Pardeshi , Rajeshdadi , Yogesh_Jain , InnovatorsClub 

 

TOMPo, short for “Tabular Object Model Power BI”, is an internal metadata-driven solution built around that same tech stack. Data models (Azure Analysis Services (AAS) models or Power BI datasets) and Power BI reports are widely used, and there is always a need to understand the design of a model, the definitions of its measures and, most importantly, where those measures and columns are used across Power BI reports and pages.

With the ocean of attributes and entities that we deal with, it is impossible to remember them all or to keep documentation up to date. As a result, we often find ourselves running from pillar to post to locate this information within data models. Sometimes we do not have access to PROD models, which wastes time and effort and makes the business depend on the dev team to relay the required information. On the security front, one would also like to understand the roles and memberships in the models. TOMPo acts as a one-stop solution for these needs.

 

Features of TomPo:

TOMPo currently provides the below offerings for lineage:

  1. Shows the underlying data model design without the need to open the Power BI dataset\Visual Studio solution.
  2. Shows Entity names within the data model along with the actual source details in Datalake\Dedicated SQL server.
  3. Gives us the details about columns, measures, and their definitions.
  4. Captures details about relationships and cardinality.
  5. Shows roles and memberships.
  6. Provides an Impact analysis – see the usage of entities, columns, and measures in different Power BI reports and pages.
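
To make the impact analysis idea concrete, here is a minimal sketch in Python. It assumes the lineage has already been extracted into rows whose keys match the Tompo_reportimpact columns listed later in this post; the sample rows and the `find_usage` helper are hypothetical and are not part of TOMPo itself.

```python
from collections import defaultdict

# Hypothetical sample rows; the keys mirror the Tompo_reportimpact columns.
rows = [
    {"Model Name": "HRModel", "Table Name": "Headcount", "Object Name": "Total Headcount",
     "Object Type": "Measure", "Report Name": "Workforce Overview",
     "Page Name": "Summary", "Visual Type": "card"},
    {"Model Name": "HRModel", "Table Name": "Headcount", "Object Name": "Total Headcount",
     "Object Type": "Measure", "Report Name": "Workforce Overview",
     "Page Name": "Trends", "Visual Type": "lineChart"},
    {"Model Name": "HRModel", "Table Name": "Employee", "Object Name": "Region",
     "Object Type": "Column", "Report Name": "Attrition",
     "Page Name": "By Region", "Visual Type": "barChart"},
]


def find_usage(rows, object_name):
    """Return {report name: sorted page names} for every visual that uses object_name."""
    usage = defaultdict(set)
    for row in rows:
        if row["Object Name"] == object_name:
            usage[row["Report Name"]].add(row["Page Name"])
    return {report: sorted(pages) for report, pages in usage.items()}


# Which reports and pages would be affected if "Total Headcount" changed?
impact = find_usage(rows, "Total Headcount")
```

A lookup like this answers the question "what breaks if I change this measure?" without opening any report.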

 

When Do I Use This Solution?

The solution presented here is generic in nature and it can show the data transformations\journey from Datalake\SQL server\Dedicated SQL (as a Source) > Power BI datasets\AAS Models > Power BI reports.

This solution is for anyone who wants to understand their reports better or has any of the following scenarios:

  1. You want to maintain an automatic, systematic and seamless lineage documentation about your data models and Power BI reports.
  2. You are someone who wants to understand the design of the data model, definition of the measures, details about the reports and impact analysis.
  3. You are a developer or a support person who wants to speed up discovery and lead data-driven conversations with business partners or colleagues.
  4. You are a Business Analyst or Project Manager who wants to avoid dependencies on the development team to get your questions answered.
  5. You are a Data Scientist who wants to get a quick glimpse of the metrics.
  6. You are a person who doesn’t have access to an actual PROD code base, models or the reports, and wants to get a quick overview of the business and metrics that are being delivered.
  7. You are a curious person who just wants to adopt this lineage solution and see how it helps your leadership team, end users and organization at large.

 

How the Results Should Look:

Sample Snip from TOMPo:

 

<some details have been hidden intentionally in the above snip>

 

Consumptions of Tables\Attributes & Measures in Power BI Reports and Pages: 

<some details have been hidden intentionally in the above snip>

Architecture Behind TomPo:

 

Scenario 1: Both Power BI Dataset & Reports are residing in Premium workspace.

 

Scenario 2: Support for Power BI Pro workspace if only limited Premium Licenses are available.

 

Benefits

  1. TOMPo will act as a common platform for both Data Scientists and Data Engineers where they can easily navigate to the required information and take informed decisions.
  2. Understand the design of the data model without needing explicit access to the PROD solution.
  3. Get up-to-date definition of measures, derived columns, etc.
  4. Get business description about entities\columns and measures (if provided in Visual Studio -> Properties)
  5. Understand underlying roles in the models and the details about the members added.
  6. Get data types of the attributes.

 

What More Should You Know?

TOMPo is not a direct release of the Azure Purview or Power BI products and will not be officially supported. It is a solution developed by the HR Data Insights team within Microsoft’s Digital Employee Experience organization. This lineage solution is provided to give Microsoft customers and other users enriched features for understanding the data behind their reporting dashboards (as described in the use case scenarios above). The solution is used successfully in our programs and helps the business and development teams in various scenarios.

The solution works only on metadata and can help in data governance and compliance.

 

What Does It Take To Onboard?

  • TOMPo can be onboarded even if only one Premium license is available; it doesn’t need all the users to be on a Premium workspace.
  • In Scenario 2 described above, only a copy of the dataset needs to reside in the Premium workspace, and day-to-day business can continue within the Pro workspace.
  • TOMPo will query the dataset residing in the Premium workspace to pull the dataset lineage\metadata and will extract the reporting lineage from the reports residing in the Pro workspace.

Onboarding Steps:

Summary

TOMPo is a metadata-driven framework, and onboarding is a simple process: configure two CSV files, place two notebooks in Synapse, and run the notebooks to generate the metadata\lineage files. You can configure the Key Vault name in the Apache Spark configuration.

For detailed onboarding steps, refer to GitHub.

 

Security:

  1. The TOMPo dashboard contains only the metadata and lineage info for the models and reports; it doesn’t contain the actual data (metric values, data stored in tables, etc.).
  2. The TOMPo parser securely connects to the AAS model\Power BI dataset via a service principal (SPN) to extract the lineage and metadata.
  3. The SPN needs to be added as an admin to the Power BI workspace where the dataset and reports are hosted.
  4. In the case of an AAS model, the SPN needs to be added as an admin on the AAS server.
  5. The secret for the SPN is not hard-coded in the parsers; instead, it is securely stored in the key vault.
  6. Access to the TOMPo dashboard can be controlled via the Power BI workspace, and it can be shared with only the required audience.
  7. The TOMPo parser output is securely stored in Azure Data Lake, and only authorized people will have access to the metadata\lineage for any further usage or analysis.
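
As an illustration of keeping the SPN secret out of the parser code, here is a minimal Python sketch. It is not the TOMPo implementation: the `SpnCredentials` class, the `load_spn_credentials` helper and the three secret names are hypothetical. The secret lookup is passed in as a function so the same code can run against Key Vault in a notebook or against a stub in a test.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class SpnCredentials:
    tenant_id: str
    client_id: str
    # repr=False keeps the secret out of logs and notebook output.
    client_secret: str = field(repr=False)


def load_spn_credentials(get_secret: Callable[[str, str], str],
                         vault_name: str) -> SpnCredentials:
    """Fetch SPN details from a key vault through the supplied lookup function.

    The secret names below are hypothetical placeholders.
    """
    return SpnCredentials(
        tenant_id=get_secret(vault_name, "tompo-tenant-id"),
        client_id=get_secret(vault_name, "tompo-client-id"),
        client_secret=get_secret(vault_name, "tompo-client-secret"),
    )


# Stand-in for a real vault so the sketch runs anywhere.
_fake_vault = {
    ("demo-vault", "tompo-tenant-id"): "tenant-123",
    ("demo-vault", "tompo-client-id"): "client-456",
    ("demo-vault", "tompo-client-secret"): "s3cret",
}
creds = load_spn_credentials(lambda vault, name: _fake_vault[(vault, name)], "demo-vault")
```

In a Synapse notebook, the lookup function would presumably be `mssparkutils.credentials.getSecret`, called with the vault name and the secret name; treat that as an assumption and check the onboarding document on GitHub for how the TOMPo notebooks actually read the key vault.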

 

TomPo - Integration Into Azure Purview Along With Spark Lineage

 

The output of TOMPo can also be integrated into Azure Purview, where the view is similar to the Impact View page in the TOMPo dashboard. Lineage from TomPo enriches Purview by showing the data model design, relationships, report pages, visual types, and the roles and memberships of Power BI/AAS models.

Below is a snippet from Azure Purview where you can see both Spark and Report Lineage:

 

Let’s see how we integrate the output of TOMPo Parser into Azure Purview:

  1. The TOMPo Parser generates a delta table called Tompo_reportimpact (this can be seen in the TOMPo onboarding document on GitHub).
  2. Here is the schema of Tompo_reportimpact:
    1. Model Name
    2. Table Name
    3. Object Name
    4. Report Name
    5. Page Name
    6. Visual Type
    7. Object Type
  3. Create a Purview collection. Refer to the post in Additional Resources below.
  4. Map the Atlas core assets to the respective collection ID.
  5. Using the PyApacheAtlas API, push the output of the delta table into Azure Purview.
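
The steps above can be sketched roughly as follows. This is an illustrative Python example, not the code from the TOMPo repository: it only shows how one Tompo_reportimpact row might be turned into Atlas-style entity dictionaries (a source object, a report page, and a process linking them). The `tompo://` qualified-name convention, the helper names, the sample row, and the use of the generic `DataSet` and `Process` types are all assumptions.

```python
def qualified_name(*parts):
    """Build a unique name from path parts (hypothetical naming convention)."""
    return "tompo://" + "/".join(p.strip().replace(" ", "_") for p in parts)


def _ref(entity):
    """Atlas-style reference to another entity by its qualified name."""
    return {
        "typeName": entity["typeName"],
        "uniqueAttributes": {"qualifiedName": entity["attributes"]["qualifiedName"]},
    }


def row_to_entities(row):
    """Map one Tompo_reportimpact row to [source, report page, process]."""
    source = {
        "typeName": "DataSet",
        "attributes": {
            "name": row["Object Name"],
            "qualifiedName": qualified_name(
                row["Model Name"], row["Table Name"], row["Object Name"]),
            "description": row["Object Type"],
        },
    }
    page = {
        "typeName": "DataSet",
        "attributes": {
            "name": row["Page Name"],
            "qualifiedName": qualified_name(row["Report Name"], row["Page Name"]),
            "description": "Visual type: " + row["Visual Type"],
        },
    }
    process = {
        "typeName": "Process",
        "attributes": {
            "name": f'{row["Object Name"]} -> {row["Page Name"]}',
            "qualifiedName": qualified_name(
                "usage", row["Model Name"], row["Object Name"],
                row["Report Name"], row["Page Name"]),
            "inputs": [_ref(source)],
            "outputs": [_ref(page)],
        },
    }
    return [source, page, process]


# Hypothetical row using the seven columns listed in step 2.
sample = {
    "Model Name": "HRModel", "Table Name": "Headcount",
    "Object Name": "Total Headcount", "Object Type": "Measure",
    "Report Name": "Workforce Overview", "Page Name": "Summary",
    "Visual Type": "card",
}
entities = row_to_entities(sample)
```

In practice the rows would be read from the delta table, and the entities would be built with PyApacheAtlas classes and uploaded through its Purview client after mapping them to the target collection. The plain dictionaries here only show the shape of the lineage: each measure or column becomes an input to a process whose output is the report page that uses it.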

You can view this short demo of the complete solution, including both Spark Lineage (SparkLin) and TomPo, to get a better understanding.

 

 

Additional Resources:

Code Repository:

  • For Synapse Spark Lineage Repo: SparkLin
  • For AAS/PowerBI Lineage Repo: TOMPo

Details of End-to-End and Spark Data Lineage from Synapse: End-to-End Spark Lineage Blog

After creating a Purview account: Create a collection in Purview Instance

 

Updated Feb 01, 2023
Version 2.0