Of Kings & Queens and SAP scans to identify the right lineage for a marriage proposal
Published Apr 04 2022 10:43 AM
Microsoft

Origin and lineage are fundamental properties used to judge the trustworthiness, value, and quality of all sorts of things in life. Beyond the properties themselves, you need credible parties to confirm them and traceability to prove a claimed lineage. Ancient monarchs didn’t want to get engaged to impostors and risk damage to their reputation and heritage. The same is true for consumer goods, journalism, scientific publications, and data presented to you to gain insights and make decisions.

 

Relax, you won’t be picking a partner for life today ;) but we will take a closer look at how to gain a deeper understanding of your data estate and its governance.

 

For many companies, SAP ERP is at the heart of daily business operations, so many stakeholders want to leverage SAP data for insights. Data often gets curated in shiny slide decks and interactive dashboards, combining multiple data sources, pivot tables, and Excel sheets.

 

How do you really know where the data came from, what it actually contains, how it was processed and where it is being used? 

 

 

Start your journey to know these answers with Microsoft Purview, a unified governance solution that supports data classification, metadata discovery, and lineage for your entire data estate across on-prem, multiple clouds, and SaaS applications.

 

Data maps give a good overview

I registered two S/4HANA systems, one SAP Business Warehouse on HANA, Azure Data Lake Storage, a Power BI tenant and a Salesforce CRM. For scalability, components can be grouped into collections (e.g., SAP-S4-FS1) so that rules apply at the collection level rather than to individual components.

 

 

Data map with SAP and non-SAP estate

Fig.1 Overview of data map in Microsoft Purview

 

The SAP systems themselves can be on-premises, on Azure or on any other hyperscaler. Connectivity to Microsoft Purview for the scanning activities is established through the Self-hosted Integration Runtime. See our Microsoft Purview YouTube Channel for a guided video on how to set up a scan.

 

 

Find more details on the prerequisites for SAP scans on the Microsoft Purview docs.

 

Microsoft Purview supports SAP ECC, S/4HANA, Business Warehouse (BW) and HANA database as sources, with BW and HANA db in public preview. The integration concept is the same for all supported SAP sources.

 

Let’s deep-dive on the setup for S/4HANA

The Microsoft Purview Connector requires an RFC connection to SAP and a dedicated ABAP program (remote function module) that collects the information. The result gets dumped for Microsoft Purview to collect. At this point, the ABAP program is shipped as custom code that needs to be created in the customer’s own SAP namespace.

 

Parts of the ABAP code can be commented out to avoid scanning components that are not needed. One example is the ABAP development classes. If you don’t require insights into data objects that are “touched” by certain internal development classes, you can skip them. This declutters your Microsoft Purview designer view, speeds up processing, and reduces overall resource consumption.

 

Hint: use “CTRL + ,” for mass commenting on ABAP (transaction code SE80).

 

However, you need to keep the mandatory metadata structures that are flagged as required. Otherwise, your scan will not be rendered properly.

 

Screenshot of ABAP function for metadata extraction

Fig.2 Screenshot of ABAP Editor and Microsoft Purview Scanning function module

 

With that, you are good to proceed with the scan. Navigate to your Microsoft Purview resource and hit “New Scan”. Enter a name for the scan, then add your registered Self-hosted Integration Runtime (SHIR), the credentials for the SAP RFC call stored in Azure Key Vault, your SAP target client, and the OS path to the SAP Java Connector library (JCo) on the VM running the SHIR. Finally, supply enough memory for the scanning operation on that VM. The current recommendation in the docs is at least 128 GB of RAM.
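
Before hitting “New Scan”, it can help to sanity-check the values you are about to enter. The snippet below is only a sketch of such a pre-flight check: `SapScanSettings` and `validate` are hypothetical names made up for this post, not part of any Microsoft Purview SDK, and the only rules applied are the ones mentioned above plus the fact that SAP client numbers have three digits.

```python
from dataclasses import dataclass
from typing import List

# Minimum RAM for the SHIR VM, taken from the recommendation quoted above.
MIN_SHIR_MEMORY_GB = 128


@dataclass
class SapScanSettings:
    """Hypothetical container mirroring the fields of the New Scan dialog."""

    scan_name: str
    integration_runtime: str   # name of the registered SHIR
    key_vault_credential: str  # credential backed by an Azure Key Vault secret
    sap_client: str            # SAP client number, e.g. "100"
    jco_library_path: str      # OS path to the JCo library on the SHIR VM
    max_memory_gb: int         # memory made available to the scan


def validate(settings: SapScanSettings) -> List[str]:
    """Return readable problems; an empty list means nothing obvious is missing."""
    problems: List[str] = []
    for field_name in ("scan_name", "integration_runtime",
                       "key_vault_credential", "jco_library_path"):
        if not getattr(settings, field_name).strip():
            problems.append(f"{field_name} is empty")
    client = settings.sap_client
    if not (len(client) == 3 and client.isascii() and client.isdigit()):
        problems.append("sap_client should be a three-digit client number")
    if settings.max_memory_gb < MIN_SHIR_MEMORY_GB:
        problems.append(
            f"max_memory_gb is {settings.max_memory_gb}, "
            f"below the recommended {MIN_SHIR_MEMORY_GB}"
        )
    return problems


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    draft = SapScanSettings(
        scan_name="s4-full-scan",
        integration_runtime="shir-01",
        key_vault_credential="sap-rfc-user",
        sap_client="100",
        jco_library_path=r"C:\drivers\sapjco",
        max_memory_gb=64,
    )
    for problem in validate(draft):
        print(problem)
```

The check does not talk to Purview or SAP at all. It simply catches an empty field or an undersized VM before you wait for a scan that was never going to succeed.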

 

New scan setup screen for SAP S/4HANA

Fig.3 How to create a new scan from overview pane

 

Sit back, relax, and wait for the scan and ingestion process to finish. The scan itself is rather quick. Data analysis and ingestion may take longer. See below an example of a full scan from our sandbox landscape.

 

Azure Purview SAP scan result example

Fig.4 Run details from S/4HANA full scan

 

Once finished, you can browse all your SAP assets from the Microsoft Purview Data Catalog or the Search bar. See below an example of the famous SAP S/4HANA finance table ACDOCA.

 

Azure Purview SAP data catalog

Fig.5 Browsing SAP Asset ACDOCA on Microsoft Purview Data Catalog

 

The asset reflects the SAP hierarchies from the backend and the established collection path on Microsoft Purview. Great, but where is the data being used?

 

I have a simple data processing pipeline on Azure Data Factory that extracts the finance data from SAP S/4HANA and saves it as CSV files on our Azure Data Lake. On top of that, I run a Power BI dataset and a corresponding dashboard.

 

Our Azure Data Factory interactions create lineage on Microsoft Purview. See the Azure docs for more details on the configuration process.

 

A scan of our Power BI tenant lights up the new dataset and the dashboard built from it. How do you link the whole flow now? Data Factory wasn’t involved in creating the Power BI assets, hence it couldn’t create any lineage for them.
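
To make the gap tangible, here is a toy model of the situation in Python. It is purely illustrative: the asset labels are invented shorthand rather than real qualified names, and I am assuming that Data Factory recorded the SAP-to-Data-Lake hop while the Power BI scan recorded the dataset-to-dashboard hop.

```python
from typing import Dict, List

# Each key is an asset; the value lists the assets it was derived from.
# Labels are illustrative shorthand, not real Purview qualified names.
LINEAGE: Dict[str, List[str]] = {
    "adls:acdoca.csv": ["sap:ACDOCA"],         # assumed: recorded by Data Factory
    "powerbi:dashboard": ["powerbi:dataset"],  # assumed: recorded by the Power BI scan
}


def upstream(asset: str, lineage: Dict[str, List[str]]) -> List[str]:
    """Collect every asset reachable by walking lineage edges backwards."""
    seen: List[str] = []
    stack = list(lineage.get(asset, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        stack.extend(lineage.get(current, []))
    return seen


if __name__ == "__main__":
    # The walk stops at the dataset because nothing links it to the CSV file.
    print(upstream("powerbi:dashboard", LINEAGE))

    # Adding the missing edge by hand connects the dashboard to ACDOCA.
    patched = dict(LINEAGE)
    patched["powerbi:dataset"] = ["adls:acdoca.csv"]
    print(upstream("powerbi:dashboard", patched))
```

The single missing edge between the Power BI dataset and the file on the Data Lake is exactly what has to be supplied by other means.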

 

Microsoft Purview REST API to the rescue!

The API uses the Apache Atlas Open API ecosystem as a base and is therefore easily applicable. Have a look at this blog by our colleague Piethein for additional context.

 

Find the Python code in this GitHub repo by Franck Mercier to complete the lineage. Simply execute the code from your Python environment or mimic the orchestrated REST API calls via a REST client like Postman.
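
For orientation, this is roughly what such a lineage call looks like at the payload level. Treat it as a minimal sketch and not as the code from the repo mentioned above: in Apache Atlas, lineage is expressed as a `Process` entity whose `inputs` and `outputs` reference existing assets, and the body is posted to the v2 entity endpoint (`/api/atlas/v2/entity` in Apache Atlas; check the Purview docs for the exact base URL and authentication). The helper names, the type names `azure_datalake_gen2_path` and `powerbi_dataset`, and all qualified names are assumptions or placeholders that you should verify against your own catalog.

```python
import json
from typing import Any, Dict, List


def entity_ref(type_name: str, qualified_name: str) -> Dict[str, Any]:
    """Reference an existing catalog asset by type and qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def build_lineage_process(
    name: str,
    qualified_name: str,
    inputs: List[Dict[str, Any]],
    outputs: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Build the request body for an Atlas Process entity linking inputs to outputs."""
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "name": name,
                "qualifiedName": qualified_name,
                "inputs": inputs,
                "outputs": outputs,
            },
        }
    }


if __name__ == "__main__":
    # Type names and qualified names are placeholders; look up the real
    # values on the asset pages in your Microsoft Purview Data Catalog.
    payload = build_lineage_process(
        name="Load ACDOCA extract into Power BI",
        qualified_name="custom://lineage/acdoca-csv-to-powerbi",
        inputs=[entity_ref("azure_datalake_gen2_path", "<adls-file-qualified-name>")],
        outputs=[entity_ref("powerbi_dataset", "<powerbi-dataset-qualified-name>")],
    )
    # Sending this body requires an authenticated POST, which is omitted here.
    print(json.dumps(payload, indent=2))
```

Building the payload separately from the HTTP call makes it easy to paste the JSON into Postman first and only automate it once the lineage shows up as expected.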

 

As a result, you get a beautiful view of your finance data reporting and its origin. 

 

Azure Purview SAP end-to-end lineage example

Fig.6 End-to-end metadata view on Microsoft Purview for SAP finance data from ACDOCA

Further Reading

Final Words

Ready to get married yet? We have increased the credibility of our Power BI dashboard by establishing a sound metadata flow with transparent lineage.

 

Version history
Last update: Sep 21 2022 03:26 PM