According to SAP, 77% of the world’s transaction revenue touches an SAP system. Understanding these systems and extracting their data is a challenging job. To gain insights, it helps to integrate the source systems seamlessly with analytics services.
During a hack session, we took on the challenge of extracting data from an SAP source and integrating it with analytics services using the new Database Templates/Lake Database feature available in Synapse workspaces.
The SAP source system was SAP S/4HANA.
The analytics services were Azure Synapse Analytics and lake database templates.
The scope for our Hack
Extract data from the SAP system (using Synapse Pipelines with the SAP connector that matches your source system, e.g. SAP ECC, SAP BW, or HANA DB), then land the data in a storage account (we used the storage account associated with the Synapse workspace).
Identify the industry template that best fits the source data (we chose Manufacturing).
Using the above template, design the lake database schema by adding/editing entities and attributes to suit our requirements.
Map the data between source and destination using the map data tool (most of the time was spent understanding the source entities and mapping them to entities in the lake database).
This mapping process results in pipelines and data flows being created. The pipelines then run and ingest the data into the lake database.
After successful data ingestion, use the data in the lake database from SQL pools or Spark pools in Synapse to explore it (a minimal sketch follows below). The data can also be exposed to advanced analytics capabilities such as machine learning, or used for reporting by integrating it with Power BI.
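For example, once ingestion has completed, the lake database tables can be queried directly from a Synapse notebook. A minimal sketch, assuming a database named manufacturing_lakedb and an entity named Product (both placeholder names):

```python
# Runs in a Synapse Spark pool notebook, where `spark` is predefined.
# "manufacturing_lakedb" and "Product" are placeholders; substitute the
# lake database and entity names you chose in the designer.
df = spark.sql("SELECT * FROM manufacturing_lakedb.Product LIMIT 10")
df.show()
```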
Steps performed to achieve the goal
We integrated SAP S/4HANA system data with Synapse lake database templates.
SAP source system details and login
1. Extract data from the SAP source system
The first step is to create a linked service to the SAP source system.
To establish a connection, we set up a self-hosted integration runtime as a compute engine and a gateway for data connection.
We used the SAP Table connector to create the linked service for our source system (the newer CDC connector was not yet available in Synapse in some regions).
Provide the client ID and login details to establish a connection to the SAP source system.
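As a reference point, the linked service payload for the SAP Table connector looks roughly like the sketch below, expressed here as a Python dict. Every name and value is a placeholder, and in practice the password is best kept as an Azure Key Vault reference rather than an inline secret:

```python
import json

# Sketch of an SAP Table linked service definition. All values are
# placeholders; Synapse Studio generates equivalent JSON from the UI.
sap_linked_service = {
    "name": "LS_SapTable",  # hypothetical linked service name
    "properties": {
        "type": "SapTable",
        "typeProperties": {
            "server": "<sap application server>",
            "systemNumber": "<system number>",
            "clientId": "<client id>",
            "userName": "<sap user>",
            # Prefer an Azure Key Vault reference over an inline secret.
            "password": {"type": "SecureString", "value": "<password>"},
        },
        # Route the connection through the self-hosted integration runtime.
        "connectVia": {
            "referenceName": "<self-hosted IR name>",
            "type": "IntegrationRuntimeReference",
        },
    },
}

print(json.dumps(sap_linked_service, indent=2))
```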
Next, we created the datasets for the source and destination.
A simple copy activity in a pipeline extracts the data from the source and lands it in the destination data lake.
Source and sink settings
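In JSON terms, the copy activity's source and sink correspond roughly to the following sketch. The dataset names are placeholders, and we assume a Parquet sink here; adjust the sink type to whichever file format you land in the lake:

```python
# Sketch of the copy activity definition (placeholder dataset names).
# DS_SapTable names the SAP table to read; DS_AdlsParquet points at the
# <container>/<foldername> path in the workspace storage account.
copy_activity = {
    "name": "CopySapToLake",
    "type": "Copy",
    "inputs": [{"referenceName": "DS_SapTable", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "DS_AdlsParquet", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "SapTableSource"},
        "sink": {"type": "ParquetSink"},
    },
}
```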
Once the data lands in ADLS, it should look similar to the following:
(The depth of the table folders should be <container>/<foldername>/<inputfiles>; this matters for the data mapping step later.)
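To sanity-check that layout, here is a small sketch using the Azure Data Lake Storage SDK (the account, container, and folder names are placeholders):

```python
# Sketch: list the landed files to confirm the expected folder depth.
# Requires azure-identity and azure-storage-file-datalake; all names
# below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<storageaccount>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
filesystem = service.get_file_system_client("<container>")

# Each table folder sits directly under the container, with its input
# files exactly one level below it.
for path in filesystem.get_paths(path="<foldername>"):
    print(path.name)
```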
2. Create an empty Lake database for our source data
We created our lake database using the Manufacturing industry template. It is important to identify the industry template relevant for the data.
We then added the entities to the database canvas as per our needs and created the database definition.
As needed, edit the table definitions with the column names, data types, nullability, relationships etc.
No data is ingested yet, so the database will be empty.
We then published to save the changes. Now we have an empty lake database based on the Manufacturing template, with the customizations that we needed.
Note that the database will show under "Lake database" in Synapse, but the corresponding folders in ADLS will not yet be created. These folders are created as the data is ingested, which brings us to the next step, ingesting data into the lake database, continued in the second part of this blog.
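Even at this point, the published table definitions should already be visible to the Spark pool through the shared metadata, which you can confirm with a quick check (a sketch; the database name is a placeholder):

```python
# Lists the table definitions of the (still empty) lake database from a
# Synapse Spark pool notebook; "manufacturing_lakedb" is a placeholder.
spark.sql("SHOW TABLES IN manufacturing_lakedb").show()
```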