How a retail customer unleashed the power of SAP data and improved data ingestion with ADF data extraction in Azure.
A US-based multinational manufacturing and retail company was running legacy systems, aged infrastructure, and traditional data extraction tools on-premises. With this infrastructure, they were unable to perform real-time analytics to derive meaningful business insights, and they grappled with challenges around reducing operational costs and using quality, data-driven insights in real time to increase revenue.
The company decided to take advantage of Azure platform services such as Azure Data Factory (ADF) to overcome these challenges, while their SAP estate remained on-premises. They also wanted to leverage modern technologies such as Azure Machine Learning to predict sales patterns and customer requirements and thereby increase revenue.
Read this article to learn how Cognizant teamed up with Microsoft to help this multinational manufacturing and retail company unleash the power of SAP data and improve data ingestion with ADF data extraction in Azure.
This blog is co-authored by Madhan Kumar Munipandian, Solution Architect – Data Analytics, and Rajib Mandal, Data Engineering Lead, Cognizant; and Prabhjot Kaur, Senior CSA, Microsoft.
The customer’s legacy data warehouse was designed a decade ago, resulting in a very slow and inefficient data extraction process.
The customer presented the following challenges to the team:
Speed and accuracy are basic requirements for many customers performing ETL (Extract, Transform, and Load) operations. This retail customer had the following requirements:
To modernize the business, meet these requirements, and take advantage of analytics and integration platform services, the company decided to migrate their on-premises enterprise warehouse workload to Azure.
One of the key requirements was to seamlessly integrate SAP (running on-premises) and non-SAP systems and extract data from various sources, which business decision makers would then analyze to make key decisions. To fulfill this requirement, the company embarked on a digital transformation initiative to build a data lake in the cloud and leverage Azure data analytics services, resolving the challenges they faced with their legacy on-premises data warehouse solution.
Here are the primary business drivers for the solution:
The solution must also meet the following technical requirements:
A pilot phase was executed to compare different Integration Platform as a Service (iPaaS) solutions, evaluating SAP data extraction and design patterns for handling different data scenarios such as source data size and delta extraction. Based on the pilot results, performance benchmarks, and tool-selection criteria derived from the primary business drivers, ADF was selected as the SAP data extraction tool for full batches and mini batches. After extensive analysis, ADF emerged as the ETL engine that could fulfill the business requirements and overcome the current technical challenges.
The big question that still needed to be addressed was: "How do we efficiently leverage Azure Data Factory to extract data from different SAP applications: SAP ECC (ERP Central Component), SAP APO (Advanced Planning), and SAP BW (Business Warehouse)?"
ADF was deployed with the following connectors and configuration. However, to meet the business SLA, we had to make some configuration changes (documented below).
With just this small change, the process was 48 times faster and throughput was 35 times higher.
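The blog does not spell out which setting delivered this gain, so as an illustration only, here is what throughput tuning for the SAP Table connector commonly looks like: partitioned parallel reads configured on the copy activity source. The sketch below expresses that configuration as the pipeline JSON (built as a Python dict); the table, column names, and bounds are assumptions, not the customer's actual values.

```python
# A minimal sketch (not the customer's actual change) of an ADF copy-activity
# source tuned for parallel reads with the SAP Table connector, expressed as
# the pipeline JSON it would serialize to. Names and bounds are hypothetical.
sap_table_source = {
    "type": "SapTableSource",
    # Split the read into parallel range partitions on a numeric column.
    "partitionOption": "PartitionOnInt",
    "partitionSettings": {
        "partitionColumnName": "DOCNUM",   # hypothetical numeric key column
        "partitionLowerBound": "1",
        "partitionUpperBound": "9999999",
        "maxPartitionsNumber": "10",
    },
}

copy_activity = {
    "name": "CopyFromSapTable",
    "type": "Copy",
    "typeProperties": {
        "source": sap_table_source,
        "sink": {"type": "ParquetSink"},
        # Let ADF execute the partitioned reads concurrently.
        "parallelCopies": 10,
    },
}
```

Partitioning the source read and raising the copy parallelism are the usual levers for this connector; the right partition column and partition count depend on the SAP table's key distribution.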
Additionally, the OData connector was evaluated. However, it was not deployed to production because it didn't meet the customer's requirements.
The following diagram illustrates the reference architecture of the deployed solution:
Here are the various components and their roles in the architecture:
ADF is a great solution for extracting data from various source systems (SAP and non-SAP), performing transformations, and loading the data into the target sink.
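As a concrete illustration of this pattern, here is a minimal sketch of creating and triggering such a copy pipeline with the azure-mgmt-datafactory Python SDK. All resource names are placeholders, and the SAP Table and data lake datasets (plus their linked services and the self-hosted integration runtime for the on-premises SAP system) are assumed to already exist in the factory.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    CopyActivity,
    DatasetReference,
    ParquetSink,
    PipelineResource,
    SapTableSource,
)

# Placeholder names -- substitute your own subscription and resources.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-dataplatform"   # hypothetical
FACTORY_NAME = "adf-sap-ingest"      # hypothetical

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# One copy activity: SAP Table source -> Parquet files in the data lake.
# "SapTableDataset" and "LakeParquetDataset" are assumed to be defined
# already in the factory, pointing at the SAP system and ADLS Gen2.
copy = CopyActivity(
    name="CopySapToLake",
    inputs=[DatasetReference(reference_name="SapTableDataset", type="DatasetReference")],
    outputs=[DatasetReference(reference_name="LakeParquetDataset", type="DatasetReference")],
    source=SapTableSource(),
    sink=ParquetSink(),
)

pipeline = PipelineResource(activities=[copy])
client.pipelines.create_or_update(RESOURCE_GROUP, FACTORY_NAME, "IngestSapTable", pipeline)

# Kick off a run; in production this would typically be a schedule trigger.
run = client.pipelines.create_run(RESOURCE_GROUP, FACTORY_NAME, "IngestSapTable")
print(f"Started pipeline run: {run.run_id}")
```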
We performed extensive analysis, comparisons, and performance tests. Based on our learnings, here are some recommendations on when to use which ADF connector for efficient data extraction from SAP applications, along with best practices to consider for each:
| SAP Application | Data Factory Connector | Key features and considerations |
| --- | --- | --- |
| SAP ECC, SAP SCM/APO, SAP BW | SAP Table | • Data extraction through the SAP ECC application layer using the ADF SAP Table connector<br>• Supports full and delta extraction<br>• Consider the maximum runtime configuration when defining the data extraction pattern<br>• Supports low to high data volumes<br>• Use the BASXML protocol |
| SAP ECC, SAP SCM/APO, SAP BW | Native DB connector | • Data extraction directly from database objects<br>• Primary database access is restricted in SAP transactional applications (SAP ECC and APO), so leverage the secondary database/high-availability instance for data extraction<br>• Supports full and delta extraction<br>• Consider the replication frequency between the primary and secondary instances |
| SAP ECC | SAP OData | • Recommended only for small data volumes (a few thousand records)<br>• Extracts entities exposed by SAP ECC OData services<br>• OData services internally create objects in SAP ECC for each service |
| SAP BW | SAP Open Hub | • Data needs to be moved from SAP ECC to SAP BW (using BW Business Content or a custom extractor)<br>• Data extraction through the SAP BW application layer using the Open Hub connector<br>• SAP BW becomes an intermediate data storage layer with transformed and aggregated data<br>• Supports full and delta extraction<br>• An Open Hub Destination object needs to be built in SAP BW<br>• Supports low to high data volumes<br>• Use the BASXML protocol |
| SAP BW on HANA | SAP HANA | • Use only when data is stored in SAP HANA<br>• Supports full and delta extraction<br>• Supports low to high data volumes |
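Several of the connectors above support delta extraction. With the SAP Table connector, this is often implemented as a high-watermark filter pushed down to SAP through the source's rfcTableOptions property. Here is a minimal sketch of that pattern; the AEDAT (changed-on date) column and the watermark storage are assumptions for illustration, since the applicable column depends on the specific SAP table.

```python
from datetime import date


def build_delta_source(last_watermark: date) -> dict:
    """Build an SAP Table source that reads only rows changed since the
    previous successful load. AEDAT is a common SAP changed-on date field;
    whether it applies depends on the table being extracted (an assumption
    here)."""
    return {
        "type": "SapTableSource",
        # Filter pushed down to SAP as a WHERE-style clause.
        "rfcTableOptions": f"AEDAT GE '{last_watermark:%Y%m%d}'",
    }


# Example: resume from the watermark persisted by the previous run.
source = build_delta_source(date(2022, 1, 15))
print(source["rfcTableOptions"])  # AEDAT GE '20220115'

# After a successful copy, persist the new high watermark (for example, in a
# control table or a blob) so the next run picks up where this one ended.
```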
With the existing tools and services offered in Azure, you can rapidly build, configure, and deploy your solutions. These solutions were successfully deployed at the customer site, and many more customers are in the pipeline to implement them.
You can integrate data silos with Azure Data Factory. Easily construct ETL and ELT processes code-free or write your own code. Visually integrate data sources using more than 90 natively built, maintenance-free connectors at no added cost. Focus on your data while the serverless integration service does the rest. Read "Azure Data Factory - Hybrid data integration service that simplifies ETL at scale" for more information.