Welcome back to the fourth episode of my blog series on the SAP CDC Connector and SLT! Previously, I provided an architecture overview to help you understand the solution's foundation and covered the basics of setting up, running, and monitoring the extraction process. Today, we'll explore advanced settings that can help you optimize the performance of your SAP data replication solution. Please be aware that these tweaks are intended for experienced users and should be thoroughly tested in a development or test environment before being implemented in your production system. By the end of this post, you'll gain valuable insights into fine-tuning your replication process for better efficiency and performance.
ADVANCED TOPICS - PERFORMANCE OF INITIAL EXTRACTION
Initial data extraction occurs in two steps, which can partially overlap. First, the SLT engine loads the content of the selected tables into delta queues, from which Azure Data Factory or Synapse extracts the data and puts it into the desired target storage. As described in the previous chapter, we can distinguish three main phases of SLT processing: defining the Migration Object, the access plan calculation, and the data load.
Defining the Migration Object is usually very fast, and there is not much we can do to optimize it further, so we can skip it and focus on the access plan calculation and the data loads. When you define the SLT configuration, you can choose from two Initial Load Modes: Resource Optimized and Performance Optimized. This setting determines how the system calculates the access plan and, in effect, how the source data is chunked into smaller, manageable portions that are easier to process, extract, and transfer to the delta queue. This analysis of the source data significantly influences the overall performance of the initial load and should not be overlooked.
In the default, Performance Optimized approach (Sender Queue), the sender system decides how to portion the data. In addition, the data is compressed and copied to technical tables even before the data load jobs kick in. This results in higher throughput when copying data to the delta queues, but it comes with trade-offs: it requires additional resources in the source system and leads to a lengthy access plan calculation.
On the other hand, in the Resource Optimized mode (Range Calculation), the system uses one of the key fields to calculate the access plan and divide the source table into portions. Unlike the Performance Optimized mode, there are no extra pre-processing steps, so the access plan calculation runs much faster. However, due to the lack of compression, the throughput is lower.
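To make the Range Calculation idea concrete, here is a minimal Python sketch — not SLT code, and the numeric key is purely illustrative — of how a key field can drive the chunking of a source table into contiguous, equally sized portions:

```python
def calculate_access_plan(min_key: int, max_key: int, portion_size: int):
    """Split the key range [min_key, max_key] into contiguous portions.

    Conceptually mirrors the Resource Optimized (Range Calculation) mode:
    a single key field drives the chunking, with no pre-processing
    or compression of the data itself.
    """
    portions = []
    low = min_key
    while low <= max_key:
        high = min(low + portion_size - 1, max_key)
        portions.append((low, high))  # each portion is then handled by a data load job
        low = high + 1
    return portions

# Example: a table keyed 1..1000, chunked into portions of 300 records
print(calculate_access_plan(1, 1000, 300))
# -> [(1, 300), (301, 600), (601, 900), (901, 1000)]
```

Because no data is touched during this calculation, it completes quickly — which is exactly why the Resource Optimized mode reaches the data load phase sooner than the Performance Optimized one.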
After completing the access plan calculation, the system initiates the actual data load process. Once the first portion of data is available in the delta queue, the SAP CDC connector begins the extraction. In the Performance Optimized mode, however, this start is delayed by the time it takes to compress the data.
When only the SLT engine is used to copy data between systems, without the SAP CDC connector, the whole process is contained in a single step, and the SLT essentially determines the solution's performance. In that case, introducing compression as part of the calculation job to increase throughput makes sense. However, when using Azure Data Factory, a more balanced approach is recommended, in which the SLT data load throughput matches the throughput of the Azure connector.
The performance of the initial data load to delta queues also depends on the selected architecture pattern and the number of jobs that copy data. Based on my observations, a single data load process in the embedded mode, and two in the standalone one, provide sustainable end-to-end performance for the SAP CDC connector for a single table. The extra time required to run the access plan calculation in the Performance Optimized mode only delays the actual extraction, and the extra throughput is usually not worth it. Therefore, I recommend always using the Resource Optimized mode and considering the switch to Performance Optimized only if you run into problems.
In addition to what the SLT Cockpit offers, further optimization of the data replication process can be done in the Advanced Replication Settings (transaction LTRS). In the majority of cases, the default settings are sufficient, but in certain situations, especially when working with very large tables, fine-tuning the LTRS settings can improve overall performance. Before changing any of these settings, it is important to understand what you are doing and to test the impact in a non-production environment first. Here is a summary of settings worth considering:
The advanced settings above let you tune the initial extraction process and ensure you're getting the most out of your system resources.
ADVANCED TOPICS – REPLICATION SCHEDULE
In the last step of setting up the SLT configuration, you can influence the replication schedule. The default setting replicates changes in real time, which is useful for ongoing replication but may not be necessary if data is extracted from ODP to Azure only once or twice a day; in that case it can consume system resources unnecessarily. During replication, data is stored in compressed format in technical delta tables, which the SLT engine updates using logging tables to identify changes. There are three types of delta tables: ODQDATA_F for one-off full extractions, ODQDATA_C for full loads with delta initialization, and ODQDATA for delta changes after initialization. The portion of data processed during a full extraction corresponds to a unit in the ODP framework.
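To keep the three delta tables straight, here is a tiny illustrative Python mapping — my own summary of the description above, not an SAP API — from extraction mode to the delta table that backs it:

```python
# Illustrative mapping (the mode names are my own labels, not SAP terms)
# of the extraction scenarios described above to their SLT delta tables.
DELTA_TABLE_FOR_MODE = {
    "full_one_off": "ODQDATA_F",          # one-off full extraction
    "full_with_delta_init": "ODQDATA_C",  # full load with delta initialization
    "delta": "ODQDATA",                   # delta changes after initialization
}

def delta_table(mode: str) -> str:
    """Return the delta table used for a given extraction mode."""
    return DELTA_TABLE_FOR_MODE[mode]

print(delta_table("delta"))  # -> ODQDATA
```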
During the full extraction, large portions of data are processed, while subsequent changes are processed in smaller units. Depending on the replication schedule and the number of transactions, the ODQDATA table may therefore grow significantly: in real-time mode, data load jobs continuously process entries in the logging tables and store them in the ODQDATA table with only a limited number of records per entry.
You can tune this behavior by changing the replication schedule from real time to intervals or specific times, which allows the SLT engine to combine multiple changes into a single ODQDATA entry and thus curb the table's growth. A smaller table is faster and easier to access than a larger one, which can have a significant impact on performance. This approach also has another interesting benefit.
If a record is changed multiple times, a new entry is created in the logging table, and in real-time mode, it's immediately transferred to the delta queues, which are then extracted by the SAP CDC connector and applied to the target storage. However, if replication is less frequent, the SLT engine can perform deduplication, meaning only the latest version of the record is copied to the delta queue. This can significantly reduce the number of records extracted by Azure and hence the process duration in certain scenarios. For instance, when you make individual changes to line items of a particular sales order, every change also modifies the header information. Therefore, ten updates may result in twenty changes for the VBAK and VBAP tables. However, with deduplication, even if there are ten updates to the header information, the SLT engine processes only the latest version of the record, reducing the total rows to process by almost 50%.
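The deduplication effect described above can be sketched in a few lines of Python. This is purely conceptual — the record layout and keys are my own illustration, not how the SLT engine is implemented internally — but it reproduces the arithmetic of the sales order example: ten item changes each also touch the shared header, and keeping only the latest image per table/key collapses the ten header changes into one:

```python
def deduplicate(changes):
    """Keep only the latest change per (table, key) pair -- conceptually
    what the SLT engine can do when replication runs on a schedule
    instead of in real time."""
    latest = {}
    for change in changes:
        # later changes to the same record overwrite earlier ones
        latest[(change["table"], change["key"])] = change
    return list(latest.values())

# Ten changes to line items of one sales order: each touches its own
# VBAP item plus the shared VBAK header -> 20 logged changes in total.
changes = []
for i in range(1, 11):
    changes.append({"table": "VBAP", "key": f"0001-{i:02d}", "version": i})
    changes.append({"table": "VBAK", "key": "0001", "version": i})

deduped = deduplicate(changes)
print(len(changes), len(deduped))  # -> 20 11
```

Twenty logged changes shrink to eleven delta-queue records (ten distinct items plus one header), which is the "almost 50%" reduction mentioned in the text.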
The screenshot shows the data volume of extracted records after regenerating the SBOOK data. The first line represents the replication mode set to Real-Time, while the line below shows the effect of deduplication after switching to ten-minute intervals. Please note that regenerating data is not something you will commonly face in a production environment, so to correctly understand the benefits of changing the replication schedule, you should analyse how changes actually happen in the source table.
The SLT engine offers four options for replication frequency: Real-Time, Intervals, Specific Times, and On-Demand. When choosing the right replication frequency, it's important to consider the SAP CDC extraction schedule. For example, if data is fetched to Azure every hour, the replication could be set to run every thirty minutes to reduce the risk of misalignment and latency. On-Demand replication requires maintaining extra settings during extraction and is not supported out of the box by the ODP engine.
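To reason about aligning the two schedules, here is a back-of-the-envelope sketch — my own simplification, not an SAP formula. If replication runs every R minutes and the Azure extraction every E minutes, a change made just after a replication run can wait up to R minutes to reach the delta queue, and then up to E minutes more for the next extraction, so the worst-case end-to-end staleness is roughly R + E:

```python
def worst_case_latency(replication_min: int, extraction_min: int) -> int:
    """Rough upper bound on data staleness in minutes, assuming fixed
    independent intervals and ignoring processing time: a change can
    just miss a replication run, then just miss an extraction run."""
    return replication_min + extraction_min

# Hourly extraction with 30-minute replication (the example from the text)
print(worst_case_latency(30, 60))  # -> 90
```

Shortening the replication interval below the extraction interval therefore trims worst-case latency at the cost of more frequent (and smaller, less deduplicated) ODQDATA entries — the trade-off discussed above.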
Thank you for joining me in this deep dive into advanced settings for optimizing SAP CDC Connector and SLT performance. I hope you found these insights valuable and can apply them to achieve a more efficient data replication process. In the next episode, we'll explore additional tips, tricks, and best practices to further enhance your SAP data replication journey. Stay tuned and continue to elevate your data extraction capabilities with my blog series.