
Modernization Best Practices and Reusable Assets Blog

Faster Data Copy Between Source and Target for Partitioned Tables Using Partition Switch in ADF

Ramanath_Nayak
Dec 05, 2025

This blog describes an ADF-based solution that speeds up the migration of partitioned tables.

Introduction

This blog post presents a comprehensive Azure Data Factory (ADF) solution for automating the migration of partitioned tables from IBM Db2 z/OS to Azure SQL Database. The solution consists of two main components: a Partition Discovery & Preparation pipeline and a Parallel Copy pipeline with partition-switching capabilities. This approach significantly reduces migration time and ensures data integrity while maintaining the partition structure in the target environment. Because one separate table is used per partition for the data copy, the workload is divided across multiple smaller tables, which reduces the complexity of the migration. For a small number of partitioned or non-partitioned tables, an alternative approach using parallel copy activities can be implemented instead.

Challenges in migrating large partitioned tables

Migrating large partitioned tables from IBM Db2 z/OS to Azure SQL Database presents unique challenges:

  • Complex Partition Structures: Db2 z/OS supports multiple partitioning schemes (such as partition-by-range and partition-by-growth) that need to be properly mapped to range partitions in Azure SQL
  • Large Data Volumes: Enterprise tables can contain billions of rows across hundreds of partitions
  • Minimal Downtime Requirements: Business-critical applications require near-zero downtime migrations
  • Data Integrity: Ensuring consistency across all partitions during migration
  • Performance Optimization: Maximizing throughput while managing resource consumption

This solution addresses these challenges through an intelligent, automated approach:

  • Automatically discovers partition metadata from Db2 system tables (a sample catalog query appears after this list)
  • Creates optimized migration plans based on partition characteristics
  • Creates one temporary persistent table per partition, with the same partition function and scheme, to support the migration; these tables can be dropped once the process is complete. (A check constraint is another way to implement this, but for the sake of simplicity we clone the base table.)
  • Executes parallel data transfer with partition-level granularity
  • Implements partition switching for minimal downtime
  • Provides comprehensive monitoring and error handling
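To make the discovery step concrete, the following is a minimal sketch of the kind of query the pipeline can run against the Db2 z/OS catalog. SYSIBM.SYSTABLES and SYSIBM.SYSTABLEPART are standard Db2 for z/OS catalog tables, but the filters and the schema name MYSCHEMA are illustrative assumptions; adapt them to your environment and Db2 version.

    -- Sketch: pull per-partition metadata from the Db2 z/OS catalog.
    -- CARDF/SPACEF come from statistics, so run RUNSTATS first for accurate values.
    SELECT  T.CREATOR           AS TABLE_SCHEMA,
            T.NAME              AS TABLE_NAME,
            P.PARTITION         AS PARTITION_NUMBER,
            P.LIMITKEY          AS HIGH_BOUNDARY,  -- upper limit key of the partition
            P.CARDF             AS ROW_COUNT,      -- estimated rows in the partition
            P.SPACEF            AS SIZE_KB         -- estimated space in KB
    FROM    SYSIBM.SYSTABLES    T
    JOIN    SYSIBM.SYSTABLEPART P
            ON  P.DBNAME = T.DBNAME
            AND P.TSNAME = T.TSNAME
    WHERE   T.TYPE = 'T'                -- base tables only
      AND   P.PARTITION > 0             -- partitioned table spaces only
      AND   T.CREATOR = 'MYSCHEMA'      -- hypothetical schema filter
    ORDER BY T.NAME, P.PARTITION;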

Solution Overview

The solution has two phases, each implemented as an automated ADF pipeline.

Phase 1: Discovery & Preparation

  • Extracts the required data from Db2 system tables using a copy activity
  • Captures partition metadata (boundaries, row counts, sizes)
  • Creates migration control tables in Azure SQL Database
  • Generates optimized copy strategies per partition

Phase 2: Copy and Partition switch

  • Executes partition-level data transfers in parallel
  • Implements partition switching for seamless integration
  • Provides real-time status tracking and error handling
  • Supports restart capabilities for failed partitions

Prerequisites

  1. Azure Data Factory instance
  2. Self-Hosted Integration Runtime (SHIR) with Db2 connectivity
  3. Access to the system tables on Db2
  4. The schema of the Db2 source partitioned table should already be migrated to Azure SQL; this SQL table is used as the base for cloning the temporary per-partition tables (a hypothetical example of the target setup follows this list)
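Prerequisite 4 implies that a partition function, a partition scheme, and the target table already exist in Azure SQL Database. The blog does not include the actual DDL, so the following is a minimal, hypothetical sketch; the names PF_OrderDate, PS_OrderDate, and dbo.ORDERS are placeholders.

    -- Hypothetical target setup in Azure SQL Database.
    CREATE PARTITION FUNCTION PF_OrderDate (DATE)
        AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01', '2025-01-01');

    -- Azure SQL Database exposes a single filegroup, so map all partitions to PRIMARY.
    CREATE PARTITION SCHEME PS_OrderDate
        AS PARTITION PF_OrderDate ALL TO ([PRIMARY]);

    -- Target table created on the partition scheme, partitioned by ORDER_DATE.
    CREATE TABLE dbo.ORDERS
    (
        ORDER_ID   BIGINT        NOT NULL,
        ORDER_DATE DATE          NOT NULL,
        AMOUNT     DECIMAL(12,2) NULL
    ) ON PS_OrderDate (ORDER_DATE);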

ADF Pipeline Phase 1: Partition Discovery & Preparation Pipeline

Fig 1.1 Partition Preparation Pipeline

The diagram above illustrates the initial phase of the solution, which sets up the necessary tables and extracts essential information from Db2 into Azure SQL. Here’s a breakdown of each step:

  1. The first script establishes a control table in SQL called Db2_PARTITION_STATUS (a minimal sketch of this table appears after this list).
  2. The second script adds indexes to Db2_PARTITION_STATUS to improve access speed.
  3. The third step uses copy activities to retrieve partition details from Db2 system tables and populate the Db2_PARTITION_STATUS table.
  4. The fourth step creates a stored procedure named ClonePartitionedTable, which automates the cloning of SQL tables (one for each partition), making it easier to handle hundreds of partitions.
  5. The fifth step fetches relevant rows from Db2_PARTITION_STATUS. Users can decide which tables to migrate by updating the MIGRATE column in this table.
  6. The final step generates a clone of the source table for each partition, supporting the overall migration process.
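The actual scripts ship with the asset; the sketch below only illustrates the shape of steps 1 and 6. The column list of the control table is an assumption about what Db2_PARTITION_STATUS might contain, and the real table may differ.

    -- Step 1 (sketch): control table that tracks per-partition migration state.
    CREATE TABLE dbo.Db2_PARTITION_STATUS
    (
        TABLE_SCHEMA     SYSNAME       NOT NULL,
        TABLE_NAME       SYSNAME       NOT NULL,
        PARTITION_NUMBER INT           NOT NULL,
        HIGH_BOUNDARY    NVARCHAR(256) NULL,      -- LIMITKEY captured from Db2
        ROW_COUNT        BIGINT        NULL,
        MIGRATE          CHAR(1)       NOT NULL DEFAULT 'N',  -- set to 'Y' to migrate
        STATUS           VARCHAR(30)   NOT NULL DEFAULT 'Not Started',
        LAST_UPDATED     DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
        CONSTRAINT PK_Db2_PARTITION_STATUS
            PRIMARY KEY (TABLE_SCHEMA, TABLE_NAME, PARTITION_NUMBER)
    );

    -- Step 6 (sketch): ClonePartitionedTable generates one clone per partition on
    -- the same partition scheme as the target, e.g. for partition 3 of dbo.ORDERS:
    -- CREATE TABLE dbo.ORDERS_P3 ( ...same columns... ) ON PS_OrderDate (ORDER_DATE);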

ADF Pipeline Phase 2: Data Copy & Partition Switch

Fig 1.2 Parallel copy and Switch pipeline

The next phase performs the actual data copy between the source and target partitions.

  1. The first step uses a Lookup activity in ADF to get the table and partition information for the rows that need to be migrated (a hypothetical example query follows this list).
  2. For each row retrieved in step 1, a series of activities is executed in parallel. The details are provided below.
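The Lookup query itself is not shown in the blog; assuming the control table sketched earlier, it could look something like this (the status value 'Switch Success' is an illustrative name):

    -- Hypothetical Lookup query: every partition flagged for migration that has
    -- not yet switched successfully, so failed partitions are picked up on rerun.
    SELECT  TABLE_SCHEMA,
            TABLE_NAME,
            PARTITION_NUMBER,
            HIGH_BOUNDARY
    FROM    dbo.Db2_PARTITION_STATUS
    WHERE   MIGRATE = 'Y'
      AND   STATUS <> 'Switch Success';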

 

Fig 1.3 ForEach Activity Details

Multiple rows flow into the ForEach activity, and each iteration of the ForEach represents one partition of the source table. The activities highlighted below are therefore performed in parallel for each and every partition that has been marked for migration.

  1. First, a script activity updates the status table to record that the data copy has started for this particular partition.
  2. Next, a copy activity copies that partition's data from the source to the same partition of the cloned SQL table.
  3. If the copy activity succeeds, the status table is updated with the status Start Switching.
  4. If the copy activity fails, the status is set to Copy Failed. This makes it easy to track which partitions were copied and which failed, and to restart the copy for failed partitions only.
  5. A script activity then performs the actual partition switch using an ALTER TABLE ... SWITCH command (a hypothetical example follows this list).
  6. Finally, the outcome of the partition switch is recorded as success or failure in the status table, so it is always clear exactly which partitions succeeded and which did not.
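Steps 2 and 5 are the heart of the pattern. The sketch below carries over the illustrative table names and boundary values from the earlier examples; it shows a per-partition source query and the T-SQL switch. SWITCH is a metadata-only operation, which is why it is near-instantaneous, but it requires identical schemas, aligned partitioning, and an empty target partition.

    -- Step 2 (Db2 side, sketch): the copy activity slices one partition by its
    -- boundary keys, e.g. partition 3 of a table range-partitioned on ORDER_DATE:
    -- SELECT * FROM MYSCHEMA.ORDERS
    --  WHERE ORDER_DATE >= '2024-01-01' AND ORDER_DATE < '2025-01-01';

    -- Step 5 (Azure SQL side): switch the loaded partition of the clone into the
    -- real target table. Partition 3 of dbo.ORDERS must be empty before the switch.
    ALTER TABLE dbo.ORDERS_P3
        SWITCH PARTITION 3 TO dbo.ORDERS PARTITION 3;

    -- Step 6 (sketch): record the outcome in the control table.
    UPDATE dbo.Db2_PARTITION_STATUS
    SET    STATUS = 'Switch Success', LAST_UPDATED = SYSUTCDATETIME()
    WHERE  TABLE_SCHEMA = 'MYSCHEMA' AND TABLE_NAME = 'ORDERS'
      AND  PARTITION_NUMBER = 3;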

Conclusion

This comprehensive Azure Data Factory solution provides a robust, scalable approach to migrating partitioned tables from IBM Db2 z/OS to Azure SQL Database. It is not restricted to Db2 as a source, however: the same architecture can be used for other partitioned sources with slight modifications. The two-pipeline architecture delivers the following benefits.

Key Benefits Achieved

 

  1. Automated Discovery: Eliminates manual partition mapping and reduces human error
  2. Intelligent Optimization: Applies data-driven strategies for optimal performance
  3. Parallel Processing: Maximizes throughput through partition-level parallelism
  4. Minimal Downtime: Uses partition switching for near-instantaneous data integration
  5. Comprehensive Monitoring: Provides real-time visibility into migration progress
  6. Error Resilience: Isolates failures and enables selective retry operations

Business Impact

 

  • Reduced Migration Time: From weeks to days through automation and parallelization
  • Lower Risk: Comprehensive testing and rollback capabilities
  • Cost Efficiency: Optimal resource utilization and reduced manual effort
  • Maintainability: Standardized approach applicable across multiple migration projects

Future Enhancements

The solution foundation supports several potential enhancements:

  1. Machine Learning Integration: Predictive optimization of batch sizes and thread counts
  2. Advanced Monitoring: Integration with Azure Monitor and custom dashboards
  3. Cross-Platform Support: Extension to other database platforms beyond Db2
  4. Automated Testing: Built-in data validation and integrity checking
  5. Cloud-Native Optimization: Leverage Azure SQL Database elastic pools and serverless compute

This solution represents a best-practice approach to enterprise data migration, combining the power of Azure Data Factory with intelligent partition management strategies. Organizations implementing this solution can expect significant reductions in migration time, risk, and cost while achieving reliable, repeatable results across their data migration initiatives.

Feedback and suggestions

If you have feedback or suggestions for improving this data migration asset, please send an email to the Database Platform Engineering Team.
