Introduction
In today’s data-driven landscape, database migration is a crucial task that requires meticulous planning and execution. Our recent project migrating data from Sybase ASE to MSSQL set out to illuminate this process, using tools such as Microsoft SQL Server Migration Assistant (SSMA), Microsoft SQL Server Integration Services (SSIS) packages, and Azure Data Factory (ADF).
We carefully designed our tests to cover a range of database sizes: 1 GB, 10 GB, 100 GB, and 500 GB. Each size brought its own set of challenges and valuable insights, guiding our migration strategies for a smooth transition.
For the smaller databases (1 GB and 10 GB), we utilized SSMA, which demonstrated strong efficiency and reliability in handling straightforward migrations. SSMA was particularly effective in converting database schemas and moving data with minimal complication. As we scaled up to the larger datasets (100 GB and 500 GB), we incorporated SSIS packages alongside Azure Data Factory to address the increased complexity and ensure robust performance throughout the migration process.
Our exercises yielded important findings related to performance metrics such as data throughput, error rates during transfer, and overall execution times for each migration approach. These insights helped us refine our methodologies and underscored the necessity of selecting the right tools for each migration scenario when transitioning from Sybase to MSSQL.
Ultimately, our experience highlighted that thorough analysis is essential for identifying potential bottlenecks and optimizing workflows, enabling successful and efficient migrations across databases of all sizes. The results reassure stakeholders that, with a well-considered approach and comprehensive testing, migrations can be executed seamlessly while maintaining the integrity of all datasets involved.
This strategy is split into two parts. Part 1 (this post) covers the overview, the source database environment setup for Sybase, the target database environment setup for Azure SQL, and the migration steps, time consumed, and key learnings for SSMA and SSIS. The remaining sections are covered in Part 2.
Objectives of the Evaluation
- Assess the migration duration for various database sizes.
- Evaluate the capabilities and limitations of each tool.
- Identify optimal patterns for bulk data movement.
- Recommend the tool and parameter combination that gives the best results for each scenario and sample dataset.
Overview of Tools
SQL Server Migration Assistant (SSMA) for SAP ASE
Microsoft SQL Server Migration Assistant (SSMA) for Sybase Adaptive Server Enterprise (ASE) is a tool for migrating SAP ASE databases to SQL Server 2012 (11.x) through SQL Server 2022 (16.x) on Windows and Linux, Azure SQL Database or Azure SQL Managed Instance. It supports schema conversion, data migration, and limited post-migration testing. SSMA converts Sybase ASE database objects (tables, views, stored procedures, etc.) to Azure SQL-compatible formats and migrates data using a client-side or server-side engine.
SQL Server Integration Services (SSIS)
SQL Server Integration Services is a platform for building enterprise-level data integration and data transformation solutions. It offers flexibility in handling non-compatible objects, custom transformations, and large-scale data transfers through its pipeline architecture. SSIS is particularly useful when SSMA cannot migrate certain objects or when additional data transformation is required, for example to handle unparsed SQL, SET option conversion issues, identifier conversion issues, date format conversions, and non-ANSI joins.
Azure Data Factory (ADF)
ADF is a fully managed, serverless, cloud-based data integration service for orchestrating and automating data movement and transformation across on-premises and cloud environments. It is well-suited for hybrid migrations, large-scale data pipelines, and integration with Azure services like Azure SQL. ADF excels in scenarios requiring scalability and parallel processing.
Environment Setup
For testing, we used the following setup:
Source Environment
- Sybase ASE Version: 16.0 SP03
- OS: SUSE Linux Enterprise Server 15 SP6
- VM Size: Standard B12ms (12 vcpus, 48 GiB memory)
Target Environment
- SQL Server 2022: hosted on Azure VM
- OS: Windows Server 2022
- VM Size: Standard D16as v4 (16 vcpus, 64 GiB memory)
Network
- Both VMs hosted in same Azure region
- Premium SSD LRS disks for source and target
Metrics
We evaluated the tools based on:
- Data Migration Time: Time to migrate 1 GB, 10 GB, 100 GB (1 x 100 GB table) and 500 GB (5 x 100 GB tables) of data.
- Scalability: Performance with increased data volumes (up to 500 GB).
SQL Server Migration Assistant (SSMA)
Migration steps:
- Install SSMA for SAP ASE 10.3 or above and required drivers (Sybase ASE ODBC/ADO.NET providers).
- Create an SSMA project, configure source (Sybase ASE) and target (Azure SQL) connections.
- Assess database compatibility, customize data type mappings (e.g., Sybase TEXT to SQL Server NVARCHAR(MAX)), and convert schema.
- Migrate data using SSMA’s client-side or server-side engine.
- Validate migrated objects and data.
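For the validation step, a quick row-count comparison is a useful first check before deeper data comparisons. The sketch below is a minimal, hedged example run on the Azure SQL target; it simply lists row counts per migrated table so they can be compared against the counts captured from Sybase ASE before migration.

```sql
-- Minimal post-migration check on the Azure SQL target: row counts per table,
-- to be compared against the counts recorded on the Sybase ASE source.
SELECT t.name      AS table_name,
       SUM(p.rows) AS target_row_count
FROM sys.tables AS t
JOIN sys.partitions AS p
    ON p.object_id = t.object_id
   AND p.index_id IN (0, 1)        -- heap or clustered index only, avoids double counting
GROUP BY t.name
ORDER BY t.name;
```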
Test Results:
| Test Scenario | Data Size | Time Taken | Throughput | Threads | Notes |
|---|---|---|---|---|---|
| Single Copy | 1 GB | 28 seconds | ~36 MB/s | 1 | Single threaded |
| Single Copy | 10 GB | 4 minutes 36 seconds | ~37 MB/s | 1 | Single threaded |
| Parallel Copy | 100 GB | 44 minutes | ~38 MB/s | 40 | Parallel threads, using project-level setting changes: migration engine, batch size, parallel data migration mode, and multi-loading |
| Scalability Test | 500 GB | 3 hours 44 minutes | ~38 MB/s | 40 | Parallel threads; throughput matched the smaller runs because of the largely sequential nature of SSMA’s processing engine, which limits parallelism |
Key learnings:
- Excellent for schema conversion and small data sets
- Not scalable beyond 100 GB; memory issues and slow single-threaded loads
- Limited configuration for tuning bulk inserts
Performance Tuning Insights:
Migration Engine: We configured server-side data migration to optimize performance and reduce processing overhead. With client-side data migration, the SSMA client retrieves the data from the source and bulk inserts it into Azure SQL. With server-side data migration, the SSMA data migration engine (bulk copy program) runs on the Azure SQL box as a SQL Agent job, retrieving data from the source and inserting it directly into Azure SQL, thus avoiding an extra client hop and giving better performance. When choosing this method, you also need to specify which version of BCP to use (32-bit or 64-bit).
Batch Size: Data is migrated in batches from the source tables into Azure SQL tables within transactions. The batch size option determines how many rows are loaded into Azure SQL per transaction. By default, it is 10,000, but we increased it to 270,000 since our dataset contained 26,214,400 rows (around 100 GB).
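As a rough sanity check on that choice, using the figures above: 100 GB spread across 26,214,400 rows works out to roughly 4 KB per row, so a 270,000-row batch is about 1 GB per transaction and the full table loads in roughly 97 batches.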
Parallel Data Migration Mode: This option is available only when using the Client Side Data Migration Engine mode. It defines the number of parallel threads to be used during migration. By default, it is set to Auto (10 threads). To modify it, select Custom and specify the desired number of parallel threads. We changed this to 40 to get the best results.
Multi loading: With Multi-Loading enabled, SSMA uses multiple parallel threads to load data batches at the same time, which can significantly speed up migration for large tables. It essentially breaks the source data into chunks (based on your batch size setting) and loads them concurrently.
SQL Server Integration Services (SSIS)
Migration steps:
- Create an SSIS project in SQL Server Data Tools (SSDT).
- Configure an ODBC/ADO.NET source (Sybase ASE) and an ADO.NET/OLE DB destination (Azure SQL) with the appropriate drivers.
- Build Control flow and data flow tasks for each table, applying transformations for incompatible data types or business logic.
- Execute packages in parallel for large tables, optimizing buffer size and commit intervals.
- Monitor and log errors for troubleshooting.
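For the monitoring and logging step, a simple custom log table on the target server is often sufficient. The sketch below is a hedged example (the table and column names are ours, not part of the tooling): an OnError event handler containing an Execute SQL Task can insert into it, with the ? placeholders mapped to the SSIS system variables System::PackageName, System::SourceName, and System::ErrorDescription via an OLE DB connection.

```sql
-- Hypothetical error-log table for SSIS package runs (names are illustrative).
CREATE TABLE dbo.MigrationErrorLog
(
    LogId            INT IDENTITY(1, 1) PRIMARY KEY,
    PackageName      NVARCHAR(260) NOT NULL,
    TaskName         NVARCHAR(260) NOT NULL,
    ErrorDescription NVARCHAR(MAX) NOT NULL,
    LoggedAt         DATETIME2(0)  NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Statement executed by an Execute SQL Task in an OnError event handler;
-- the ? placeholders come from the task's parameter mapping.
INSERT INTO dbo.MigrationErrorLog (PackageName, TaskName, ErrorDescription)
VALUES (?, ?, ?);
```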
Test Results:
| Test Scenario | Data Size | Time Taken | Throughput | Threads | Notes |
|---|---|---|---|---|---|
| Single Copy | 1 GB | 31 seconds | ~33 MB/s | 1 | Single threaded |
| Single Copy | 10 GB | 4 minutes 16 seconds | ~40 MB/s | 1 | Single threaded |
| Parallel Copy | 100 GB | 38 minutes | ~44 MB/s | 5 | Parallel threads, regulating MaxConcurrentExecutables, DefaultBufferMaxRows, EngineThreads, and Azure SQL maximum server memory |
| Scalability Test | 500 GB | 3 hours 12 minutes | ~44 MB/s | 5 | Parallel threads; performance improved with larger buffer sizes (100 MB) and SSD I/O |
Key learnings:
- Very fast for large data volumes with tuning
- Requires development time (package design, error handling)
- Bottleneck: network throughput on 500 GB run
- Best suited for 10–500 GB on-prem or IaaS migrations
Performance Tuning Insights:
DefaultBufferSize: The task’s buffer settings can be configured using the DefaultBufferSize property, which defines the buffer size, and the DefaultBufferMaxRows property, which specifies the maximum number of rows per buffer. By default, the buffer size is 10 MB (with a maximum of 100 MB), and the default maximum number of rows is 10,000.
DefaultBufferMaxRows: The data flow engine begins the task of sizing its buffers by calculating the estimated size of a single row of data. It then multiplies the estimated size of a row by the value of DefaultBufferMaxRows to obtain a preliminary value for the buffer size.
- If the result is greater than the value of DefaultBufferSize, the engine reduces the number of rows.
- If the result is less than the minimum buffer size calculated internally, the engine increases the number of rows.
- If the result falls between the minimum buffer size and the value of DefaultBufferSize, the engine sizes the buffer as close as possible to the estimated row size times the value of DefaultBufferMaxRows.
If sufficient memory is available, it is better to use fewer large buffers instead of many small ones. In other words, performance improves when the number of buffers is minimized, and each buffer holds as many rows as possible.
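To put numbers behind this, the query below is a rough helper (our own, not part of SSIS) that estimates the average row size of a table already loaded on the target and derives how many rows would fill a 100 MB buffer; dbo.Orders is an illustrative table name.

```sql
-- Estimate average row size and how many rows fit in a 100 MB buffer
-- (a candidate DefaultBufferMaxRows value when DefaultBufferSize is raised to 100 MB).
DECLARE @BufferBytes BIGINT = 100 * 1024 * 1024;

SELECT
    SUM(ps.used_page_count) * 8192.0 / NULLIF(SUM(ps.row_count), 0) AS avg_row_bytes,
    @BufferBytes / (SUM(ps.used_page_count) * 8192.0 / NULLIF(SUM(ps.row_count), 0)) AS rows_per_100mb_buffer
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.Orders')   -- illustrative table name
  AND ps.index_id IN (0, 1);                   -- heap or clustered index only
```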
IsSorted: Sorting with an SSIS Sort transformation is a slow operation by definition, so avoiding unnecessary sorting improves data flow performance. If the data arriving from the source is already sorted, set the IsSorted property of the upstream component’s output to True (and mark the sort-key columns) so downstream components can rely on the existing order instead of re-sorting.
ADO.NET Source: When retrieving data from a table or view, choose SQL command as the data access mode and provide a SELECT statement rather than selecting the whole table or view. Using a SELECT statement ensures the data is accessed in the most efficient way.
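For example, the SQL command for the source component might be a plain SELECT that lists only the columns being migrated (the table, columns, and filter below are illustrative):

```sql
-- SQL command used as the data access mode of the source component.
-- Listing explicit columns avoids pulling data the destination does not need.
SELECT OrderId, CustomerId, OrderDate, Amount
FROM dbo.Orders
WHERE OrderDate >= '2020-01-01';   -- optional filter, illustrative only
```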
OLE DB Destination: Several OLE DB Destination settings can significantly impact data transfer performance:
- Data Access Mode – This setting offers the Fast Load option, which internally uses a BULK INSERT statement to load data into the destination table, instead of executing individual INSERT statements for each row. Unless you have a specific reason to change it, keep the default Fast Load option enabled. When using Fast Load, additional performance-related settings become available (listed below).
- Keep Identity – By default, this is unchecked. If the destination table has an identity column, SQL Server generates identity values automatically. Checking this option ensures that identity values from the source are preserved and inserted into the destination.
- Keep Nulls – By default, this is unchecked. If a NULL value is encountered in the source and the target column has a default constraint, the default value will be inserted. Enabling this option preserves the NULL values from the source instead of applying the default constraint.
- Table Lock – This is checked by default, meaning a table-level lock is acquired during data load instead of multiple row-level locks. This prevents lock escalation issues and generally improves performance. Keep this enabled unless the table is actively being used by other processes at the same time.
- Check Constraints – Checked by default. This validates incoming data against the destination table’s constraints. If you are confident the data will not violate constraints, unchecking this option can improve performance by skipping the validation step.
- Rows per Batch – The default value is -1, which means all incoming rows are treated as a single batch. You can change this to a positive integer to divide the incoming rows into multiple batches, where the value specifies the maximum number of rows per batch.
- Maximum Insert Commit Size – The default value is 2147483647 (the maximum for a 4-byte integer), which commits all rows in a single transaction once the load completes successfully. You can set this to a positive integer to commit data in smaller chunks. While committing more frequently does add overhead to the data flow engine, it helps reduce pressure on the transaction log and tempdb, preventing excessive growth during high-volume data loads.
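For reference, most of these fast-load settings have direct T-SQL counterparts on a BULK INSERT statement. The sketch below only illustrates how they line up (file path, table name, and values are illustrative); it is not what SSIS executes internally.

```sql
-- T-SQL counterparts of the OLE DB Destination fast-load settings above.
BULK INSERT dbo.Orders
FROM 'D:\staging\orders.dat'      -- illustrative path
WITH (
    TABLOCK,                      -- Table Lock
    KEEPIDENTITY,                 -- Keep Identity
    KEEPNULLS,                    -- Keep Nulls
    CHECK_CONSTRAINTS,            -- Check Constraints (omit to skip validation)
    BATCHSIZE = 270000            -- commit in chunks, analogous to Maximum Insert Commit Size
    -- ROWS_PER_BATCH is the analogue of the Rows per Batch setting (use it as an alternative hint)
);
```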
DelayValidation: SSIS uses validation to determine whether the package could fail at runtime, and it does so in two stages. Package validation (early validation) validates the package and all of its components before execution starts; component validation (late validation) validates each component once execution has begun. Setting DelayValidation to True skips early validation, so the component is validated only at the component level (late validation) during package execution.
MaxConcurrentExecutables: This package-level property specifies the number of executables (the different tasks inside the package) that can run in parallel, in other words, the number of threads the SSIS runtime engine can create to execute the package’s executables concurrently.
EngineThreads: This property specifies the number of source threads (which pull data from the source) and worker threads (which perform transformations and load data into the destination) that the data flow pipeline engine can create to manage data movement and transformation inside a data flow task. For example, if EngineThreads is set to 5, up to 5 source threads and up to 5 worker threads can be created. Note that this property is only a suggestion to the pipeline engine; it may create fewer or more threads if required.
IsolationLevel: In SQL Server Integration Services (SSIS), IsolationLevel is a property that defines how a transaction within a package interacts with other concurrent transactions in the database, that is, the degree to which one transaction is isolated from the effects of others. The default is Serializable, which locks the entire data set being read and holds the lock until the transaction completes. Instead, set it to ReadUncommitted or ReadCommitted. ReadUncommitted reads data without waiting for other transactions to finish and can read rows that are not yet committed (dirty reads). ReadCommitted reads only committed rows, preventing dirty reads by waiting until other transactions commit or roll back. Use ReadUncommitted when you need speed and can tolerate potentially inaccurate data; otherwise use ReadCommitted, which is slower because it must wait for locks to be released.
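For reference, these isolation levels map directly to T-SQL; the statements below show the equivalent connection-level settings on the SQL Server side.

```sql
-- T-SQL equivalents of the SSIS IsolationLevel values discussed above.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;  -- fastest; allows dirty reads
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;    -- SQL Server's default; no dirty reads
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;      -- SSIS default for package transactions; most restrictive
```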
TransactionOption: The TransactionOption property in SQL Server Integration Services (SSIS) controls how tasks, containers, or the entire package participate in transactions to ensure data integrity. It determines whether a task or container starts a transaction, joins an existing one, or does not participate in any transaction. The property is available at the package level, at the container level (e.g., For Loop, Foreach Loop, Sequence), and on individual Control Flow tasks (e.g., Execute SQL Task, Data Flow Task). Set this to NotSupported so that the task or container does not participate in any transaction, even if a parent container or package has started one; if a transaction exists at a higher level, the task or container simply operates outside of it. This is the fastest option because there is no transaction overhead.
Database Recovery Model: The target database is configured to use the Bulk-logged recovery model, which aims to minimally log bulk operations. This should be significantly more performant than the Full recovery model, provided the bulk insert meets the criteria required for minimal logging. The criteria for the target table are as follows (a T-SQL sketch follows the list) –
- The target table is not being replicated.
- Table locking is specified (using TABLOCK).
- If the table has no indexes, data pages are minimally logged.
- If the table does not have a clustered index but has one or more non-clustered indexes, data pages are always minimally logged. How index pages are logged, however, depends on whether the table is empty:
  - If the table is empty, index pages are minimally logged.
  - If the table is non-empty, index pages are fully logged.
- If the table has a clustered index and is empty, both data and index pages are minimally logged. In contrast, if a table has a clustered index and is non-empty, data pages and index pages are both fully logged regardless of the recovery model.
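A hedged T-SQL sketch of this setup (the database name and backup path are illustrative): switch the target database to bulk-logged recovery for the load window, run the bulk load so that it meets the criteria above, then return to full recovery and back up the log.

```sql
-- Switch the target database to the bulk-logged recovery model for the load window.
ALTER DATABASE TargetDb SET RECOVERY BULK_LOGGED;

-- ... run the bulk load here, meeting the minimal-logging criteria above (TABLOCK, index state, etc.) ...

-- Return to full recovery and take a log backup once the load completes.
ALTER DATABASE TargetDb SET RECOVERY FULL;
BACKUP LOG TargetDb TO DISK = N'D:\backups\TargetDb_postload.trn';  -- illustrative path
```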
Pulling High Volumes of Data: To enhance the speed and efficiency of the data migration process, we devised an optimized approach that involves transforming the target table into a heap by dropping all its indexes at the outset, thereby eliminating the overhead of index maintenance during data insertion. Following this, we transfer the data to the heap table, which is significantly faster due to the absence of indexes. Once the data transfer is complete, we recreate the indexes on the target table to restore its original structure and optimize query performance. This streamlined method of dropping indexes, transferring data to a heap, and then recreating indexes substantially accelerates the overall migration process compared to maintaining indexes throughout.
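A minimal T-SQL sketch of that pattern follows; the table, index, and staging-source names are illustrative, and the recreated indexes must of course match the original definitions.

```sql
-- 1. Drop the indexes so the target becomes a heap (names are illustrative;
--    drop or disable any primary key / unique constraints as appropriate).
DROP INDEX IX_Orders_CustomerId ON dbo.Orders;
DROP INDEX CIX_Orders_OrderId   ON dbo.Orders;   -- clustered index

-- 2. Bulk load into the heap; no index maintenance happens during the insert.
INSERT INTO dbo.Orders WITH (TABLOCK)
SELECT OrderId, CustomerId, OrderDate, Amount
FROM stage.Orders;

-- 3. Recreate the indexes once the load has finished.
CREATE CLUSTERED INDEX CIX_Orders_OrderId ON dbo.Orders (OrderId);
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId);
```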
Feedback and suggestions
We hope this post has helped you configure your migration solution and choose the right options to successfully migrate your databases. The remaining steps are covered in Part 2.
If you have feedback or suggestions for improving this data migration asset, please contact the Databases SQL Customer Success Engineering (Ninja) Team (datasqlninja@microsoft.com). Thanks for your support!
Note: For additional information about migrating various source databases to Azure, see the Azure Database Migration Guide.