With the general availability of the Large Storage (16TB) option in Azure Database for MySQL, the service now supports large databases ranging from hundreds of gigabytes to multiple terabytes (up to 16TB), as well as workloads that require high-concurrency IO, such as OLTP (up to 20K IOPS at the time of writing). This raises an important question:
"What is the fastest way of migrating your large databases, either from on-premises or other cloud environments to Azure?"
Based on our experience working with a customer to migrate their large databases to Azure, we evaluated the available options and arrived at the best practices below. This blog post shares these best practices for migrating large databases to Azure Database for MySQL to help other customers.
Customer Scenario
One of our customers has been running a mission-critical OLTP workload on MySQL with a database sized at 1.7TB (~600GB of data plus indexes). The database currently runs on legacy storage, and the customer is looking to migrate to the Azure Database for MySQL service with large storage to take advantage of the 16TB storage and 20,000 IOPS it supports.
Our Considerations
The following factors need to be considered when planning a large MySQL database migration:
- There is no physical backup facility available for Azure Database for MySQL because it is a fully managed service. Only logical backups, i.e. converting data back into INSERT statements, are supported
- To take advantage of file-per-table tablespaces, data must be exported from the current database and imported into the new Large Storage based database
- Tools that exploit parallelism perform better during both the export and the import process because they make better use of the available IO bandwidth. Unfortunately, the mysqldump tool shipped with the MySQL package does not support this; however, community tools such as mydumper support parallel logical backups (export) and restores (import).
Approach
Given the considerations above, we have devised the following migration strategy:
- Set up an Azure VM for temporary storage of the backups and to install the backup/restore client tools.
- Install mysql client tools and mydumper in the Azure VM (we assume our readers are using Ubuntu Linux here).
sudo apt-get install mysql-client
wget https://github.com/maxbube/mydumper/releases/download/v0.9.5/mydumper_0.9.5-2.xenial_amd64.deb
sudo dpkg -i mydumper_0.9.5-2.xenial_amd64.deb
- If the source database serves a mission-critical application, provision a read replica of the source and take the logical backup with mydumper from the replica, so that the backup does not interfere with the production workload.
- Export the schema using mysqldump, the tool shipped with MySQL.
- Export data from the source MySQL server to the temporary VM using mydumper, which supports multi-threaded data export.
- Provision an Azure Database for MySQL server with the right configuration and compute/storage settings; this will be the target into which the data is restored.
- Import data from the temporary VM into the new Azure Database for MySQL instance using myloader, which also supports multi-threaded data import.
Prepare for data migration
Create a temporary VM
Any General Purpose family Linux VM will serve the purpose. Ubuntu is a cost-effective choice as it is among the lowest-cost VM images. The VM should have at least 4 vCores and 16GB of memory, with Accelerated Networking supported and enabled to provide a stable network connection.
A Premium Disk should be added to the VM to store the exported data. It should be sized at 1TB or above (P30 tier or higher) to leverage the 5,000+ IOPS and 200+ MB/second throughput per disk[1].
The temporary VM should also be in the same VNet as the source and target MySQL server to ensure connectivity.
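For illustration only, a VM along these lines could be created with the Azure CLI as sketched below. The resource group, VNet, subnet, names, and VM size are placeholders, and you should confirm that the chosen size supports Accelerated Networking:
# Create an Ubuntu VM with Accelerated Networking and a 1TB (P30) Premium SSD data disk (all names are placeholders)
az vm create \
  --resource-group myResourceGroup \
  --name migration-vm \
  --image UbuntuLTS \
  --size Standard_D4s_v3 \
  --vnet-name myVNet \
  --subnet mySubnet \
  --accelerated-networking true \
  --storage-sku Premium_LRS \
  --data-disk-sizes-gb 1024 \
  --admin-username azureuser \
  --generate-ssh-keys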
Create and set up Azure Database for MySQL
Create a new instance of Azure Database for MySQL. We recommend applying the following settings before the migration (a sample Azure CLI sketch for setting the server parameters follows the list):
- max_allowed_packet – set to 1073741824 (i.e. 1GB) to prevent any overflow issues caused by long rows
- slow_query_log – set to OFF to turn off the slow query log. This eliminates the overhead caused by slow query logging
- query_store_capture_mode – set to NONE to turn off Query Store. This eliminates the overhead caused by Query Store's sampling activities
- innodb_buffer_pool_size – scale up the server to the 32 vCore Memory Optimized SKU from the Pricing tier blade of the portal during migration to increase innodb_buffer_pool_size. On Azure Database for MySQL, innodb_buffer_pool_size can only be increased by scaling up compute.
- innodb_write_io_threads & innodb_read_io_threads – change both to 16 from the Server parameters blade in the Azure portal to improve the speed of migration.
- Scale up the storage tier – the IOPS for an Azure Database for MySQL server increase progressively with the amount of provisioned storage. For faster loads, you may want to provision at least 4TB to get 12K IOPS.
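If you prefer to script the server parameter changes above, a rough sketch with the Azure CLI could look like the following (the resource group and server name are placeholders, and parameter availability may vary by engine version):
# Set server parameters on the target Azure Database for MySQL server (placeholders: myResourceGroup, mytargetserver)
az mysql server configuration set --resource-group myResourceGroup --server-name mytargetserver --name max_allowed_packet --value 1073741824
az mysql server configuration set --resource-group myResourceGroup --server-name mytargetserver --name slow_query_log --value OFF
az mysql server configuration set --resource-group myResourceGroup --server-name mytargetserver --name query_store_capture_mode --value NONE
az mysql server configuration set --resource-group myResourceGroup --server-name mytargetserver --name innodb_write_io_threads --value 16
az mysql server configuration set --resource-group myResourceGroup --server-name mytargetserver --name innodb_read_io_threads --value 16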
Virtual Network
You should apply the following virtual network (VNet) settings to secure all traffic during the data migration and prevent it from being routed through public endpoints (a sample Azure CLI sketch follows the list):
- In the VNet to which the temporary VM is connected, add Microsoft.Sql to the service endpoints of the subnet the temporary VM resides in. This allows Azure Database for MySQL traffic to route through the VNet and subnet
- In the Connection security settings of the new Azure Database for MySQL instance, add the virtual network and subnet in which the temporary VM resides. This allows network traffic to route through the whitelisted VNet and subnet
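As a sketch, the same two settings can be applied with the Azure CLI along these lines (the resource group, VNet, subnet, server, and rule names are placeholders):
# Enable the Microsoft.Sql service endpoint on the subnet of the temporary VM
az network vnet subnet update --resource-group myResourceGroup --vnet-name myVNet --name mySubnet --service-endpoints Microsoft.Sql
# Add a VNet rule on the target Azure Database for MySQL server for that subnet
az mysql server vnet-rule create --resource-group myResourceGroup --server-name mytargetserver --name allow-migration-subnet --vnet-name myVNet --subnet mySubnet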
Best practices for migration
This section highlights the best practices for migrating large MySQL databases, particularly the parameters we recommend applying during the migration process.
Export the Schema
Since the size of the schema will be limited, we’ll use mysqldump, the original data dumping tool shipped with MySQL, to complete the task of schema export.
When running mysqldump, use the following parameters (a sample command follows the list):
- --no-data – prevents table contents from being dumped
- --column-statistics=0 – if you have installed MySQL command-line tools version 8.0 instead of 5.7, you must set this parameter to 0 to prevent the following error, which is caused by a new feature in MySQL 8.0:
Unknown table 'COLUMN_STATISTICS' in information_schema (1109)
- --add-drop-database and --add-drop-table (optional) – the script will drop any existing database and tables before creating new ones
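Putting these together, a schema-only export might look like the following (the server, user, and database names are placeholders; drop --column-statistics=0 if you are using the 5.7 client):
mysqldump -h <source-server> -u <user> -p --no-data --column-statistics=0 --add-drop-database --add-drop-table --databases <database> > db_backup.dump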
Export the Data
Instead of the single-threaded mysqldump, we'll use mydumper, which supports multi-threaded data export, to speed up the export process. Apply the following mydumper parameters (a sample command follows the list):
- --no-schemas – the schema will be created by the schema script exported in the step above
- --compress-protocol – enable the client compression protocol to compress the data sent between the VM and the MySQL server
- --threads – use a value equal to 2x the number of vCores of the temporary VM
- --chunk-filesize – split large tables into chunks of this file size. This allows a large table to be imported in parallel; a value in the range of 512 to 1024 (MB) is recommended
- --statement-size – set to 104857600 (bytes) to allow the generation of extra-long rows
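For example, an export using the parameters above could be invoked as follows (the server, credentials, database, and output directory are placeholders; --threads 8 assumes a 4 vCore VM):
mydumper --host <source-server> --user <user> --password <password> --database <database> --no-schemas --compress-protocol --threads 8 --chunk-filesize 512 --statement-size 104857600 --outputdir /mnt/datadisk/dumpdir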
Import the schema
Use the mysql command-line tool to import the schema. No special parameters need to be applied.
mysql -h <server> -u <user> -p < db_backup.dump
Drop the indexes temporarily
If possible, drop the indexes before the data import to reduce the IO burden on storage. Indexes can be recreated after the data import completes. However, this may not be possible if you have a complex schema with many foreign key constraints. Weigh the time and effort required to prepare scripts to drop and recreate the indexes against your migration timeline.
You can use the following queries to generate scripts that drop and recreate all the indexes in your schema (a sketch of how to run them follows the queries):
Drop Indexes
SELECT CONCAT('ALTER TABLE ', SUBSTRING_INDEX(SUBSTRING_INDEX(t.name, '/', 2), '/', -1), ' DROP INDEX ', i.name, ';') AS 'drop_idx_sql'
FROM information_schema.innodb_sys_tables t
INNER JOIN information_schema.innodb_sys_indexes i USING (table_id)
INNER JOIN information_schema.innodb_sys_fields f USING (index_id)
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(t.name, '/', 1), '/', -1) like '$DB'
GROUP BY t.name, i.name;
Re-create Indexes
SELECT CONCAT('CREATE INDEX ', i.name, ' ON ', SUBSTRING_INDEX(SUBSTRING_INDEX(t.name, '/', 2), '/', -1), ' (', GROUP_CONCAT(f.name ORDER BY f.pos), ');' ) AS 'create_idx_sql'
FROM information_schema.innodb_sys_tables t
INNER JOIN information_schema.innodb_sys_indexes i USING (table_id)
INNER JOIN information_schema.innodb_sys_fields f USING (index_id)
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(t.name, '/', 1), '/', -1) like '$DB'
GROUP BY t.name, i.name;
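One way to use these generators is to save each query to a file (with $DB replaced by your schema name) and feed the output back into mysql. The file names and placeholders below are assumptions:
# Generate the scripts against a server where the schema already exists (-N suppresses column headers)
mysql -h <server> -u <user> -p -N information_schema < gen_drop_indexes.sql > drop_indexes.sql
mysql -h <server> -u <user> -p -N information_schema < gen_create_indexes.sql > create_indexes.sql
# Apply drop_indexes.sql on the target before the data import and create_indexes.sql after it completes
mysql -h <target-server> -u <user> -p <database> < drop_indexes.sql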
Import the data
myloader, the multi-threaded import tool that complements mydumper, should be used with the following recommended parameters (a sample command follows the list):
- --queries-per-transaction – recommended to be set to a value of no more than 500
- --compress-protocol – enable the client compression protocol to compress the data sent between the VM and the MySQL server
- --threads – use a value equal to 2x the number of vCores of the temporary VM
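A sample invocation with these parameters might look like the following (the server, credentials, database, and dump directory are placeholders; --threads 8 again assumes a 4 vCore VM):
myloader --host <target-server> --user <user> --password <password> --database <database> --directory /mnt/datadisk/dumpdir --queries-per-transaction 500 --compress-protocol --threads 8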
Other applicable Scenarios
Besides the mentioned customer scenario, there are other scenarios which may benefit from the best practices above:
- Version upgrades – since there is no direct upgrade path from MySQL version 5.7 to 8.0 in Azure yet, a data export and import is required, using the same approach described in this post
- Migrating from Azure Database for MySQL legacy (4TB max) storage to the new large (16TB max) storage – the same approach applies. The data and schema export can be performed from an Azure VM before the exported data and schema are copied to the temporary VM on Azure.
Further Reading
For additional information, please refer to the following:
- Azure Database for MySQL documentation: https://docs.microsoft.com/en-us/azure/mysql/
- mydumper/myloader project in GitHub: https://github.com/maxbube/mydumper
[1] Disk type available for Azure VM: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/disks-types