Since the general availability of the Large Storage (16TB) option in Azure Database for MySQL, the service now supports large databases ranging from hundreds of gigabytes to multiple terabytes (up to 16TB), as well as workloads that require high-concurrency IO, such as OLTP (up to 20K IOPS at the time of writing). This raises an important question:
"What is the fastest way of migrating your large databases, either from on-premises or other cloud environments to Azure?"
Based on our experience working with a customer to migrate their large databases to Azure, we evaluated the available options and arrived at the following best practices. This blog post shares these best practices for migrating large databases to Azure Database for MySQL to help other customers.
One of our customers has been running their mission-critical OLTP workload on MySQL with a database sized at 1.7TB (~600GB of data plus indexes). The database currently runs on legacy storage, and the customer is looking to migrate to the Azure Database for MySQL service with large storage to take advantage of the 16TB storage and 20,000 IOPS support.
The following factors need to be considered when planning a large MySQL database migration:
Given the considerations above, we have devised the following migration strategy:
sudo apt-get install mysql-client
wget https://github.com/maxbube/mydumper/releases/download/v0.9.5/mydumper_0.9.5-2.xenial_amd64.deb
sudo dpkg -i mydumper_0.9.5-2.xenial_amd64.deb
Any General Purpose family Linux VM will serve the purpose. Ubuntu is a good choice since it is among the lowest-cost VM images. The VM should have at least 4 vCores and 16GB of memory, with Accelerated Networking enabled to provide a stable network.
A Premium Disk should be added to the VM to store the exported data. It should preferably be sized at 1TB or above (P30 tier or higher) to leverage the 5,000 IOPS and 200MB/second throughput per disk [1].
The temporary VM should also be in the same VNet as the source and target MySQL server to ensure connectivity.
Create a new instance of Azure Database for MySQL. We recommend applying the following settings before the migration:
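Provisioning the target server itself can be scripted with the Azure CLI. The sketch below builds the command for review first; the resource group, server name, region, and admin user are placeholders, and sizing is one reasonable starting point rather than a prescription:

```shell
# Hypothetical names; substitute your own resource group, server, and region.
RG="myResourceGroup"
SERVER="mytarget-mysql"
LOCATION="eastus"

# General Purpose, 4 vCores, 2TB of storage (--storage-size is in MB);
# on large-storage servers, provisioned IOPS scale with storage size.
CREATE_CMD="az mysql server create \
  --resource-group ${RG} \
  --name ${SERVER} \
  --location ${LOCATION} \
  --admin-user myadmin \
  --sku-name GP_Gen5_4 \
  --storage-size 2048000 \
  --version 5.7"

# Echo for review; remove the echo to actually provision.
echo "${CREATE_CMD}"
```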
You should apply the following virtual network (VNet) related settings to secure all traffic during the data migration and prevent traffic from being routed through public endpoints:
This section highlights best practices for migrating large MySQL databases, particularly the special parameters we recommend applying during the migration process.
Since the size of the schema will be limited, we’ll use mysqldump, the original data dumping tool shipped with MySQL, to complete the task of schema export.
When running mysqldump, use the following parameters. Note that if you run a MySQL 8.0 mysqldump client against an older (pre-8.0) server, the export may fail with the following error:
Unknown table 'COLUMN_STATISTICS' in information_schema (1109)
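This error appears because the 8.0 client tries to read the information_schema COLUMN_STATISTICS table, which older servers do not have; passing --column-statistics=0 avoids it. A hedged sketch of the schema-only export, with placeholder host, user, and database names:

```shell
# Placeholder source host and database; adjust for your environment.
SRC_HOST="mysource.example.com"
DB_NAME="mydb"

# Schema-only export: --no-data skips rows; --routines and --triggers
# carry stored programs along; --column-statistics=0 avoids the
# COLUMN_STATISTICS error when an 8.0 client dumps from a 5.x server.
DUMP_CMD="mysqldump -h ${SRC_HOST} -u myadmin -p \
  --no-data --routines --triggers --column-statistics=0 \
  ${DB_NAME}"

# Echo for review; redirect to a file (e.g. schema.sql) when running.
echo "${DUMP_CMD} > schema.sql"
```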
Instead of the single-threaded mysqldump, we'll use MyDumper, which supports multi-threaded data export. Apply the following mydumper parameters to speed up the export process:
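As an illustration, a data-only mydumper invocation might look like the sketch below; the host, database, thread count, and chunk size are placeholder assumptions to tune for your source server:

```shell
SRC_HOST="mysource.example.com"   # placeholder source host
DB_NAME="mydb"                    # placeholder database
OUT_DIR="/datadrive/dump"         # output directory on the Premium Disk

# Multi-threaded, data-only export: -t sets parallel dump threads,
# -c compresses the output files, -r splits large tables into row
# chunks so threads stay busy, -m skips schemas (already exported
# separately with mysqldump).
EXPORT_CMD="mydumper -h ${SRC_HOST} -u myadmin -B ${DB_NAME} \
  -t 8 -c -r 100000 -m -o ${OUT_DIR}"

# Echo for review; remove the echo to run the export.
echo "${EXPORT_CMD}"
```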
Use the mysql command line tool to import the schema. No special parameters need to be applied.
mysql -u <user> -p < db_backup.dump
If possible, drop the indexes before the data import to reduce the IO burden on storage. Indexes can be recreated after the data import completes. However, this may not be possible if you have a complex schema with many foreign key constraints. Weigh the time and effort required to prepare the drop/recreate scripts against your migration timeline.
You may use the following scripts to generate the statements that drop and then recreate all the indexes in your schema:
Drop Indexes
SELECT CONCAT('ALTER TABLE ', SUBSTRING_INDEX(SUBSTRING_INDEX(t.name, '/', 2), '/', -1), ' DROP INDEX ', i.name, ';') AS 'drop_idx_sql'
FROM information_schema.innodb_sys_tables t
INNER JOIN information_schema.innodb_sys_indexes i USING (table_id)
INNER JOIN information_schema.innodb_sys_fields f USING (index_id)
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(t.name, '/', 1), '/', -1) like '$DB'
GROUP BY t.name, i.name;
Re-create Indexes
SELECT CONCAT('CREATE INDEX ', i.name, ' ON ', SUBSTRING_INDEX(SUBSTRING_INDEX(t.name, '/', 2), '/', -1), ' (', GROUP_CONCAT(f.name ORDER BY f.pos), ');' ) AS 'create_idx_sql'
FROM information_schema.innodb_sys_tables t
INNER JOIN information_schema.innodb_sys_indexes i USING (table_id)
INNER JOIN information_schema.innodb_sys_fields f USING (index_id)
WHERE SUBSTRING_INDEX(SUBSTRING_INDEX(t.name, '/', 1), '/', -1) like '$DB'
GROUP BY t.name, i.name;
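One way to drive the two generators above from the shell is sketched below. It assumes you have saved the two SELECT statements into hypothetical files gen_drop_indexes.sql and gen_create_indexes.sql (with $DB replaced by your schema name), and it uses placeholder target server and user names:

```shell
TARGET="mytarget-mysql.mysql.database.azure.com"   # placeholder target server
DB="mydb"                                          # placeholder schema name

# -N suppresses column headers and -B produces batch output, so the
# result files contain only the generated ALTER/CREATE statements.
GEN_DROP="mysql -h ${TARGET} -u myadmin -p -N -B -e \"SOURCE gen_drop_indexes.sql\" > drop_indexes.sql"
GEN_CREATE="mysql -h ${TARGET} -u myadmin -p -N -B -e \"SOURCE gen_create_indexes.sql\" > create_indexes.sql"

# Run drop_indexes.sql before the data import and create_indexes.sql
# after it, e.g.: mysql -h ${TARGET} -u myadmin -p ${DB} < drop_indexes.sql
echo "${GEN_DROP}"
echo "${GEN_CREATE}"
```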
MyLoader, the multi-threaded import tool, should be used in conjunction with MyDumper, with the following recommended parameters:
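As a sketch, a myloader invocation against the target server might look like the following; the host, user, thread count, and transaction batch size are placeholder assumptions to adjust for your target SKU (note that Azure Database for MySQL expects the user@servername login format):

```shell
TARGET="mytarget-mysql.mysql.database.azure.com"  # placeholder target server
DB_NAME="mydb"                                    # placeholder database
DUMP_DIR="/datadrive/dump"                        # directory written by mydumper

# Multi-threaded import: -t sets parallel load threads (roughly match
# the target's vCores), -q batches multiple queries per transaction
# to cut commit overhead on the remote storage.
IMPORT_CMD="myloader -h ${TARGET} -u myadmin@mytarget-mysql -B ${DB_NAME} \
  -d ${DUMP_DIR} -t 8 -q 500"

# Echo for review; remove the echo to run the import.
echo "${IMPORT_CMD}"
```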
Besides the customer scenario described above, other scenarios may also benefit from these best practices:
For additional information, please refer to the following:
[1] Disk types available for Azure VMs: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/disks-types