Continuous schema migrations for MySQL databases
Published May 18 2022 01:45 PM 4,702 Views
Microsoft

In a recent blog post, Sai Kondapalli discussed how to use gh-ost to perform online schema changes in Azure Database for MySQL. In this post, I’ll go a step further and implement this into a fully automated, continuous deployment process.

 

What is continuous deployment?

Continuous deployment is an end-to-end process related to DevOps. With continuous deployment, the entire process from code commit to production is automated. The trigger between the Develop and Deliver phases is automatic, so code changes are pushed live after they receive validation and pass all tests. This means that customers receive improvements as soon as they’re available.

 

Because a data model is tightly coupled with an application, the database schema is usually part of the code base as well, so developers are maintaining it. In this case, it makes sense to use continuous deployment not only on stateless application components, but also on the data. Doing so helps to unlock the full potential of continuous deployment.

Considerations for implementing continuous deployment

Implementing continuous deployment for your data model relies primarily on the key considerations, including how to:

  • Define the MySQL schema and changes between versions in code
  • Migrate continuously between two data model versions

Defining the MySQL schema and changes between versions in code

There are various solutions for versioning your data model in a source code control system (e.g., git), ranging from pure SQL or specific schema definitions in code to complete, third-party product suites.

 

For this approach, we’ll define the data model in SQL using one SQL file, which only contains the CREATE statement, for every table. To avoid writing complex migration scripts (including the rollback) between different database schema versions, we’ll use the open-source solution "skeema", a CLI tool that supports various schema related use cases. Skeema generates an actionable diff from two data models, so the developer only need focus on how the data model should look. Using the automated diff over Skeema reduces the risk of errors that can occur in self-written migration scripts and frees the developer to focus on other topics. In addition, there’s full transparency about changes in SQL files.

Migrating continuously between two data model versions

There are several challenges associated with migrating from one data model to another, for example the database will be in use and often will contain hundreds of tables with millions of rows. To address these points, we’re using gh-ost, as explained in the blog post Performing online schema changes in Azure Database for MySQL by using gh-ost. Gh-ost, GitHub’s online schema migration solution, is an open-source utility that we’ll use to execute the migration without any triggers and most importantly, while online. Gh-ost also supports more complex scenarios, some of which may be covered in a future blog post.

 

Of course, gh-ost doesn’t need to be used for all migrations, e.g., for simple SQL statements such as creating a new table or renaming a column, which can be run on the database directly.

Overview of implementing continuous deployment

We’re integrating skeema and gh-ost with GitHub Actions to automate our continuous deployment workflow. These tools are being run via a GitHub Actions pipeline.

 

In our example, we’ll use two pipelines. The first is part of build validation, and it’s triggered by creating a new pull request in GitHub. A reviewer, for example the database operator, can look through the changes and approve or reject them. If approved, the SQL schema changes are merged into the production branch, in our case the master branch. This in turn triggers the second pipeline, the deployment, or more precisely, the continuous migration of the data model.

 

01-dev-process.png

 

Note: Because automated release and deployment processes (continuous deployment) combine your technology stack and requirements with the processual circumstances, they can differ greatly from company to company and from application to application. The process explained in this blog post is only an example, and you should carefully think through this guidance when implementing it in your environment.

Implementing continuous deployment

 

Setting up an Azure Database for MySQL server

To begin, you need an existing server running Azure Database for MySQL. In our scenario, we’re not using any testing or replica solutions for the migration, so having one server to represent the production database is sufficient.

 

On this server, ensure that the binlog_row_image parameter is set to FULL. In addition, verify that the server is network-accessible by GitHub Actions, which will run skeema and gh-ost to work their magic on the database. If you’re using the GitHub hosted runners (default), you can provide network access by enabling the Allow public access from Azure services and resources within Azure setting on your server. For more information, in the article Manage firewall rules, see the section Connect from Azure.

 

Finally, regarding MySQL privileges, create a database user that has following permissions:

  • ALTER, CREATE, DELETE, DROP, INDEX, INSERT, LOCK TABLES, SELECT, TRIGGER, UPDATE on the database that should contain our application schema.
  • REPLICATION CLIENT, REPLICATION SLAVE ON.

Creating a GitHub repository

To create a GitHub repository that contains the example database schema and the two pipelines, navigate to the repository at github.com/tejado/cicd-sql-schema-migrations, select Use this template, specify a repository name, and then select Create repository from template.

 

02-github-repo-new.png

 

The repository already contains two pipelines, but it isn't yet aware of the MySQL server and the database credentials. To accomplish this, the repository requires the following secrets:

 

Secret name

Description

MYSQL_PRODUCTION_HOST

The MySQL server that represents the “production" database. Schema migration will be accomplished automatically over gh-ost on this server.

MYSQL_PRODUCTION_PORT

MYSQL_PRODUCTION_USERNAME

The user requires certain privileges for the migration. For more information, see gh-ost - Requirements and limitations.

MYSQL_PRODUCTION_PASSWORD

MYSQL_SCHEMA

The database name of the application inside the MySQL server; this is also used in the temporary database in the pull request pipeline.

 

To add these to the repository, for each secret, perform the following steps:

  1. In GitHub, navigate to the main page of the newly created repository, and then, under the repository name, select Settings.
  2. In the left sidebar, select Secrets -> Actions, and then, on the Actions secrets page, select New repository secret.

    03-github-secrets-navigation.png

     

  3. In the Name field, specify the name of the secret, in the Value field, specify the value of the secret, and then select Add secret.

    04-github-secrets-add.png

     

    When complete, the list of secrets should look similar to the following:

    05-github-secrets.png

     

Testing the functionality

After completing the setup, be sure to test the functionality by making a schema change.

 

Note: The following steps represent the continuous deployment workflow; follow them each time that a database schema change is required.

 

  1. In the schema folder, modify one of the SQL files, e.g., add a column to customers.sql over the GitHub website. There's no need to clone the repository for a brief test.
  2. Select the Create a new branch… radio button, and then select Propose changes to commit the change to a feature branch.

    Important: Do not commit a change to the master branch.

    06-github-pr-new.png

     

  3. On the next page, select Create pull request.
     

    Creating a pull request “suggests” that the changes be merged into the master branch, and the pull request pipeline starts automatically. After a few seconds, the GitHub Action pipeline labels your pull requests and comments with the output from skeema:

    07-github-pr-skeema.png

     

    Approve the pull request to merge the changes into the master branch, and the deployment pipeline starts and runs gh-ost for ALTER TABLE statements on the specified database.
  4. In the repository, select Actions to view deployment of the changes (the CREATE TABLE and DROP TABLE statements are "simple" SQL changes that will be run on the database directly and not over gh-ost):

    08-github-actions-overview.png

     

  5. Use any common database tool, e.g., MySQL Workbench or MySQL CLI, to verify that the changes are implemented on the "production" database.
  6. Experiment with other changes, and gh-ost will automatically migrate the data model.

Conclusion

Introducing continuous deployment for a data model defines a clear, end-to-end process across developers and database operators. It makes database schema changes more transparent and migrates data in a fully automated process. This approach has a lot of benefits and delivers value directly to your application users.

 

If you have any questions about the detail I’ve provided, please leave a comment below or email our team at MySQLMariaDBNinjas@service.microsoft.com. Thank you!

References

Co-Authors
Version history
Last update:
‎May 19 2022 10:13 AM
Updated by: