In a recent blog post, Sai Kondapalli discussed how to use gh-ost to perform online schema changes in Azure Database for MySQL. In this post, I’ll go a step further and implement this into a fully automated, continuous deployment process.
Continuous deployment is an end-to-end process related to DevOps. With continuous deployment, the entire process from code commit to production is automated. The trigger between the Develop and Deliver phases is automatic, so code changes are pushed live after they receive validation and pass all tests. This means that customers receive improvements as soon as they’re available.
Because a data model is tightly coupled with an application, the database schema is usually part of the code base as well, so developers are maintaining it. In this case, it makes sense to use continuous deployment not only on stateless application components, but also on the data. Doing so helps to unlock the full potential of continuous deployment.
Implementing continuous deployment for your data model relies primarily on the key considerations, including how to:
There are various solutions for versioning your data model in a source code control system (e.g., git), ranging from pure SQL or specific schema definitions in code to complete, third-party product suites.
For this approach, we’ll define the data model in SQL using one SQL file, which only contains the CREATE statement, for every table. To avoid writing complex migration scripts (including the rollback) between different database schema versions, we’ll use the open-source solution "skeema", a CLI tool that supports various schema related use cases. Skeema generates an actionable diff from two data models, so the developer only need focus on how the data model should look. Using the automated diff over Skeema reduces the risk of errors that can occur in self-written migration scripts and frees the developer to focus on other topics. In addition, there’s full transparency about changes in SQL files.
There are several challenges associated with migrating from one data model to another, for example the database will be in use and often will contain hundreds of tables with millions of rows. To address these points, we’re using gh-ost, as explained in the blog post Performing online schema changes in Azure Database for MySQL by using gh-ost. Gh-ost, GitHub’s online schema migration solution, is an open-source utility that we’ll use to execute the migration without any triggers and most importantly, while online. Gh-ost also supports more complex scenarios, some of which may be covered in a future blog post.
Of course, gh-ost doesn’t need to be used for all migrations, e.g., for simple SQL statements such as creating a new table or renaming a column, which can be run on the database directly.
We’re integrating skeema and gh-ost with GitHub Actions to automate our continuous deployment workflow. These tools are being run via a GitHub Actions pipeline.
In our example, we’ll use two pipelines. The first is part of build validation, and it’s triggered by creating a new pull request in GitHub. A reviewer, for example the database operator, can look through the changes and approve or reject them. If approved, the SQL schema changes are merged into the production branch, in our case the master branch. This in turn triggers the second pipeline, the deployment, or more precisely, the continuous migration of the data model.
Note: Because automated release and deployment processes (continuous deployment) combine your technology stack and requirements with the processual circumstances, they can differ greatly from company to company and from application to application. The process explained in this blog post is only an example, and you should carefully think through this guidance when implementing it in your environment.
To begin, you need an existing server running Azure Database for MySQL. In our scenario, we’re not using any testing or replica solutions for the migration, so having one server to represent the production database is sufficient.
On this server, ensure that the binlog_row_image parameter is set to FULL. In addition, verify that the server is network-accessible by GitHub Actions, which will run skeema and gh-ost to work their magic on the database. If you’re using the GitHub hosted runners (default), you can provide network access by enabling the Allow public access from Azure services and resources within Azure setting on your server. For more information, in the article Manage firewall rules, see the section Connect from Azure.
Finally, regarding MySQL privileges, create a database user that has following permissions:
To create a GitHub repository that contains the example database schema and the two pipelines, navigate to the repository at github.com/tejado/cicd-sql-schema-migrations, select Use this template, specify a repository name, and then select Create repository from template.
The repository already contains two pipelines, but it isn't yet aware of the MySQL server and the database credentials. To accomplish this, the repository requires the following secrets:
Secret name |
Description |
MYSQL_PRODUCTION_HOST |
The MySQL server that represents the “production" database. Schema migration will be accomplished automatically over gh-ost on this server. |
MYSQL_PRODUCTION_PORT |
|
MYSQL_PRODUCTION_USERNAME |
The user requires certain privileges for the migration. For more information, see gh-ost - Requirements and limitations. |
MYSQL_PRODUCTION_PASSWORD |
|
MYSQL_SCHEMA |
The database name of the application inside the MySQL server; this is also used in the temporary database in the pull request pipeline. |
To add these to the repository, for each secret, perform the following steps:
When complete, the list of secrets should look similar to the following:
After completing the setup, be sure to test the functionality by making a schema change.
Note: The following steps represent the continuous deployment workflow; follow them each time that a database schema change is required.
Creating a pull request “suggests” that the changes be merged into the master branch, and the pull request pipeline starts automatically. After a few seconds, the GitHub Action pipeline labels your pull requests and comments with the output from skeema:
Approve the pull request to merge the changes into the master branch, and the deployment pipeline starts and runs gh-ost for ALTER TABLE statements on the specified database.
Introducing continuous deployment for a data model defines a clear, end-to-end process across developers and database operators. It makes database schema changes more transparent and migrates data in a fully automated process. This approach has a lot of benefits and delivers value directly to your application users.
If you have any questions about the detail I’ve provided, please leave a comment below or email our team at MySQLMariaDBNinjas@service.microsoft.com. Thank you!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.