Not long ago, I started to look at database related issues at work. One of them being data migrations. So in this blog I’m going to explore the key concept of data migration and a small example of using Flywaydb.
Why do we need database migration?
Think how we use our database in the project. There are scenarios when we:
- set up different instances of databases for production, staging and development environment. But from time to time, these instances may go out of sync;
- release a new feature when we also need to make correspondent change to our database schema, adding new indexes or altering tables;
- track changes we made to a database so we can rollback back to a previous version of the database when something seriously go wrong and we want to have a clean state;
- provide consistent and structured data for our CI/CD pipeline to test on;
- have new team member joined who need to have database set-up from scratch;
The list goes on. But you may already sense that we need something similar to version control to manage our database — database migration.
So what is database migration?
Database migration, sometimes called schema migration, is about managing incremental and versioned changes to our database so that it goes along with the matching state of software release.
How do we migrate database?
Of course we can just use a combined of shell and sql scripts to migrate our database. But this is a bit “raw”, and not persistent. For any established project, it’s easier to use a tool to assist your task.
There are quite a few popular tools on the market. Like Liquibase or db-migrate. But here we gonna take a quick look at Flywaydb.
In Flywaydb, Migrations can be either versioned or repeatable. Versioned migrations come in 2 forms: regular and undo.
Versioned migrations have a version, a description and a checksum. The version must be unique. The description is purely informative for you. The checksum is there to detect accidental changes. Versioned migrations are the most common type of migration. They are applied in order exactly once.
Optionally their effect can be undone by supplying an undo migration with the same version.
Repeatable migrations have a description and a checksum, but no version. Instead of being run just once, they are (re-)applied every time their checksum changes. Within a single migration run, repeatable migrations are always applied last, after all pending versioned migrations have been executed. Repeatable migrations are applied in the order of their description.
Note that a Migration must no change once it has been applied to the database. (Flyway is enforcing this by keeping track of a checksum for each Migration.)
For example, below is a simple example of a database migration file that is named
data = data - 'closeReason'
data->>'closeReason' is not null
data->>'status' != 'CLOSED';
Note that this is nothing complex other than a plain sql file. How does Flywaydb knows how to execute this file? Well this can be done in bash command or
flywa.conf the config file:
flyway.driver=org.postgresql.Driver // there are other JDBC drivers
//supported by flyway.
This configuration file should be stored to the root directory of the project.
Then we can just execute the file by executing as follows.
Flyway 4.0.3 by Boxfuse
Successfully validated 1migrations (execution time 00:00.017s)
Creating Metadata table: "application"."schema_version"
Current version of schema "application": << Empty Schema >>
Migrating schema "application" to version 5.1 - updateApplicationStatus
Successfully applied 1 migrations to schema "application" (execution time 00:00.066s).
Note that in above example, we can see the migration script is new. Re-execute the command again and you will see that the migration script is not been re-executed.
Flyway 4.0.3 by Boxfuse
Successfully validated 1migrations (execution time 00:00.018s)
Current version of schema "application": 5.1
Schema "application" is up to date. No migration necessary.
To learn more about Flywaydb, check out here.