Skip migrations depending on execution context
Problem
When we have two databases configured we need to selectively run some migrations only on one of them. We have the following migrations:
-
db/migrate/1_ddl.rb: Migration doing DDL is run onmainandci -
db/migrate/2_gitlab_main.rb: Migration withrestrict gitlab_schema: :gitlab_mainis run only onmain:and skipped on CI -
db/migrate/3_gitlab_ci.rb: Migration withrestrict gitlab_schema: :gitlab_ciis run only onci:and skipped on Main
Configuration 1: two connections to same database (Phase 4)
# main/ci points to the same database
production:
main:
host: postgres
database: gitlabhq_main
ci:
host: postgres
database: gitlabhq_main
Configuration 2: two connections to different databases (Phase 7)
# main/ci points to the same database
production:
main:
host: postgres
database: gitlabhq_main
ci:
host: postgres
database: gitlabhq_ci
Solution 1: Filter out migrations that should not be run
The problem is that skipped migrations cannot be stored in schema_migrations. This will fail for phase 4 as we cannot detect if the database is shared between main or ci, as the config/database.yml will point to the different pgbouncers, but behind them will be the same primary. If we mark migration as run, it will next not be executed when running migration for ci.
I believe we should detect and remove such migrations from the list of eligible migrations for a given database, so they are never marked as executed. I was thinking being opinionated:
- if migration has
restrict_gitlab_migration gitlab_schema: :gitlab_main - we require the migration class to be ending with
OnlyGitlabMain - this implies the migration file to be ending with
_only_gitlab_main.rb - we could then reject files ending with wrong
_only_*.rbin here: https://github.com/rails/rails/blob/main/activerecord%2Flib%2Factive_record%2Fmigration.rb#L1218 - such files would not be considered when checking
schema_migrations
The best solution would be separate migration folders, as this is natively supported then. However, long time ago we "decided" to have a single folder for all.
How it would work for Configuration 1 (single database)?
In this model:
- The migration context will filter-out all migrations that are not eligible for a given context based on file nam
- The
db:migrate:mainwill see as eligible migrations 1., 2. only - The
db:migrate:mainwill run migration 1., 2. only - The
db:migrate:ciwill see as eligible migrations 1., 3. only - The
db:migrate:ciwill skip 1., since it was run bydb:migrate:main, and run 3. only
How it would work for Configuration 2 (two databases)?
In this model:
- The migration context will filter-out all migrations that are not eligible for a given context based on file nam
- The
db:migrate:mainwill see as eligible migrations 1., 2. only - The
db:migrate:mainwill run migration 1., 2. only - The
db:migrate:ciwill see as eligible migrations 1., 3. only - The
db:migrate:ciwill run migration 1., 3. only
Solution 2: Use database_tasks:
As proposed in !82902 (merged) we will use database_tasks: to indicate which of the configurations should not have db:migrate run since they are covered by other databases.
How it would work for Configuration 1 (single database)?
In this example, the Configuration 1 will look as following:
# main/ci points to the same database
production:
main:
host: postgres
database: gitlabhq_main
ci:
host: postgres
database: gitlabhq_main
database_tasks: false
Due to https://github.com/rails/rails/pull/43285#issuecomment-927952059 there will not be db:migrate:main/ci, only db:migrate:main.
When running db:migrate it will see and run all migrations 1., 2., 3.
How it would work for Configuration 2 (two databases)?
The application will have access to db:migrate:main, and db:migrate:ci.
- The
db:migrate:mainwill see as eligible migrations 1., 2., 3. - The
db:migrate:mainwill run migration 1., 2.. The 3. will be "skipped softly by printing message in console", but still marking inschema_migrationsas run. - The
db:migrate:ciwill see as eligible migrations 1., 2., 3. - The
db:migrate:ciwill run migration 1., 3.. The 2. will be "skipped softly by printing message in console", but still marking inschema_migrationsas run.
Summary
It appears that the most native and supported way is to follow database_tasks: true/false approach as this is natively supported solution. The potential problem is whether we want to support Phase 4 natively, if yes, in such case this will be an additional configuration to maintain. Likely in this model there should be a way to "validate" if config/database.yml makes sense, aka. if you are missing database_tasks: where-as you should have it.
The downside is that migration skipped will be marked as "run" in schema_migrations, so in a case of misconfigured database_tasks: false/true it would result in some migrations not run.
Likely configuration validation can be done as part of dedicated rake tasks that might be using ar_internal_metadata to store a configuration name of database, that can be later read and compared. This could be actually be done before db:migrate to ensure that configuration is "correct", and fail.
Recommendation
- Don't use dedicated
db:migrate:#{name}tasks - Use
database_tasks: false/trueto properly supportdb:migratebehavior without any hacks and natively support Phase 4 (two connections, single database) or Phase 7 configuration (two connections, two databases) - Skipped migrations will be skipped softly: 1. a message will be printed, 2. migration will be marked as run in
schema_migrations - A
db:migratewill be dependent ongitlab:db:validate_configthat will ensure thatdatabase_tasks:are properly configured to ensure that we avoid configuration mistakes - The
gitlab:db:validate_configwill iterate each database, remove, set inar_internal_metadataan expected config name, ensure that this config name is properly visible on all other databases depending ondatabase_tasks:configuration