Skip migrations depending on execution context
Problem
When we have two databases configured we need to selectively run some migrations only on one of them. We have the following migrations:
-
db/migrate/1_ddl.rb
: Migration doing DDL is run onmain
andci
-
db/migrate/2_gitlab_main.rb
: Migration withrestrict gitlab_schema: :gitlab_main
is run only onmain:
and skipped on CI -
db/migrate/3_gitlab_ci.rb
: Migration withrestrict gitlab_schema: :gitlab_ci
is run only onci:
and skipped on Main
Configuration 1: two connections to same database (Phase 4)
# main/ci points to the same database
production:
main:
host: postgres
database: gitlabhq_main
ci:
host: postgres
database: gitlabhq_main
Configuration 2: two connections to different databases (Phase 7)
# main/ci points to the same database
production:
main:
host: postgres
database: gitlabhq_main
ci:
host: postgres
database: gitlabhq_ci
Solution 1: Filter out migrations that should not be run
The problem is that skipped migrations cannot be stored in schema_migrations
. This will fail for phase 4 as we cannot detect if the database is shared between main or ci, as the config/database.yml
will point to the different pgbouncers, but behind them will be the same primary. If we mark migration as run, it will next not be executed when running migration for ci
.
I believe we should detect and remove such migrations from the list of eligible migrations for a given database, so they are never marked as executed. I was thinking being opinionated:
- if migration has
restrict_gitlab_migration gitlab_schema: :gitlab_main
- we require the migration class to be ending with
OnlyGitlabMain
- this implies the migration file to be ending with
_only_gitlab_main.rb
- we could then reject files ending with wrong
_only_*.rb
in here: https://github.com/rails/rails/blob/main/activerecord%2Flib%2Factive_record%2Fmigration.rb#L1218 - such files would not be considered when checking
schema_migrations
The best solution would be separate migration folders, as this is natively supported then. However, long time ago we "decided" to have a single folder for all.
How it would work for Configuration 1 (single database)?
In this model:
- The migration context will filter-out all migrations that are not eligible for a given context based on file nam
- The
db:migrate:main
will see as eligible migrations 1., 2. only - The
db:migrate:main
will run migration 1., 2. only - The
db:migrate:ci
will see as eligible migrations 1., 3. only - The
db:migrate:ci
will skip 1., since it was run bydb:migrate:main
, and run 3. only
How it would work for Configuration 2 (two databases)?
In this model:
- The migration context will filter-out all migrations that are not eligible for a given context based on file nam
- The
db:migrate:main
will see as eligible migrations 1., 2. only - The
db:migrate:main
will run migration 1., 2. only - The
db:migrate:ci
will see as eligible migrations 1., 3. only - The
db:migrate:ci
will run migration 1., 3. only
database_tasks:
Solution 2: Use As proposed in !82902 (merged) we will use database_tasks:
to indicate which of the configurations should not have db:migrate
run since they are covered by other databases.
How it would work for Configuration 1 (single database)?
In this example, the Configuration 1
will look as following:
# main/ci points to the same database
production:
main:
host: postgres
database: gitlabhq_main
ci:
host: postgres
database: gitlabhq_main
database_tasks: false
Due to https://github.com/rails/rails/pull/43285#issuecomment-927952059 there will not be db:migrate:main/ci
, only db:migrate:main
.
When running db:migrate
it will see and run all migrations 1., 2., 3.
How it would work for Configuration 2 (two databases)?
The application will have access to db:migrate:main
, and db:migrate:ci
.
- The
db:migrate:main
will see as eligible migrations 1., 2., 3. - The
db:migrate:main
will run migration 1., 2.. The 3. will be "skipped softly by printing message in console", but still marking inschema_migrations
as run. - The
db:migrate:ci
will see as eligible migrations 1., 2., 3. - The
db:migrate:ci
will run migration 1., 3.. The 2. will be "skipped softly by printing message in console", but still marking inschema_migrations
as run.
Summary
It appears that the most native and supported way is to follow database_tasks: true/false
approach as this is natively supported solution. The potential problem is whether we want to support Phase 4 natively, if yes, in such case this will be an additional configuration to maintain. Likely in this model there should be a way to "validate" if config/database.yml
makes sense, aka. if you are missing database_tasks:
where-as you should have it.
The downside is that migration skipped will be marked as "run" in schema_migrations
, so in a case of misconfigured database_tasks: false/true
it would result in some migrations not run.
Likely configuration validation can be done as part of dedicated rake tasks that might be using ar_internal_metadata
to store a configuration name of database, that can be later read and compared. This could be actually be done before db:migrate
to ensure that configuration is "correct", and fail.
Recommendation
- Don't use dedicated
db:migrate:#{name}
tasks - Use
database_tasks: false/true
to properly supportdb:migrate
behavior without any hacks and natively support Phase 4 (two connections, single database) or Phase 7 configuration (two connections, two databases) - Skipped migrations will be skipped softly: 1. a message will be printed, 2. migration will be marked as run in
schema_migrations
- A
db:migrate
will be dependent ongitlab:db:validate_config
that will ensure thatdatabase_tasks:
are properly configured to ensure that we avoid configuration mistakes - The
gitlab:db:validate_config
will iterate each database, remove, set inar_internal_metadata
an expected config name, ensure that this config name is properly visible on all other databases depending ondatabase_tasks:
configuration