Support database migrations via `db/migrate` to be properly executed in a relevant context
The following page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.
This is follow-up after: #339902 (closed).
Related to: #341783 (closed).
Problem
- Today all
db/migrate
changes (DDL, selects, and DML) are executed on top of all databases (main and CI) - After decomposed the data will be split into two data stores (main and CI) limiting what migrations can do in a given context
- Since the migrations code to execute (
db/migrate
) is being shared, but data is split (decomposed database) we need to be clever what migrations are executed in what context
Proposal
1. Design proposal
- We made a design decision some time ago to share
db/structure.sql
across all databases, so all DDL changes needs to be replicated - We made a decision to not share data (so tables of CI will be truncated in main context and vice-versa) so we should not run any select/DML changes on top of other database
- The
db/migrate
can only talk to the current database. It is hard/not-desired to allowdb/migrate
to cross-talk between database so it only can mutate data of an existing context (main or CI) - Cross-talk between databases can only be implemented via background migrations which can be scheduled from
db/migrate
- Migration needs to be of a single purpose. It needs to modify DDL or select/DML never both
2. Implementation proposal
- Following the #339902 (comment 690167297). The change is to move away from
gitlab_migration
into a simpler concept ofrestrict_gitlab_migration
. We need to only limit migration to a given context, since all DDL changes are replicated everywhere. - Describe
db/migrate
anddb/post_migrate
with the purpose of migration in a form ofrestrict_gitlab_migration schema: :gitlab_main, environment: ...
- Implement an application-level validation using
pg_query
similar to how we do it withcross-joins
to validate the purpose of the query made by the migration. Check if it adheres togitlab_schemas
ofconfig/database.yml
- Properly configure
config/database.yml
for a decomposed DB with a `database_tasks: - The behavior of
restrict_gitlab_migration
is to skip migrations unless matching conditions - All select/DML changes require the presence of
restrict_gitlab_migration
as part of theGitlab::Database::Migration[2.0]
3. Validation of migrations
- The
restrict_gitlab_migration
serves a purpose to declare what a given migration does and is meant to implement a way to skip migrations unless matching conditions - The
restrict_gitlab_migration
on its own does not validate the migration - The actual validation of migration is done as part of running decomposed CI testing and replaying latest migrations
- The decomposed CI testing defines on each connection what tables can be read or written
- The migration reading/modifying data on a given connection outside of defined
gitlab_schemas
will be disallowed, which will raise an exception and will fail CI - The CI job validates that
db/structure.sql
produced by main and CI is exactly the same
4. Used decomposed database config in CI for testing migrations
Example migrations
Simple DDL migration being executed in all cases since we reply all structure.sql (allowed)
For majority of cases migrations does not need to be changed since the majority of our migrations perform only DDL changes (add/remove columns, add/remove foreign keys, create new indexes).
class AddReportTypeIndexIntoApprovalProjectRules < Gitlab::Database::Migration[2.0]
disable_ddl_transaction!
INDEX_NAME = 'index_approval_project_rules_report_type'
def up
add_concurrent_index :approval_project_rules, :report_type, name: INDEX_NAME
end
def down
remove_concurrent_index_by_name :approval_project_rules, name: INDEX_NAME
end
end
Simple DML migration (allowed)
Since this migration mutates data of file_registry
it can only be executed in gitlab_main
.
class FixStateColumnInFileRegistry < Gitlab::Database::Migration[2.0]
restrict_gitlab_migration gitlab_schema: :gitlab_main
disable_ddl_transaction!
# The following cop is disabled because of https://gitlab.com/gitlab-org/gitlab/issues/33470
# rubocop:disable Migration/UpdateColumnInBatches
def up
update_column_in_batches(:file_registry, :state, 2) do |table, query|
query.where(table[:success].eq(true)) # rubocop:disable CodeReuse/ActiveRecord
end
end
# rubocop:enable Migration/UpdateColumnInBatches
def down
# no-op
end
end
Complex DDL and DML change (forbidden)
Such migrations are forbidden as they modify DDL and DML in a single execution. This means that we are unable to apply DDL changes on all databases, and DML changes only on main database.
class DeleteLfsObjectsFromFileRegistry < ActiveRecord::Migration[6.0]
DOWNTIME = false
def up
execute("DELETE FROM file_registry WHERE file_type = 'lfs'")
execute('DROP TRIGGER IF EXISTS replicate_lfs_object_registry ON file_registry')
execute('DROP FUNCTION IF EXISTS replicate_lfs_object_registry()')
end
def down
# no-op
end
end
Schedule of background migrations (allowed)
This type of migration is a select-type it means that it requires source data. This requires that such migration needs to be only executed in a single context (main or ci).
class PopulateCanonicalEmails < Gitlab::Database::Migration[2.0]
restrict_gitlab_migration gitlab_schema: :gitlab_main
disable_ddl_transaction!
class User < ActiveRecord::Base
include EachBatch
self.table_name = 'users'
scope :with_gmail, -> { select(:id, :email).where("email ILIKE '%gmail.com'") }
end
# Limited to *@gmail.com addresses only as a first iteration, because we know
# Gmail ignores `.` appearing in the Agent name, as well as anything after `+`
def up
# batch size is the default, 1000
migration = Gitlab::BackgroundMigration::PopulateCanonicalEmails
migration_name = migration.to_s.demodulize
queue_background_migration_jobs_by_range_at_intervals(
User.with_gmail,
migration_name,
1.minute)
end
def down
# no-op
end
end
ActiveRecord
models for accessing data (allowed)
Usage of All models nested into migration needs to be based as ActiveRecord::Base
regardless on what database the data is located (main or CI). This works out of box
since Rails when running migrations in context of main
or ci
does switch
a connection on ActiveRecord::Base
. Additionally depending on where the given
migration is executed the ActiveRecord::Base.connection_db_config.name == 'main' or 'ci'
.
This already is coherent with our todays guidelines.
class ConfirmSecurityBot < Gitlab::Database::Migration[2.0]
restrict_gitlab_migration gitlab_schema: :gitlab_main
class User < ActiveRecord::Base
self.table_name = 'users'
SECURITY_BOT_TYPE = 8
end
def up
User.where(user_type: User::SECURITY_BOT_TYPE, confirmed_at: nil)
.update_all(confirmed_at: Time.current)
end
# no-op
# Security Bot should be always confirmed
def down
end
end
Iterations
- Implement a way to validate if connection is being used in a correct way: a right tables are being accessed only: !71540 (closed)
- Implement
restrict_*
helpers - Document
restrict_*
helpers and a way to use them - Implement and run decomposed migrations that use schema usage validation and restrict helpers to ensure that migrations are properly written
- (Optional) Make the
gitlab_schema
validation as part of production code to see if connections are used in a right way
This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.