Enable batched background migration processing feature flag
Feature
This feature uses the :execute_batched_migrations_on_schedule feature flag!
The feature flag controls execution of batched background migrations. These migrations run when a sidekiq-cron entry executes the Database::BatchedBackgroundMigrationWorker. With the feature flag disabled, the worker execution is a no-op.
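The gating behaviour described above can be pictured with a minimal Ruby sketch. This is not the actual GitLab worker: `FlagStore`, `SketchWorker`, and the block passed to the worker are hypothetical stand-ins, and only the flag name is taken from this issue.

```ruby
# Hypothetical in-memory flag store standing in for the real feature flag backend.
class FlagStore
  def initialize
    @flags = {}
  end

  def enable(name)
    @flags[name] = true
  end

  def disable(name)
    @flags[name] = false
  end

  # Flags that were never set are treated as disabled in this sketch.
  def enabled?(name)
    @flags.fetch(name, false)
  end
end

# Hypothetical worker: each scheduled tick calls #perform.
class SketchWorker
  FLAG = :execute_batched_migrations_on_schedule

  def initialize(flags, &run_next_batch)
    @flags = flags
    @run_next_batch = run_next_batch
  end

  # Returns :skipped without doing any work when the flag is off.
  def perform
    return :skipped unless @flags.enabled?(FLAG)

    @run_next_batch.call
    :executed
  end
end
```

Because the check happens on every tick, turning the flag off takes effect on the next scheduled run without touching the schedule itself.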
Owners
- Team: Database
- Most appropriate slack channel to reach out to: #g_database
- Best individual to reach out to: @gitlab-org/database
The Rollout Plan
- Partial Rollout on GitLab.com with beta groups
- Rollout on GitLab.com, until the migration completes or is proven not to have negative performance impact
- Percentage Rollout on GitLab.com - XX%
  If it is possible to perform an incremental rollout, this should be preferred. Proposed increments are: 10%, 50%, 100%. Proposed minimum time between increments is 15 minutes.
- Rollout Feature for everyone as soon as it's ready
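As an illustration of the proposed increments, the following Ruby sketch computes the earliest time each percentage step could be applied. The helper name `rollout_schedule` and its parameters are hypothetical; only the 10%, 50%, 100% steps and the 15-minute minimum come from the plan above.

```ruby
# Proposed increments and minimum spacing, taken from the rollout plan.
INCREMENTS = [10, 50, 100].freeze
MIN_INTERVAL_MINUTES = 15

# Returns [percentage, earliest_time] pairs for a given start Time.
# The times are lower bounds, since 15 minutes is a minimum gap.
def rollout_schedule(start_time, increments: INCREMENTS, interval_minutes: MIN_INTERVAL_MINUTES)
  increments.each_with_index.map do |percentage, index|
    [percentage, start_time + index * interval_minutes * 60]
  end
end

start = Time.utc(2021, 1, 1, 12, 0, 0) # arbitrary example start time
rollout_schedule(start).each do |percentage, earliest|
  puts "#{percentage}% no earlier than #{earliest.strftime('%H:%M')} UTC"
end
```

Each step would only go ahead if the dashboards show no negative impact from the previous one.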
Expectations
What are we expecting to happen?
We expect the background migrations to begin processing each batch of work sequentially. We chose conservative initial batch sizes that should not have a negative impact on database stability or performance.
What might happen if this goes wrong?
If the batching causes more load than expected, database stability could suffer. If this happens, the feature flag can be turned off to stop the migration processing framework from executing any further jobs.
The job information is tracked in the database, so there should be no negative impact if job processing needs to be temporarily or permanently halted. We can always reschedule the migration to pick up from the previous location.
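To show why halting is safe when progress is persisted, here is a minimal Ruby sketch of resumable batch processing. `BatchTracker` is a hypothetical in-memory stand-in; the real framework keeps this state in database tables whose schema is not described in this issue.

```ruby
# Hypothetical tracker that remembers the last processed id between runs.
class BatchTracker
  attr_reader :last_processed_id

  def initialize(max_id:, batch_size:)
    @max_id = max_id
    @batch_size = batch_size
    @last_processed_id = 0
  end

  def finished?
    @last_processed_id >= @max_id
  end

  # Processes one batch and records where it ended.
  # Returns the id range that was processed, or nil when nothing is left.
  def run_next_batch
    return nil if finished?

    first = @last_processed_id + 1
    last = [first + @batch_size - 1, @max_id].min
    yield(first..last) if block_given?
    @last_processed_id = last
    first..last
  end
end

tracker = BatchTracker.new(max_id: 25, batch_size: 10)
tracker.run_next_batch # processes ids 1..10
# Processing can stop here for any length of time.
tracker.run_next_batch # resumes with ids 11..20
```

Since the next batch is derived only from the recorded position, a pause of any length does not lose or repeat work in this sketch.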
What can we monitor to detect problems with this?
https://dashboards.gitlab.net/d/000000167/postgresql-tuple-statistics?orgId=1&refresh=1m
High rate of updates for events or push_event_payload tables, or increasing percentage of dead tuples for those tables, which could indicate autovacuum cannot keep up.
https://dashboards.gitlab.net/d/000000144/postgresql-overview?orgId=1
High system usage or tps on the primary database server.
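As a rough sketch of the dead-tuple signal mentioned above, the Ruby helpers below compute the dead-tuple percentage from live and dead tuple counts (such as the n_live_tup and n_dead_tup columns of PostgreSQL's pg_stat_user_tables view) and check whether a series of samples keeps rising. The helper names and the sample values are hypothetical, and the dashboards may compute their figures differently.

```ruby
# Dead tuples as a percentage of all tuples; 0.0 for an empty table.
def dead_tuple_percentage(live:, dead:)
  total = live + dead
  return 0.0 if total.zero?

  dead * 100.0 / total
end

# True when every sample is strictly higher than the previous one.
def steadily_increasing?(percentages)
  percentages.each_cons(2).all? { |previous, current| current > previous }
end

# Made-up samples for illustration only.
samples = [
  dead_tuple_percentage(live: 990, dead: 10),
  dead_tuple_percentage(live: 950, dead: 50),
  dead_tuple_percentage(live: 900, dead: 100)
]
puts "autovacuum may be falling behind" if steadily_increasing?(samples)
```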
Rollout Timeline
Initial Rollout
Preparation Phase
- Enable on staging (/chatops run feature set feature_name true --staging)
- Test on staging
- Ensure that documentation has been updated (More info)
- Coordinate a time to enable the flag with the SRE oncall and release managers
  - In #production by pinging @sre-oncall
  - In #g_delivery by pinging @release-managers
- Announce on the issue an estimated time this will be enabled on GitLab.com
Global Availability (More Info) (Please note that Beta, Alpha and General Availability (GA) are handled on a product level and not by the feature flag)
- Coordinate a time to enable the flag with #production and #g_delivery on slack.
- Announce on the issue an estimated time this will be enabled on GitLab.com
- Make the feature flag enabled by default i.e. Change default_enabled to true
- Enable on GitLab.com by running chatops command in #production (/chatops run feature set execute_batched_migrations_on_schedule true)
- Announce on the issue that the flag has been enabled
- Cross post chatops slack command to #support_gitlab-com (more guidance when this is necessary in the dev docs) and in your team channel
Cleanup
This is an important phase that should be done either in the next milestone or as soon as possible. For the cleanup phase, please follow our documentation on how to clean up the feature flag.
- Announce on the issue that the flag has been enabled
- Remove :feature_name feature flag
  - Remove all references to the feature flag from the codebase
  - Remove the YAML definitions for the feature from the repository
  - Create a Changelog Entry
- Clean up the feature flag from all environments by running this chatops command in the #production channel: /chatops run feature delete some_feature
Final Step
- Close this rollout issue for the feature flag after the feature flag is removed from the codebase.
Rollback Steps
- This feature can be disabled by running the following Chatops command:
  /chatops run feature set execute_batched_migrations_on_schedule false