Backfill project namespaces rollout and rollback plan
The sole purpose of this issue is to describe and collaborate on the rollout of the project namespaces backfill migration and, should it become necessary, its pause and rollback.
Summary
As part of the groups and projects consolidation, we will backfill a project namespace (a record in the namespaces table) for every project. This has the potential to increase the row count of the namespaces table several-fold. For instance, on .com it will be a ~3x increase in row count. The reasoning behind this is detailed in the following resources:
- Architecture Blueprint
- Architecture Blueprint MR
- Workspace project: (Consolidate Groups and Projects)
Rollout Plan
Backfilling project namespaces for all projects at once could put unforeseen pressure on the database. The namespaces table will therefore be backfilled incrementally, in multiple small batches.
We need to determine a small enough percentage of projects to backfill per batch using batched migrations. Batched migrations let us control the size of each batch by selecting the range of IDs that we want to migrate.
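To illustrate how an ID range can be cut into batches, here is a minimal sketch. It is not the batched migration framework itself: `batch_ranges`, its parameters, and the sample numbers are hypothetical. It assumes fixed-width ID ranges and ignores gaps in the ID sequence.

```python
from typing import Iterator, Tuple


def batch_ranges(min_id: int, max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start_id, end_id) ranges covering min_id..max_id.

    Hypothetical helper for illustration only; a real framework may pick
    batch boundaries differently (for example, from the IDs that exist).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = min_id
    while start <= max_id:
        # The last batch may be shorter than batch_size.
        end = min(start + batch_size - 1, max_id)
        yield (start, end)
        start = end + 1


# Example: project IDs 1..25 migrated in batches of 10 IDs each.
ranges = list(batch_ranges(1, 25, 10))
```

A smaller `batch_size` means more batches, each touching fewer rows, which is the lever for limiting database pressure per batch.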
Migration rollback
Another benefit of using batched migrations is that we can pause the backfilling of project namespaces at any time, either by changing the status of the batched migration from active to paused or by removing the batched migration records altogether.
A complete rollback involves removing the project namespace records from the namespaces table. This would be done as part of a separate migration and is a last-resort decision.
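The pause behavior can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the real implementation: the `BatchedMigration` class and its fields are hypothetical, and only the `active` and `paused` status names come from the text above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BatchedMigration:
    """Hypothetical in-memory stand-in for a batched migration record."""

    pending: List[Tuple[int, int]]
    status: str = "active"  # "active" or "paused", as named in the text
    done: List[Tuple[int, int]] = field(default_factory=list)

    def run_next_batch(self) -> bool:
        """Process one batch if allowed; return True if a batch was run."""
        if self.status != "active" or not self.pending:
            return False
        self.done.append(self.pending.pop(0))
        return True


migration = BatchedMigration(pending=[(1, 10), (11, 20), (21, 25)])
migration.run_next_batch()                      # runs (1, 10)
migration.status = "paused"                     # pause: no batch is picked up
ran_while_paused = migration.run_next_batch()   # nothing happens
migration.status = "active"                     # resume where it stopped
migration.run_next_batch()                      # runs (11, 20)
```

In this model, pausing loses no progress: completed batches stay done and the remaining ranges are picked up again once the status returns to active.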
Open Questions
On .com we can monitor the migration and run batches incrementally, starting the next batch once the previous one has finished. We would also know when the migration has finished, so we can start adding Phase 2 code.
~~We do not have the same control over self-managed instances.~~
Self-managed instances do have control over batched migrations. See the documentation on monitoring and troubleshooting batched migrations:
- https://docs.gitlab.com/ee/update/index.html#batched-background-migrations
- https://docs.gitlab.com/ee/user/admin_area/monitoring/background_migrations.html
Self-managed instances have much smaller data sizes and loads than GitLab.com, and we will be using .com as the battlefield test. We still need to take every precaution for self-managed customers.
On the migration side, the batched migration splits the backfill into multiple batched jobs with small enough batches. These batches run sequentially and can be paused and resumed on demand.
Here are some open questions about running the migration on self-managed instances:
- Q: Do we need to give self-managed customers a special heads-up about this migration?
- A: ?
- Q: Do we need some kind of delay between batches?
- A: There are built-in delays between batches.
- Q: How do we know whether the backfill migration has completed successfully on self-managed instances, so that we are OK to release Phase 2 code?
- A: We currently require all background migrations (both regular ones and batched background migrations) to have been completed before an instance is upgraded (link to documentation).
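The completion requirement in the last answer can be expressed as a simple gate. The sketch below is illustrative only: the function names, the migration names, and the `finished` status string are assumptions, not the actual upgrade check.

```python
from typing import Dict, List


def unfinished_migrations(statuses: Dict[str, str]) -> List[str]:
    """Return names of migrations whose status is not 'finished' (assumed name)."""
    return sorted(name for name, status in statuses.items() if status != "finished")


def ok_to_upgrade(statuses: Dict[str, str]) -> bool:
    """An upgrade is allowed only when no migration is left unfinished."""
    return not unfinished_migrations(statuses)


# Hypothetical snapshot of migration statuses on one instance.
snapshot = {
    "backfill_project_namespaces": "active",
    "some_other_migration": "finished",
}
blocked_by = unfinished_migrations(snapshot)
can_upgrade = ok_to_upgrade(snapshot)
```

Under this model, Phase 2 code would only reach an instance after the backfill has dropped out of the unfinished list.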