Add migration to create packages_tags.project_id and backfill
What does this MR do and why?
As part of the Cells architecture we will need to add a "sharding key" to all tables that are sharded across cells. Cells are separate GitLab instances and will eventually be how we run GitLab.com. We've documented in https://docs.gitlab.com/ee/architecture/blueprints/organization/isolation.html how this sharding key will be necessary to prevent any cross organization data in the database and allow us to confidently move data between cells.
The plan is to add either project_id
or namespace_id
to every table that ultimately belongs to an "Organization" and can therefore be moved between Cells as a complete unit. A lot of tables already have one of these columns but we need to add and backfill this column for a few hundred tables. A lot of the migrations will be similar to this one in that we can just look up another foreign key and find the project_id
from that related parent table.
This MR is the first in adding and backfilling this column to a specific table.
The packages_tags
table is not very large but we still expect a batched background migration will be the safest way to backfill this table.
With a batch size of 1000
and a 2.minutes
wait between updates and this table containing ~800k rows this should take around 800000/1000 * 2 minutes = 27 hours
.
Database queries
Update a batch
explain WITH joined_cte(packages_tag_id, project_id) AS MATERIALIZED (
SELECT packages_tags.id, packages_packages.project_id FROM "packages_tags" INNER JOIN packages_packages ON packages_tags.package_id = packages_packages.id WHERE "packages_tags"."id" BETWEEN 1 AND 802962 AND "packages_tags"."project_id" IS NULL AND "packages_tags"."id" >= 100270 AND "packages_tags"."id" < 100585
)
UPDATE packages_tags
SET project_id = joined_cte.project_id
FROM joined_cte
WHERE id = joined_cte.packages_tag_id
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/23586/commands/75828
Screenshots or screen recordings
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #429173 (closed)