Skip to content

Add migration to create packages_tags.project_id and backfill

Dylan Griffith requested to merge 429173-add-project-id-to-packages-tags into master

What does this MR do and why?

As part of the Cells architecture we will need to add a "sharding key" to all tables that are sharded across cells. Cells are separate GitLab instances and will eventually be how we run GitLab.com. We've documented in https://docs.gitlab.com/ee/architecture/blueprints/organization/isolation.html how this sharding key will be necessary to prevent any cross organization data in the database and allow us to confidently move data between cells.

The plan is to add either project_id or namespace_id to every table that ultimately belongs to an "Organization" and can therefore be moved between Cells as a complete unit. A lot of tables already have one of these columns but we need to add and backfill this column for a few hundred tables. A lot of the migrations will be similar to this one in that we can just look up another foreign key and find the project_id from that related parent table.

This MR is the first in adding and backfilling this column to a specific table.

The packages_tags table is not very large but we still expect a batched background migration will be the safest way to backfill this table.

With a batch size of 1000 and a 2.minutes wait between updates and this table containing ~800k rows this should take around 800000/1000 * 2 minutes = 27 hours.

Database queries

Update a batch
explain WITH joined_cte(packages_tag_id, project_id) AS MATERIALIZED (
	  SELECT packages_tags.id, packages_packages.project_id FROM "packages_tags" INNER JOIN packages_packages ON packages_tags.package_id = packages_packages.id WHERE "packages_tags"."id" BETWEEN 1 AND 802962 AND "packages_tags"."project_id" IS NULL AND "packages_tags"."id" >= 100270 AND "packages_tags"."id" < 100585
	)
	UPDATE packages_tags
	SET project_id = joined_cte.project_id
	FROM joined_cte
	WHERE id = joined_cte.packages_tag_id

https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/23586/commands/75828

Screenshots or screen recordings

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #429173 (closed)

Edited by Dylan Griffith

Merge request reports