CI partition rebalancing

Description

A recent production incident, gitlab-com/gl-infra/production#8204 (closed), resulted in more than 1M rows being persisted in our CI database with a new partition ID: 101.

In #387301 (closed) we explored how to update the erroneously assigned partition to number 100. If we succeed in updating the foreign keys to include an ON UPDATE CASCADE clause, migrating all of a pipeline's resources to another partition may become a single SQL statement: UPDATE ci_pipelines SET partition_id = NEW_ID WHERE id = X.
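To make the idea concrete, here is a minimal sketch of how a composite foreign key with ON UPDATE CASCADE would propagate a partition change from a pipeline to its dependent rows. It uses an in-memory SQLite database as a stand-in for our PostgreSQL CI database, and the two-table schema (including the `ci_builds` table, the `pipeline_id` column, and all helper names) is a simplified assumption for illustration, not our real schema.

```python
import sqlite3

# Hypothetical, heavily simplified schema: the child table references the
# parent through a composite key that includes partition_id.
SCHEMA = """
CREATE TABLE ci_pipelines (
    id           INTEGER NOT NULL,
    partition_id INTEGER NOT NULL,
    PRIMARY KEY (id, partition_id)
);
CREATE TABLE ci_builds (
    id           INTEGER NOT NULL,
    partition_id INTEGER NOT NULL,
    pipeline_id  INTEGER NOT NULL,
    PRIMARY KEY (id, partition_id),
    FOREIGN KEY (pipeline_id, partition_id)
        REFERENCES ci_pipelines (id, partition_id)
        ON UPDATE CASCADE
);
"""


def open_demo_db():
    """Create an in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def seed(conn, pipeline_id, partition_id, builds):
    """Insert one pipeline and `builds` dependent rows in the same partition."""
    conn.execute(
        "INSERT INTO ci_pipelines (id, partition_id) VALUES (?, ?)",
        (pipeline_id, partition_id),
    )
    conn.executemany(
        "INSERT INTO ci_builds (id, partition_id, pipeline_id) VALUES (?, ?, ?)",
        [(pipeline_id * 1000 + n, partition_id, pipeline_id) for n in range(builds)],
    )
    conn.commit()


def move_pipeline(conn, pipeline_id, old_partition, new_partition):
    """Relabel one pipeline; the foreign key cascades the change to its builds."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE ci_pipelines SET partition_id = ? "
            "WHERE id = ? AND partition_id = ?",
            (new_partition, pipeline_id, old_partition),
        )


def builds_per_partition(conn):
    """Return {partition_id: number of build rows}."""
    return dict(
        conn.execute(
            "SELECT partition_id, COUNT(*) FROM ci_builds GROUP BY partition_id"
        )
    )
```

Seeding a pipeline in partition 101 and calling `move_pipeline(conn, 1, 101, 100)` leaves no build rows in partition 101, even though only the `ci_pipelines` row was updated directly. In production every cascaded row is still a real write, which is where the load concern comes from.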

That being said, we may still need to move millions of rows, which can put a strain on the database.

Proposal

Let's use this issue to explore ways to perform partition rebalancing in a safe and predictable manner. One possible solution is to use background migrations to run the UPDATE ... SET partition_id = ? statement in batches, sizing each batch by estimating the number of cascading updates it will trigger.
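One way to think about the batching is sketched below: given a rough estimate of how many dependent rows each pipeline has, group pipeline IDs so that the estimated number of rows touched per batch stays under a budget. The function name, the input shape, and the budget value are all hypothetical, and how to obtain the estimates (for example from table statistics or per-pipeline counts) is left open for this spike.

```python
def plan_batches(cascade_estimates, max_rows_per_batch):
    """Group pipeline IDs into batches bounded by estimated rows touched.

    cascade_estimates: iterable of (pipeline_id, estimated_dependent_rows).
    max_rows_per_batch: budget of rows (pipeline row + cascades) per batch.

    A pipeline whose own estimate exceeds the budget gets a batch to itself,
    because a single cascading UPDATE cannot be split at this level.
    """
    batches = []
    current = []
    current_rows = 0

    for pipeline_id, dependents in cascade_estimates:
        cost = 1 + dependents  # the pipeline row plus its estimated cascades
        # Close the current batch if adding this pipeline would exceed the budget.
        if current and current_rows + cost > max_rows_per_batch:
            batches.append(current)
            current = []
            current_rows = 0
        current.append(pipeline_id)
        current_rows += cost

    if current:
        batches.append(current)
    return batches
```

For example, `plan_batches([(1, 400), (2, 300), (3, 500)], 1000)` groups pipelines 1 and 2 together and puts pipeline 3 in its own batch. Each batch could then be applied as one UPDATE over its pipeline IDs in its own transaction, with a pause between batches, so that a background migration can throttle or stop if the database shows signs of strain.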

Spike Outcome

  • Decide the strategy to use
  • Populate &11822 with the issues needed to achieve the rebalancing

/cc @mbobin @morefice @shampton @carolinesimpson
