Explore how to partition new tables when dynamic partitions are automatically created

Everyone can contribute. Help move this issue forward while earning points, leveling up and collecting rewards.

Problem

In &11806 we have added an automated way of creating new partitions for the routing tables. This will insert new records in the ci_partitions table and bump the writes to the next available partition when the any of the current ones reach 100GB. This is a problem for partitioning new tables in the CI domain. The partitioning strategy is designed to have all resources from a pipeline object share the same partition_id value.(for example all all the jobs from a pipeline must have the same partition_id value as the pipeline)

To partition an existing table with zero downtime, we must add a check constraint on that table that enforces all the rows from it to have a limited set of partition_ids. For example, when we partitioned ci_builds we added this constraint: CONSTRAINT partitioning_constraint CHECK (partition_id = (100)::bigint) in order to attach the table to the routing table without holding the expensive locks for too long. But when we got to partition ci_job_artifacts we had to add this check constraint: CONSTRAINT partitioning_constraint CHECK ((partition_id = ANY (ARRAY[(100)::bigint, (101)::bigint]))) because at that time we were writing data to the second partition(ci_builds_101) for p_ci_builds and the ci_job_artifacts had records with both partition_id=100 and partition_id=101. For ci_pipelines we'll have to add a check constraint for partition_id in (100, 101, 102) because we have created the partitions for partition_id=102 for the current partitioned table and we plan to write to them.

But if we enable the partition manager, it will start creating new partitions as soon as a partition from the current partitioned tables reaches the size limit. Example:

  • the partition manager is enabled
  • the current partition_id=102
  • we add CONSTRAINT partitioning_constraint CHECK ((partition_id = ANY (100, 101, 102))) to ci_pipelines
  • ci_builds_metadata_102 reaches the threshold for switching to the next partition
  • ci_builds_103, ci_builds_metadata_103, ..., partitions are created
  • the partition manager switches the writes to partition_id=103
  • now inserts into ci_pipelines start to fail because the partitioning check constraint is not valid.

We must wait until ci_pipelines is partitioned and we can create partitions for the next values.

reference list

Proposals

  • Pause partition manager when a new table is being partitioned and re-enable it after partitioning is complete
  • when adding the check constraint include the next X values so that the check constraint is still valid. Switching to the next partition should still work, but how do we determine a good X value?
  • partition new tables by creating the table as partitioned and copy the data over.

There is also the problem with the growth rate between .com and self-managed when selecting the values for the constraint. On .com we know the values, but for self-managed we have no idea and we'll have to take them out of the database which will increase the complexity of the migrations.

Edited by 🤖 GitLab Bot 🤖