Explore how to partition new tables when dynamic partitions are automatically created
Problem
In &11806 we added an automated way of creating new partitions for the routing tables. This inserts new records into the `ci_partitions` table and bumps the writes to the next available partition when any of the current ones reaches 100GB. This is a problem for partitioning new tables in the CI domain: the partitioning strategy is designed to have all resources from a pipeline object share the same `partition_id` value (for example, all the jobs from a pipeline must have the same `partition_id` value as the pipeline).
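For context, the routing tables are list-partitioned on `partition_id`, so a pipeline and all of its resources land in the same physical partition. A minimal sketch of the scheme (column definitions simplified for illustration, not the real schema):

```sql
-- Simplified sketch of a routing table, list-partitioned on partition_id.
CREATE TABLE p_ci_builds (
  id bigint NOT NULL,
  partition_id bigint NOT NULL,
  PRIMARY KEY (id, partition_id)
) PARTITION BY LIST (partition_id);

-- One physical partition per partition_id value; the partition manager
-- creates the next one when the current write target reaches the size limit.
CREATE TABLE ci_builds_100 PARTITION OF p_ci_builds FOR VALUES IN (100);
CREATE TABLE ci_builds_101 PARTITION OF p_ci_builds FOR VALUES IN (101);
```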
To partition an existing table with zero downtime, we must add a check constraint on that table that restricts all of its rows to a limited set of `partition_id` values. For example, when we partitioned `ci_builds` we added this constraint: `CONSTRAINT partitioning_constraint CHECK (partition_id = (100)::bigint)` in order to attach the table to the routing table without holding the expensive locks for too long. But when we got to partitioning `ci_job_artifacts` we had to add this check constraint: `CONSTRAINT partitioning_constraint CHECK ((partition_id = ANY (ARRAY[(100)::bigint, (101)::bigint])))` because at that time we were writing data to the second partition (`ci_builds_101`) of `p_ci_builds`, and `ci_job_artifacts` had records with both `partition_id=100` and `partition_id=101`. For `ci_pipelines` we'll have to add a check constraint for `partition_id IN (100, 101, 102)`, because we have already created the partitions for `partition_id=102` on the currently partitioned tables and we plan to write to them.
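The reason the constraint matters: PostgreSQL can skip the full-table scan during `ATTACH PARTITION` when a valid check constraint already proves that every row fits the partition bounds, so the expensive lock is held only briefly. A sketch of the sequence for `ci_pipelines` (assuming `p_ci_pipelines` is the new routing table; see the `convert_table.rb` link in the references for the actual implementation):

```sql
-- 1. Add the constraint as NOT VALID: no table scan, only a brief lock.
ALTER TABLE ci_pipelines
  ADD CONSTRAINT partitioning_constraint
  CHECK (partition_id = ANY (ARRAY[(100)::bigint, (101)::bigint, (102)::bigint]))
  NOT VALID;

-- 2. Validate separately: this scans the rows, but with a weaker lock
--    that does not block concurrent reads and writes.
ALTER TABLE ci_pipelines VALIDATE CONSTRAINT partitioning_constraint;

-- 3. Attach: PostgreSQL sees that the valid constraint implies the
--    partition bounds and skips its own scan.
ALTER TABLE p_ci_pipelines
  ATTACH PARTITION ci_pipelines FOR VALUES IN (100, 101, 102);
```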
But if we enable the partition manager, it will start creating new partitions as soon as a partition of any of the currently partitioned tables reaches the size limit. Example:

- the partition manager is enabled
- the current `partition_id` is `102`
- we add `CONSTRAINT partitioning_constraint CHECK ((partition_id = ANY (ARRAY[(100)::bigint, (101)::bigint, (102)::bigint])))` to `ci_pipelines`
- `ci_builds_metadata_102` reaches the threshold for switching to the next partition
- `ci_builds_103`, `ci_builds_metadata_103`, ..., partitions are created
- the partition manager switches the writes to `partition_id=103`
- now inserts into `ci_pipelines` start to fail because the partitioning check constraint is violated

We must wait until `ci_pipelines` is partitioned before we can create partitions for the next values.
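Concretely, once writes switch to `partition_id=103`, every new `ci_pipelines` row fails the check (column list here is illustrative only):

```sql
-- Fails: 103 is not in the constraint's list of allowed values.
INSERT INTO ci_pipelines (project_id, partition_id)
VALUES (42, 103);
-- ERROR:  new row for relation "ci_pipelines" violates check constraint "partitioning_constraint"
```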
References
- https://docs.gitlab.com/ee/architecture/blueprints/ci_data_decay/pipeline_partitioning.html
- https://docs.gitlab.com/ee/development/database/partitioning/list.html
- https://gitlab.com/gitlab-org/gitlab/-/blob/0cae613eb088053270c2ed1d69281b872f89c696/lib/gitlab/database/partitioning/list/convert_table.rb#L25
Proposals
- Pause the partition manager while a new table is being partitioned and re-enable it after partitioning is complete.
- When adding the check constraint, include the next X values so that the constraint remains valid after a switch. Switching to the next partition would still work, but how do we determine a good X value? (see the sketch after this list)
- Partition new tables by creating the table as partitioned from the start and copying the data over.
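To make the second proposal concrete, here is a sketch with X=5, an arbitrary headroom value picked only for illustration; choosing a safe X is exactly the open question:

```sql
-- Sketch of proposal 2: current partition_id is 102, plus X=5 values of
-- headroom, so inserts keep succeeding up to partition_id=107.
ALTER TABLE ci_pipelines
  ADD CONSTRAINT partitioning_constraint
  CHECK (partition_id = ANY (ARRAY[(100)::bigint, (101)::bigint, (102)::bigint,
                                   (103)::bigint, (104)::bigint, (105)::bigint,
                                   (106)::bigint, (107)::bigint]))
  NOT VALID;
```

Note the trade-off: the attach step would then have to list every value covered by the constraint, including values that have no partition and no rows yet.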
There is also the problem of the different growth rates between GitLab.com and self-managed instances when selecting the values for the constraint. On GitLab.com we know the values, but for self-managed instances we don't, so we'd have to read them from the database at migration time, which increases the complexity of the migrations.
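One way to get the values on a self-managed instance would be to read them from the data itself inside the migration; the query shape below is an assumption for illustration, not actual migration code:

```sql
-- Discover the partition_id values actually in use on this instance;
-- the migration would then build the check constraint from this set.
SELECT DISTINCT partition_id FROM ci_pipelines ORDER BY partition_id;
```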