Backfill p_ci_pipeline_iids with existing iids from p_ci_pipelines
What does this MR do and why?
Background
We currently have the unique index (project_id, iid, partition_id) on the table p_ci_pipelines. This ensures uniqueness within the same partition but not across partitions. Due to a possible workflow bug/race condition, we occasionally save duplicate pipeline iids for the same project, which causes problems such as https://gitlab.com/gitlab-com/request-for-help/-/issues/2855#note_2518831537.
We've explored a couple of different possible solutions (https://gitlab.com/gitlab-org/gitlab/-/issues/545167#note_2802709422, https://gitlab.com/gitlab-org/gitlab/-/issues/545167#note_2844491364), and we ultimately decided that the only way to guarantee pipeline iid uniqueness at the database level is to create a new table to track the iids. (See proposed approach in !210789 (comment 2869213142).)
- In !213065 (merged), we introduced the static hash-partitioned table p_ci_pipeline_iids. It has 64 partitions with the hash on project_id and primary key on (project_id, iid).
- In !213457 (merged), we added database triggers on p_ci_pipelines so that it keeps track of iids in p_ci_pipeline_iids.
  - This means that duplicates are now prevented for new iids going forward.
- In !213992, we are updating the internal ID initial_value logic to read maximum(:iid) from ci_pipeline_iids instead of ci_pipelines. <-- Requires this BBM to complete first before we enable the FF.
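To illustrate why the existing index on p_ci_pipelines cannot catch cross-partition duplicates while the primary key on p_ci_pipeline_iids can, here is a minimal in-memory sketch. The UniqueIndex class is hypothetical and only models a unique key as a set of tuples; it is not how PostgreSQL or GitLab implement this.

```ruby
require 'set'

# Hypothetical in-memory model of a unique index: a Set of key tuples.
class UniqueIndex
  def initialize(*columns)
    @columns = columns
    @keys = Set.new
  end

  # Returns true if the row is accepted, false if it violates uniqueness.
  def insert(row)
    key = row.values_at(*@columns)
    !@keys.add?(key).nil? # Set#add? returns nil when the key already exists
  end
end

# Index on p_ci_pipelines: partition_id is part of the key.
pipelines_index = UniqueIndex.new(:project_id, :iid, :partition_id)
# Primary key on p_ci_pipeline_iids: no partition_id in the key.
iids_primary_key = UniqueIndex.new(:project_id, :iid)

rows = [
  { project_id: 1, iid: 7, partition_id: 100 },
  { project_id: 1, iid: 7, partition_id: 101 } # same iid, different partition
]

p rows.map { |row| pipelines_index.insert(row) }  # => [true, true]
p rows.map { |row| iids_primary_key.insert(row) } # => [true, false]
```

The second row passes the per-partition key because its partition_id differs, but it is rejected once the key is only (project_id, iid).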
This MR
In this MR, we now backfill p_ci_pipeline_iids with existing iids from p_ci_pipelines so that all iids are accounted for. The BBM logic is as follows:
- Queues a separate BBM per partition from p_ci_pipelines.
- Copies over p_ci_pipelines.iids to p_ci_pipeline_iids, ignoring duplicates and null iids.
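The copy semantics described above can be sketched in plain Ruby. This is an in-memory stand-in under stated assumptions, not the migration code from this MR: backfill_iids, the batch_size parameter, and the use of a Set in place of p_ci_pipeline_iids are all hypothetical.

```ruby
require 'set'

# Hypothetical stand-in for the backfill; not the actual BBM job.
# pipelines:  array of hashes with :project_id, :iid, :partition_id
# iids_table: Set of [project_id, iid] pairs, standing in for p_ci_pipeline_iids
def backfill_iids(pipelines, iids_table, batch_size: 2)
  # One pass per partition, mirroring "a separate BBM per partition".
  pipelines.group_by { |row| row[:partition_id] }.each_value do |partition_rows|
    partition_rows.each_slice(batch_size) do |batch|
      batch.each do |row|
        next if row[:iid].nil? # skip null iids

        # Set#add is a no-op for an existing pair, so duplicates are ignored.
        iids_table.add([row[:project_id], row[:iid]])
      end
    end
  end
  iids_table
end

pipelines = [
  { project_id: 1, iid: 1,   partition_id: 100 },
  { project_id: 1, iid: 2,   partition_id: 100 },
  { project_id: 1, iid: 2,   partition_id: 101 }, # cross-partition duplicate
  { project_id: 2, iid: 1,   partition_id: 101 },
  { project_id: 2, iid: nil, partition_id: 102 }  # null iid
]

iids_table = backfill_iids(pipelines, Set.new)
puts iids_table.size # => 3
```

Because duplicates and nulls are skipped, the resulting size equals the number of distinct non-null (project_id, iid) pairs, and running the copy a second time leaves the table unchanged.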
Next steps:
- Create a BBM to fix existing duplicate iids.
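The follow-up BBM is not part of this MR, and how it will resolve duplicates is not specified here. As a rough illustration of what "existing duplicate iids" means, the sketch below groups in-memory rows by (project_id, iid) and keeps the groups with more than one row. The helper name duplicate_iid_groups and the sample rows are hypothetical.

```ruby
# Hypothetical helper: returns a Hash of [project_id, iid] => rows,
# limited to pairs that appear on more than one pipeline row.
def duplicate_iid_groups(pipelines)
  pipelines
    .reject { |row| row[:iid].nil? } # null iids are not duplicates
    .group_by { |row| [row[:project_id], row[:iid]] }
    .select { |_pair, rows| rows.size > 1 }
end

pipelines = [
  { id: 10, project_id: 1, iid: 1,   partition_id: 100 },
  { id: 11, project_id: 1, iid: 2,   partition_id: 100 },
  { id: 12, project_id: 1, iid: 2,   partition_id: 101 }, # duplicate of id 11
  { id: 13, project_id: 2, iid: nil, partition_id: 101 },
  { id: 14, project_id: 2, iid: nil, partition_id: 102 }
]

p duplicate_iid_groups(pipelines).keys # => [[1, 2]]
```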
References
- Solves the next step in https://gitlab.com/gitlab-org/gitlab/-/issues/545167+. See implementation table.
How to set up and validate locally
- Ensure you have some pipeline data in your local gdk. Clear any data currently in p_ci_pipeline_iids and observe the count of unique iids in each table:
Ci::PipelineIid.delete_all
Ci::Pipeline.select('DISTINCT (project_id, iid)').where.not(iid: nil).count # Should be non-zero
Ci::PipelineIid.count # Should be 0
- Run the migration: bundle exec rails db:migrate.
- Check that a BBM for each partition is queued in the CI database. Example:
% gdk psql
gitlabhq_development=# \c gitlabhq_development_ci
gitlabhq_development_ci=# SELECT id, created_at, job_class_name, table_name, job_arguments FROM batched_background_migrations WHERE job_class_name = 'BackfillPCiPipelineIids';
 id |          created_at          |     job_class_name      |               table_name               |           job_arguments
----+------------------------------+-------------------------+----------------------------------------+-----------------------------------
 18 | 2025-11-24 18:49:48.50871+00 | BackfillPCiPipelineIids | gitlab_partitions_dynamic.ci_pipelines | ["partition_id", [100, 101, 102]]
(1 row)
- After the BBMs are complete (which should be fast on your local gdk), check the counts for both tables. They should now be equal.
Ci::Pipeline.select('DISTINCT (project_id, iid)').where.not(iid: nil).count
Ci::PipelineIid.count
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.