Consider removing old processing data for CI jobs

In #552069 (closed) we added 8 background migrations to copy existing data into a different storage layout that is more space efficient. Today (2025-12-05) these are the stats for the migrations:

| Migration ID | Status | Table Name | Progress | Started At | Time Required to Complete | Partition Active Since | Estimated Rows |
|---|---|---|---|---|---|---|---|
| 3000525 | Active | `gitlab_partitions_dynamic.ci_builds_107` | 3.72% | Fri, 21 Nov 2025 | 12 days 3 hours 39 minutes 37 seconds | 2025-10-07 | 571,794,000 |
| 3000521 | Active | `gitlab_partitions_dynamic.ci_builds_103` | 26.24% | Fri, 21 Nov 2025 | 25 days 16 hours 55 minutes 15 seconds | 2025-04-24 | 391,671,400 |
| 3000523 | Active | `gitlab_partitions_dynamic.ci_builds_105` | 17.37% | Fri, 21 Nov 2025 | 29 days 16 hours 19 minutes 42 seconds | 2025-07-21 | 396,431,900 |
| 3000524 | Active | `gitlab_partitions_dynamic.ci_builds_106` | 16.09% | Fri, 21 Nov 2025 | 1 month 2 days 4 hours 34 minutes 39 seconds | 2025-08-29 | 415,012,700 |
| 3000522 | Active | `gitlab_partitions_dynamic.ci_builds_104` | 16.25% | Fri, 21 Nov 2025 | 1 month 4 days 4 hours 46 minutes 34 seconds | 2025-06-05 | 453,581,000 |
| 3000519 | Paused | `gitlab_partitions_dynamic.ci_builds_101` | 2.79% | Fri, 21 Nov 2025 | 3 months 9 days 19 hours 54 minutes 6 seconds | 2023-12-07 | 1,255,724,500 |
| 3000520 | Paused | `gitlab_partitions_dynamic.ci_builds_102` | 1.39% | Fri, 21 Nov 2025 | 6 months 18 days 5 hours 17 minutes 42 seconds | 2024-06-18 | 2,454,782,500 |
| 3000518 | Paused | `gitlab_partitions_dynamic.ci_builds` | 1.35% | Fri, 21 Nov 2025 | 7 months 3 days 12 hours 46 minutes 19 seconds | 2014-12-11 | 4,774,979,600 |

The system can run only 4 migrations in parallel, so we paused the ones that target the oldest data and let the system work on the newer partitions instead.

These estimates are very optimistic since they assume that the migrations run non-stop, but in practice there are pauses due to scheduling, vacuums, and so on. So in this case it would take around one year to complete them and remove the `p_ci_builds_metadata` table.
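As a back-of-the-envelope check, a constant-rate extrapolation gives remaining time ≈ active runtime × (1 − progress) / progress. A minimal sketch (the 10-day runtime below is a hypothetical input, not taken from the migration framework's accounting):

```python
from datetime import timedelta

def remaining_time(progress: float, active_time: timedelta) -> timedelta:
    """Extrapolate remaining runtime, assuming a constant migration rate."""
    if not 0.0 < progress <= 1.0:
        raise ValueError("progress must be a fraction in (0, 1]")
    seconds_left = active_time.total_seconds() * (1.0 - progress) / progress
    return timedelta(seconds=seconds_left)

# Hypothetical input: a migration at 3.72% after 10 days of actual execution
# would need roughly 259 more days at the same rate.
print(remaining_time(0.0372, timedelta(days=10)))
```

Any time the migration is not actually executing (scheduling gaps, vacuums) stretches the wall-clock duration beyond this figure, which is why the real completion dates slip past the table's estimates.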

## Proposal

In !214248 (merged) we added configuration options for these migrations to skip certain data based on when the job was created.

`DATABASE_CI_JOBS_PROCESSING_DATA_CUTOFF` controls the data that's needed for job execution, for example the script, YAML variables, and job tags. By setting this variable we skip over old rows for jobs that likely won't need to be re-executed in the future; those jobs would appear as archived in the UI.

This setting affects:

- playing manual jobs whose data was not copied
- retrying old jobs
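For illustration, a minimal sketch of the cutoff semantics, assuming the setting boils down to a `created_at` comparison against a relative cutoff; `Job`, `should_copy_processing_data`, and the sample dates are hypothetical, not GitLab's actual code:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Job:
    id: int
    created_at: datetime

# Stand-in for DATABASE_CI_JOBS_PROCESSING_DATA_CUTOFF='1 year'.
CUTOFF = timedelta(days=365)

def should_copy_processing_data(job: Job, now: datetime) -> bool:
    """Copy script/variables/tags only for jobs newer than the cutoff;
    skipped jobs surface as archived in the UI instead of playable."""
    return job.created_at >= now - CUTOFF

now = datetime(2025, 12, 5, tzinfo=timezone.utc)
batch = [
    Job(1, datetime(2024, 1, 10, tzinfo=timezone.utc)),  # too old: skipped
    Job(2, datetime(2025, 6, 1, tzinfo=timezone.utc)),   # recent: copied
]
for job in batch:
    print(job.id, should_copy_processing_data(job, now))  # 1 False, 2 True
```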

Currently, when an old job that doesn't use the new storage format is retried, we convert it before executing it; it doesn't need to wait for the migration. So for any users that retry old jobs: if a job has been retried at any point since 18.5, it was migrated to the new format and will not be affected by this cutoff.
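A rough sketch of that convert-on-retry path, with hypothetical names standing in for the real code:

```python
from dataclasses import dataclass

@dataclass
class Job:
    id: int
    uses_new_format: bool = False

def convert_to_new_format(job: Job) -> None:
    # Stand-in for copying the job's rows into the new storage layout.
    job.uses_new_format = True

def retry(job: Job) -> None:
    # Old-format jobs are converted inline; the retry never waits for the
    # background migration to reach the job's partition.
    if not job.uses_new_format:
        convert_to_new_format(job)
    print(f"executing job {job.id} (new format: {job.uses_new_format})")

retry(Job(42))  # converted on retry, so unaffected by the cutoff afterwards
```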

Setting `DATABASE_CI_JOBS_PROCESSING_DATA_CUTOFF='1 year'` would skip creating definition rows for the 100, 101, and (roughly half of the?) 102 partitions.
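Checking that claim against the table's "Partition Active Since" dates (treating the unpartitioned `ci_builds` row as partition 100 here is an assumption):

```python
from datetime import date, timedelta

active_since = {
    100: date(2014, 12, 11),  # gitlab_partitions_dynamic.ci_builds
    101: date(2023, 12, 7),
    102: date(2024, 6, 18),
    103: date(2025, 4, 24),
}
cutoff = date(2025, 12, 5) - timedelta(days=365)  # '1 year' ago: 2024-12-05

for partition, start in active_since.items():
    next_start = active_since.get(partition + 1)
    # A partition is fully skippable only if the *next* partition became
    # active before the cutoff; otherwise it straddles the cutoff date.
    if next_start is not None and next_start <= cutoff:
        print(f"partition {partition}: entirely before cutoff, fully skipped")
    elif start < cutoff:
        print(f"partition {partition}: straddles cutoff, partially skipped")
# -> 100 and 101 fully skipped, 102 partially skipped
```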

This is different from https://gitlab.com/groups/gitlab-org/-/work_items/18256 because we are still copying user data out of `p_ci_builds_metadata` (like environments, exposed artifacts, the debug flag, and exit codes). And if users have retried jobs, or retry them before the metadata tables are removed, those jobs will have been converted to the new format and will still be retryable after the tables are gone.
