Skip to content

Convert ci_job_artifacts.id to bigint - Step 1: Add new columns and sync data

The ci_job_artifacts table is one of the largest tables in GitLab's database still uses an integer (int4) Primary Key. It has a high Primary Key Overflow risk as can be seen on its tracking issue.

The first step to address the problem is to create a new column of type bigint, load all the data by using background jobs from the current id column to it and keep them in sync with a trigger.

In addition, the ci_job_artifacts table references (has a Foreign Key towards) the ci_builds table, which is also one of the tables with a high Primary Key Overflow risk and we'll have to convert its id column to bigint, so it would be optimal to do so in one pass while converting its Primary Key.

Finally, we have to also follow the same approach for all tables that reference it (project_pages_metadata). (no need as project_pages_metadata.artifacts_archive_id is already defined as a bigint)

So the overall process will be as follows:

  • Create new column ci_job_artifacts.new_id for the Primary Key, with bigint NOT NULL DEFAULT 0
  • Create new column ci_job_artifacts.new_job_id for the Foreign Key that references ci_builds, with bigint NOT NULL DEFAULT 0
  • Install sync triggers for both columns to keep them updated while new records are inserted or existing ones are updated or deleted.
  • Start background jobs that will batch through the whole table and copy the id and job_id values to the new columns.
  • Same for the column project_pages_metadata.artifacts_archive_id that references ci_job_artifacts.id (no need as it is already a bigint)

We'll follow with a cleanup migration in the next milestone than the one that the aforementioned migrations are deployed, which will add the necessary indexes, swap the PK (and its sequence) and the FKs and finally drop the old columns.

Related Tracking Issue: #276019 (closed)


Additional request / requirement (&5658 (comment 552759319)): while swapping the columns in ci_job_artifacts, let's consider renaming the column from job_id to build_id to align with naming conventions

Edited by Yannis Roussos