Ensure ci_builds_metadata contains only processing data

Overview

ci_builds_metadata currently contains many columns. We should ensure that the data in every one of them is safe to delete after a build is archived.

We have work in progress to clear data from some of the columns (discussion: Delete Ci::BuildMetadata after Ci::... (#538031 - closed)), which will help with space savings. However, there is a decent amount of operational efficiency to be gained if we can simply delete the row instead of updating a few columns to null.

Columns

| Column | Type | Nullable? | Mutable? | Removable on archive? #538031 (closed) | Where to move? |
| --- | --- | --- | --- | --- | --- |
| id | integer | no | N/A | 🗑️ Delete whole record | N/A |
| build_id | integer | no | N/A (we may deduplicate immutable data across builds) | 🗑️ Delete whole record | N/A |
| project_id | integer | no | N/A | 🗑️ Delete whole record | N/A |
| partition_id | integer | no | N/A | 🗑️ Delete whole record | N/A |
| timeout | integer | yes | yes (when job picked by runner) | #538183 (closed) intrinsic | ci_builds |
| timeout_source | integer | yes | yes (when job picked by runner) | #538183 (closed) intrinsic | ci_builds |
| interruptible | boolean | no (default true) | no | ⚠️ while data is not mutable and can be removed after pipeline is archived, the data needs to be indexed. | ci_job_prototypes and indexed there as dedicated column. |
| config_options | jsonb | yes | ? | ⚠️ see below | ci_job_prototypes |
| config_variables | jsonb | yes | no | 🗑️ | ci_job_prototypes |
| has_exposed_artifacts | boolean | yes. We care if it's true | no | Consider removing the column in favor of filtering by artifacts:expose_as. | See also #545486 (closed) where we need to retain artifacts:expose_as, maybe into p_ci_builds. This should be sufficient. |
| environment_auto_stop_in | character varying(255) | | | ⚠️ group::environments #545659 (closed) - Removable only after data migrated to environments | !194402 (closed) being moved to environments |
| expanded_environment_name | character varying(255) | yes | no | group::environments #545659 (closed) | ci_builds (but attempt refactor to make it processing data) |
| secrets | jsonb | yes | no | 🗑️ #538252 (closed) can be merged with processing data. | ci_job_prototypes |
| id_tokens | jsonb | yes | no | 🗑️ #538251 (closed) can be merged with processing data. | ci_job_prototypes |
| debug_trace_enabled | boolean | no (default false) | yes | intrinsic data | ci_builds |
| exit_code | smallint | yes | yes | intrinsic | ci_builds |
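
The goal of deleting the whole row only works if no intrinsic column still holds data that has not been moved elsewhere. The following is a minimal sketch of that check, using plain hashes instead of the real models. The constant `INTRINSIC_COLUMN_DESTINATIONS` and the helper `columns_blocking_deletion` are hypothetical names introduced for illustration; the column-to-destination mapping is one reading of the table above, not a confirmed design.

```ruby
require "set"

# Illustrative reading of the table: columns whose data must survive archiving,
# mapped to the destination the table proposes. Not an authoritative list.
INTRINSIC_COLUMN_DESTINATIONS = {
  timeout: :ci_builds,
  timeout_source: :ci_builds,
  debug_trace_enabled: :ci_builds,
  exit_code: :ci_builds,
  expanded_environment_name: :ci_builds,
  environment_auto_stop_in: :environments
}.freeze

# Returns the intrinsic columns that still hold data in `row` but are not in
# `migrated_columns`. The row is safe to delete only when the result is empty.
def columns_blocking_deletion(row, migrated_columns)
  INTRINSIC_COLUMN_DESTINATIONS.keys.select do |column|
    !row[column].nil? && !migrated_columns.include?(column)
  end
end

# A hash standing in for one ci_builds_metadata row.
row = { build_id: 1, timeout: 3600, timeout_source: 2, exit_code: 0, secrets: { "x" => 1 } }

blocking = columns_blocking_deletion(row, Set[:timeout, :timeout_source])
puts blocking.inspect
```

In this example only `exit_code` blocks deletion, because `timeout` and `timeout_source` are treated as already migrated and the other intrinsic columns are empty.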

Top-level keys found in config_options

As of 2025-05-23:

[ gprd ] production> Ci::BuildMetadata.select(:config_options).last(300_000).flat_map { |md| md.config_options.keys }.uniq.sort
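
To make the shape of that query easier to follow outside a Rails console, here is a small self-contained sketch of the same key-collection step on plain hashes. The `sample_rows` data and the `top_level_keys` helper are made up for illustration. Because the column is listed as nullable, the sketch treats a missing value as an empty hash; whether the real model already guards against this is an assumption not verified here.

```ruby
# Hashes standing in for Ci::BuildMetadata records (hypothetical sample data).
sample_rows = [
  { config_options: { script: ["make"], image: { name: "ruby" } } },
  { config_options: nil }, # nullable column
  { config_options: { script: ["rake"], cache: [] } }
]

# Collects the distinct top-level keys across all rows, sorted.
def top_level_keys(rows)
  rows.flat_map { |row| (row[:config_options] || {}).keys }.uniq.sort
end

p top_level_keys(sample_rows) # prints [:cache, :image, :script]
```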

NOTE: Ideally, intrinsic data should be moved to the table that best represents it. However, due to urgency, we could introduce a column in p_ci_builds that is nullable and not indexed. For example, if artifacts:expose_as is intrinsic (non-processing) data, we could introduce p_ci_builds.artifacts_expose_as as jsonb and move the data there when the pipeline is archived or new jobs are created.
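
The fallback in the note can be sketched roughly as follows, again with plain hashes in place of the real records. The `archive_build` helper is hypothetical, and `artifacts_expose_as` is the column proposed in the note, which may never exist in this form.

```ruby
# Hypothetical helper: copy artifacts:expose_as out of the metadata row so the
# row can be deleted when the pipeline is archived. Plain hashes stand in for
# p_ci_builds and ci_builds_metadata records.
def archive_build(build, metadata)
  # Hash#dig returns nil when config_options or artifacts is missing,
  # which fits a nullable, non-indexed column.
  expose_as = metadata.dig(:config_options, :artifacts, :expose_as)
  build.merge(artifacts_expose_as: expose_as)
end

build = { id: 1, name: "test" }
metadata = {
  build_id: 1,
  config_options: { script: ["rake"], artifacts: { expose_as: "report", paths: ["out/"] } }
}

archived = archive_build(build, metadata)
p archived
```

Only the intrinsic value is carried over; the rest of `config_options` stays behind with the metadata row and disappears when that row is deleted.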

| Top-level key | Nullable? | Mutable? | Removable on archive? | Where to move? |
| --- | --- | --- | --- | --- |
| after_script: | yes | no | 🗑️ | ci_job_prototypes |
| allow_failure_criteria: | yes | no | 🗑️ | ci_job_prototypes |
| artifacts: | yes | no | 🔒 | ⚠️ @fabiopitino: artifacts:expose_as is used when has_exposed_artifacts: true, so this should be considered intrinsic data. Consider creating a dedicated table given the low usage of this feature, which may help us deprecate it if needed. Alternatively, if stored in ci_job_artifacts |
| before_script: | yes | no | 🗑️ | ci_job_prototypes |
| bridge_needs: | yes | no | 🗑️ | ci_job_prototypes |
| cache: | yes | no | 🗑️ | ci_job_prototypes |
| cross_dependencies: | yes | no | 🗑️ | ci_job_prototypes |
| dast_configuration: | yes | no | 🗑️ | ci_job_prototypes |
| dependencies: | yes | no | 🗑️ | ci_job_prototypes |
| downstream_errors: | yes | yes | 🔒 | ci_builds - set when bridge job runs and downstream pipeline fails without being persisted |
| enqueue_immediately: | yes | yes | 🗑️ | Moving to Redis |
| environment: | yes | | 🔒 group::environments | ⚠️ #545659 (comment 2555709370) |
| execution_policy_job: | yes | no | 🗑️ group::security policies | ci_job_prototypes or moved to dedicated table |
| execution_policy_variables_override: | yes | no | 🗑️ group::security policies | ci_job_prototypes or moved to dedicated table |
| execution_policy_name | yes | no | 🗑️ group::security policies | ci_job_prototypes or moved to dedicated table |
| hooks: | yes | | 🗑️ | |
| identity: | yes | | 🗑️ group::runner | ci_job_prototypes |
| image: | yes | no | 🗑️ | ci_job_prototypes |
| instance: | yes | no | 🗑️ | ci_job_prototypes |
| job_timeout: | yes | no | 🗑️ | ci_job_prototypes |
| manual_confirmation: | yes | no | 🗑️ | ci_job_prototypes |
| pages: | yes | no | 🗑️ group::knowledge | ci_job_prototypes |
| parallel: | yes | no | 🗑️ | ci_job_prototypes |
| publish: | yes | no | 🗑️ group::knowledge | ci_job_prototypes |
| release: | yes | | 🗑️ group::environments | Likely isn't used after the release is created. See #545486 (comment 2547683632). |
| resource_group_key: | yes | no | 🗑️ | ci_job_prototypes |
| retry: | yes | no | 🗑️ | ci_job_prototypes |
| scoped_user_id: | yes | no | 🗑️ group::authorization | ci_job_processing - It's processing data. There might be some UX to review because a job that requires scoped_user_id could fail with an insufficient permissions error. However, an archived job should not be runnable anyway, regardless of permissions. |
| script: | yes | no | 🗑️ | ci_job_prototypes |
| services: | yes | no | 🗑️ | ci_job_prototypes |
| start_in: | yes | no | 🗑️ | ci_job_prototypes |
| trigger: | yes | no | 🗑️ | ci_job_prototypes |
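
As a rough illustration of how the table could drive an archive step, the sketch below splits a config_options hash into keys to retain and keys that can go away with the row. It assumes that 🔒 marks keys that must be kept and 🗑️ marks keys that are removable, which is an interpretation of the table rather than something it states. `RETAINED_KEYS` and `split_config_options` are hypothetical names.

```ruby
# Keys marked 🔒 in the table (assumed here to mean "must be retained").
RETAINED_KEYS = %i[artifacts downstream_errors environment].freeze

# Splits config_options into the part that must be kept after archiving and
# the part that can be deleted together with the ci_builds_metadata row.
def split_config_options(config_options)
  retained, removable = (config_options || {}).partition do |key, _value|
    RETAINED_KEYS.include?(key)
  end
  { retained: retained.to_h, removable: removable.to_h }
end

config_options = {
  script: ["bundle exec rspec"],
  artifacts: { expose_as: "report" },
  downstream_errors: ["downstream pipeline failed"],
  cache: []
}

result = split_config_options(config_options)
p result[:retained].keys  # prints [:artifacts, :downstream_errors]
p result[:removable].keys # prints [:script, :cache]
```

A key that is not listed in `RETAINED_KEYS` is treated as removable, so any new top-level key would need to be reviewed before relying on a split like this.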

Edited by Fabio Pitino