Control what data is persisted in `p_ci_builds_metadata.config_options`

Context

As part of the work in Ensure ci_builds_metadata contains only process... (#271615 - closed), we need to know exactly what data is persisted in `config_options`.

Problem

The column `p_ci_builds_metadata.config_options` is a JSONB column, so any developer can easily persist data in it. Over the years, new data has been added to it without a clear strategy, because we discouraged adding more columns to `p_ci_builds` and `p_ci_builds_metadata`.

Today this column mixes processing and intrinsic data, which prevents us from deleting `p_ci_builds_metadata` records when a pipeline is archived (its jobs are no longer executable).

Proposal

Control what gets stored in `config_options` with JSON schema validation. We also need to control what is nested inside top-level keys, so the schema needs to be strict and deep.
Changes to the JSON schema must be gated via CODEOWNERS, requiring approval from a member of the Verify team.
The strategy for what counts as processing data must be clearly described in our development docs, and updates to it socialized with the Verify teams.
(optional) Add a Danger rule that reminds MR authors of the importance of changes to the schema.
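To make "strict and deep" concrete, here is a minimal Python sketch. It hand-rolls a small subset of JSON Schema (`type`, `properties`, `items`, `additionalProperties`) instead of using a real validator library, and the schema keys (`script`, `retry`, `max`, `when`) are illustrative assumptions, not the actual contents of `config_options`. What it shows is that `"additionalProperties": false` has to be repeated at every object level; otherwise unknown keys nested under an allowed top-level key would slip through.

```python
from typing import Any

# Hypothetical, illustrative schema: these keys are examples only and are NOT
# the real contents of config_options. Every object level sets
# "additionalProperties": False so unknown keys are rejected at any depth.
CONFIG_OPTIONS_SCHEMA: dict[str, Any] = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "script": {"type": "array", "items": {"type": "string"}},
        "retry": {
            "type": "object",
            "additionalProperties": False,
            "properties": {
                "max": {"type": "integer"},
                "when": {"type": "array", "items": {"type": "string"}},
            },
        },
    },
}

# Only the JSON Schema types this sketch supports.
_TYPES: dict[str, type] = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "boolean": bool,
}


def validate(value: Any, schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return human-readable errors; an empty list means the value is valid."""
    expected = schema.get("type")
    if expected is not None:
        py_type = _TYPES[expected]
        # bool is a subclass of int in Python, so reject it for "integer".
        if not isinstance(value, py_type) or (
            expected == "integer" and isinstance(value, bool)
        ):
            return [f"{path}: expected {expected}, got {type(value).__name__}"]

    errors: list[str] = []
    if isinstance(value, dict):
        properties = schema.get("properties", {})
        for key, child in value.items():
            if key in properties:
                # Recurse so nested objects are checked as strictly as the top level.
                errors.extend(validate(child, properties[key], f"{path}.{key}"))
            elif schema.get("additionalProperties", True) is False:
                errors.append(f"{path}.{key}: key not allowed by schema")
    elif isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors


print(validate({"script": ["make test"], "retry": {"max": 2}}, CONFIG_OPTIONS_SCHEMA))
# prints []
print(validate({"retry": {"max": 2, "extra": True}}, CONFIG_OPTIONS_SCHEMA))
# prints ['$.retry.extra: key not allowed by schema']
```

Following JSON Schema semantics, `additionalProperties` defaults to allowing extra keys, so a schema that only lists top-level properties would still accept arbitrary nested data.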

We need to ensure this only impacts development and not production:

On validation failure, log the error to Kibana/Sentry and raise an exception in the development environment, so we fail hard in development but silently in production while gathering data.
Add any missing keys to the schema until we have comprehensive and deep validation of the column.
Introduce the change behind a feature flag.
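The rollout rules above (log everywhere, raise only in development, gate behind a feature flag) can be sketched as follows. `check_config_options`, `ConfigOptionsValidationError`, and the `feature_flag_enabled` and `environment` parameters are hypothetical stand-ins for the real validator, feature flag check, and environment check, and the Python `logging` call stands in for reporting to Kibana/Sentry.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("config_options_validation")


class ConfigOptionsValidationError(Exception):
    """Hypothetical error raised only in development on a schema violation."""


def check_config_options(
    options: dict[str, Any],
    validate: Callable[[dict[str, Any]], list[str]],
    *,
    feature_flag_enabled: bool,
    environment: str,
) -> list[str]:
    """Validate options, failing hard in development and softly elsewhere."""
    if not feature_flag_enabled:
        # Flag off: skip validation entirely and keep today's behavior.
        return []

    errors = validate(options)
    if errors:
        # Stand-in for reporting to Kibana/Sentry: record every failure.
        logger.warning("config_options schema violation: %s", "; ".join(errors))
        if environment == "development":
            # Fail hard so missing schema keys are noticed before shipping.
            raise ConfigOptionsValidationError("; ".join(errors))
    # Outside development the errors are only logged and returned, and the
    # caller is expected to persist the options as before.
    return errors


# Usage with a toy validator that allows only a top-level "script" key.
def only_script(options: dict[str, Any]) -> list[str]:
    return [f"$.{key}: key not allowed by schema" for key in options if key != "script"]


print(check_config_options({"script": ["make"], "foo": 1}, only_script,
                           feature_flag_enabled=True, environment="production"))
# prints ['$.foo: key not allowed by schema']
```

Passing the validator in as a callable keeps the rollout policy separate from the schema, so missing keys can be added to the schema without touching the failure-handling logic.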

Edited Jun 24, 2025 by 🤖 GitLab Bot 🤖
