Control what data is persisted in p_ci_builds_metadata.config_options

Context

As part of the work in Ensure ci_builds_metadata contains only process... (#271615 - closed), we need to know exactly what data is persisted in config_options.

Problem

The column p_ci_builds_metadata.config_options is a JSONB column, so any developer can easily persist data in it. Over the years, new data has been added there without a clear strategy, because we discouraged adding more columns to p_ci_builds and p_ci_builds_metadata.

Today this column holds a mix of processing and intrinsic data, which prevents us from deleting p_ci_builds_metadata records when a pipeline is archived (its jobs are no longer executable).

Proposal

  • Control what gets stored in config_options with JSON schema validation. We also need to control what gets nested inside top-level keys, so the schema needs to be strict and deep.
  • Changes to the JSON schema must be gated via CODEOWNERS, with approval required from a member of the Verify team.
  • The strategy for what qualifies as processing data must be clearly described in our development docs, and updates must be socialized with the Verify teams.
  • (optional) Add a Danger rule to notify MR authors about the importance of changes to the schema.
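
To illustrate what "strict and deep" could mean in practice, here is a minimal sketch of a schema that rejects unknown keys at every nesting level. The keys `script` and `retry`, the schema contents, and the `schema_errors` helper are assumptions made for illustration only; they are not the real contents of config_options, and the hand-rolled checker is a simplified stand-in for a full JSON Schema validator.

```python
# Illustrative schema only: the keys below are assumptions, not the real
# contents of config_options. "additionalProperties": False is repeated at
# every object level, which is what makes the schema strict and deep.
CONFIG_OPTIONS_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "script": {"type": "array", "items": {"type": "string"}},
        "retry": {
            "type": "object",
            "additionalProperties": False,
            "properties": {"max": {"type": "integer"}},
        },
    },
}

_TYPES = {"object": dict, "array": list, "string": str, "integer": int}


def schema_errors(value, schema, path="config_options"):
    """Return a list of human-readable violations (empty when valid)."""
    expected = _TYPES[schema["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for integers.
    if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
        return [f"{path}: expected {schema['type']}"]
    errors = []
    if expected is dict:
        properties = schema.get("properties", {})
        for key, item in value.items():
            if key in properties:
                # Known key: recurse so nested content is validated too.
                errors += schema_errors(item, properties[key], f"{path}.{key}")
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}.{key}: unknown key")
    elif expected is list and "items" in schema:
        for index, item in enumerate(value):
            errors += schema_errors(item, schema["items"], f"{path}[{index}]")
    return errors
```

With this sketch, a key nested under a known top-level key (for example a hypothetical `when` inside `retry`) is reported just like an unknown top-level key, which is the property a shallow schema would miss.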

We need to ensure this only impacts development and not production:

  • On validation failure, we log the error to Kibana/Sentry and raise an exception only in the development environment, so validation fails hard in development but silently in production while we gather data.
  • Add any missing keys to the schema until validation of the column is comprehensive and deep.
  • Introduce the change with a feature flag.
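
The rollout behaviour described above could be sketched roughly as follows. All names here (`report_violations`, `InvalidConfigOptions`, the `environment` and `flag_enabled` parameters) are hypothetical, and the standard `logging` module stands in for whatever reports to Kibana/Sentry; this is an illustration of the intended control flow, not the actual implementation.

```python
import logging

# Stand-in for the real logging/error-tracking integration (assumption).
logger = logging.getLogger("config_options_validation")


class InvalidConfigOptions(Exception):
    """Raised in development when config_options violates the schema."""


def report_violations(errors, environment, flag_enabled):
    """Handle schema violations; return True when something was logged.

    errors:       list of violation messages from schema validation
    environment:  e.g. "development" or "production" (hypothetical values)
    flag_enabled: state of the feature flag gating the validation
    """
    # With the feature flag off, or with nothing to report, do nothing.
    if not flag_enabled or not errors:
        return False
    # Fail hard in development so the author fixes the schema or the data.
    if environment == "development":
        raise InvalidConfigOptions("; ".join(errors))
    # Elsewhere, record the violation and let the write proceed.
    logger.warning("config_options schema violations: %s", errors)
    return True
```

Keeping the production path to logging only means persisted data is never blocked while the schema is still incomplete, and the logged violations show which keys still need to be added.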