Control what data is persisted in p_ci_builds_metadata.config_options
Context
As part of the work in Ensure ci_builds_metadata contains only process... (#271615 - closed), we need to ensure that we know exactly what data is persisted in config_options.
Problem
The column p_ci_builds_metadata.config_options is a JSONB column, so any developer can easily persist data in it. Over the years, new data has been added there without a clear strategy because we discouraged adding more columns to p_ci_builds and p_ci_builds_metadata.
Today this column holds a mix of processing and intrinsic data, which prevents us from deleting p_ci_builds_metadata records when a pipeline is archived (jobs no longer executable).
Proposal
- Control what gets stored in config_options with JSON schema validation. We also need to control what gets nested inside top-level keys, so the schema needs to be strict and deep.
- Changes to the JSON schema must be gated via CODEOWNERS, requiring approval from a member of the Verify team.
- Strategy for what defines data as processing data must be clearly described in our development docs and updates socialized with Verify teams.
- (optional) Add a Danger rule to notify MR authors of the importance of changing the schema.
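To illustrate what "strict and deep" could mean in practice, here is a minimal sketch in Python using only the standard library. The key names in the schema (script, image, name) and the validate helper are hypothetical and chosen for illustration; they are not the actual contents of config_options nor the validator GitLab would use. The idea is that every object level sets additionalProperties to false, so unknown keys are rejected both at the top level and inside nested values.

```python
from typing import Any

# Hypothetical schema: key names are illustrative only, not the real
# contents of config_options. Every object level forbids unknown keys.
STRICT_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "script": {"type": "array", "items": {"type": "string"}},
        "image": {
            "type": "object",
            "additionalProperties": False,
            "properties": {"name": {"type": "string"}},
        },
    },
}

_TYPES = {"object": dict, "array": list, "string": str, "integer": int, "boolean": bool}


def validate(value: Any, schema: dict, path: str = "$") -> list[str]:
    """Return a list of error strings; an empty list means the value is valid.

    Supports only a small subset of JSON Schema: type, properties,
    additionalProperties and items.
    """
    expected = schema.get("type")
    if expected is not None:
        wrong_type = not isinstance(value, _TYPES[expected])
        # bool is a subclass of int in Python, so reject it for "integer".
        if wrong_type or (expected == "integer" and isinstance(value, bool)):
            return [f"{path}: expected {expected}"]

    errors: list[str] = []
    if isinstance(value, dict):
        props = schema.get("properties", {})
        for key, item in value.items():
            if key in props:
                # Recurse so nested objects are checked as strictly as the top level.
                errors.extend(validate(item, props[key], f"{path}.{key}"))
            elif schema.get("additionalProperties", True) is False:
                errors.append(f"{path}.{key}: unknown key")
    elif isinstance(value, list) and "items" in schema:
        for index, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    return errors
```

With this sketch, a payload such as {"image": {"name": "ruby", "extra": True}} is rejected because of the nested extra key, which a shallow check of top-level keys alone would miss.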
We need to ensure this only impacts development and not production:
- On validation failure, we log the error in Kibana/Sentry and raise an exception only in the development environment, so we fail hard in development but silently in production while gathering data.
- Integrate any missing keys into the schema until we have a comprehensive and deep validation of the column.
- Introduce the change with a feature flag.
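The rollout behaviour described above could be sketched roughly as follows, again in Python with only the standard library. The function name check_config_options, the exception class, the environment string, and the flag_enabled parameter are all assumptions made for illustration; the real implementation would use the application's own feature flag and error tracking helpers.

```python
import logging
from typing import Callable

logger = logging.getLogger("config_options_validation")


class ConfigOptionsValidationError(Exception):
    """Hypothetical error raised only in the development environment."""


def check_config_options(
    options: dict,
    validator: Callable[[dict], list[str]],
    *,
    environment: str,
    flag_enabled: bool,
) -> list[str]:
    """Validate options and return the errors found (empty if none).

    - Feature flag disabled: validation is skipped entirely.
    - Development: log the errors, then raise so the failure is loud.
    - Any other environment: log the errors and continue silently.
    """
    if not flag_enabled:
        return []

    errors = validator(options)
    if not errors:
        return []

    # Stand-in for reporting to Kibana/Sentry.
    logger.warning("invalid config_options: %s", errors)

    if environment == "development":
        raise ConfigOptionsValidationError("; ".join(errors))
    return errors
```

Returning the errors instead of raising in production keeps job creation unaffected while the logs reveal which keys are still missing from the schema.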