Spike: Deduplicate ci_build_names into one of the deduplicated tables
Problem
ci_build_names is used today to filter jobs by name. Because this data is duplicated and high-growth (it is associated with ci_builds), we should look for ways to deduplicate it while keeping queries efficient.
Proposal
If the build name is something we can delete after pipeline archival, Ci::JobDefinition is a good place for it. Today we have a retention policy for ci_build_names, but this requirement may change in the future.
Otherwise, we need to use Ci::JobInfo (a table similar to job definition) for intrinsic data. See: Create `Ci::JobInfo` model for intrinsic dedupl... (#567709 - closed)
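To make the trade-off concrete, here is a minimal sketch of the deduplication idea, not the actual GitLab schema. It assumes a hypothetical `job_infos` lookup table keyed by a checksum of the job name, with a hypothetical `builds` table pointing at it. All table, column, and function names are illustrative, and SQLite stands in for PostgreSQL.

```python
import hashlib
import sqlite3


def open_db():
    """Create an in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE job_infos (
            id INTEGER PRIMARY KEY,
            project_id INTEGER NOT NULL,
            checksum TEXT NOT NULL,
            name TEXT NOT NULL,
            UNIQUE (project_id, checksum)
        );
        -- Index that keeps filtering by name cheap (an assumption).
        CREATE INDEX job_infos_on_project_and_name
            ON job_infos (project_id, name);
        CREATE TABLE builds (
            id INTEGER PRIMARY KEY,
            project_id INTEGER NOT NULL,
            job_info_id INTEGER NOT NULL REFERENCES job_infos (id)
        );
    """)
    return conn


def find_or_create_job_info(conn, project_id, name):
    """Return the id of the single job_infos row for this name."""
    checksum = hashlib.sha256(name.encode("utf-8")).hexdigest()
    # The unique constraint makes a repeated insert a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO job_infos (project_id, checksum, name) "
        "VALUES (?, ?, ?)",
        (project_id, checksum, name),
    )
    row = conn.execute(
        "SELECT id FROM job_infos WHERE project_id = ? AND checksum = ?",
        (project_id, checksum),
    ).fetchone()
    return row[0]


def create_build(conn, project_id, name):
    """Insert a build that references the shared job_infos row."""
    job_info_id = find_or_create_job_info(conn, project_id, name)
    cur = conn.execute(
        "INSERT INTO builds (project_id, job_info_id) VALUES (?, ?)",
        (project_id, job_info_id),
    )
    return cur.lastrowid


def build_ids_by_name(conn, project_id, name):
    """Filter builds by job name through the deduplicated table."""
    rows = conn.execute(
        """
        SELECT builds.id FROM builds
        JOIN job_infos ON job_infos.id = builds.job_info_id
        WHERE job_infos.project_id = ? AND job_infos.name = ?
        ORDER BY builds.id
        """,
        (project_id, name),
    ).fetchall()
    return [r[0] for r in rows]
```

In this sketch the name is stored once per project however many builds share it, and filtering by name becomes a join through the lookup table. Whether that join stays efficient at ci_builds scale is one of the questions the spike would need to answer.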
Investigation
- Assess if there are any columns we should move into a new model, Ci::JobInfo. Do they have to be indexed? Are they immutable? Can they be part of a jsonb column like p_ci_job_definitions.config?
- Assess the cost of refactoring in terms of complexity and risks.
Expected outcomes
- Investigation results outlining why we cannot/should not deduplicate this data, OR
- A POC MR with the proposed implementation. We already have a POC for Ci::JobInfo (Draft: POC Deduplicate intrinsic immutable data... (!211540)), so we could build on it for this spike issue.