Spike: Deduplicate intrinsic immutable data from ci_builds to Ci::JobInfo

Problem

In "Introduce `Ci::JobDefinition` model" (#551830, closed) we introduced a new model to store deduplicated job data. Ci::JobDefinition stores only immutable processing data so that we can:

  1. Easily deduplicate it.
  2. Easily drop the old partitions after pipelines are archived.
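Why immutability makes partition dropping cheap can be shown with a small in-memory model. This is a sketch under stated assumptions, not the Ci::JobDefinition implementation: the `PartitionedDefinitionStore` class, the `partition_id`/checksum keying, and the "drop only when every pipeline in the partition is archived" rule are all hypothetical names and simplifications chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionedDefinitionStore:
    """Hypothetical in-memory model of a partitioned, deduplicated table.

    Rows are grouped by partition_id; inside a partition each checksum maps
    to one immutable payload. This is NOT GitLab's actual schema.
    """

    partitions: dict[int, dict[str, dict]] = field(default_factory=dict)

    def store(self, partition_id: int, checksum: str, payload: dict) -> None:
        # setdefault keeps the first copy: an immutable row is never overwritten.
        self.partitions.setdefault(partition_id, {}).setdefault(checksum, payload)

    def drop_partition(self, partition_id: int, archived_partitions: set[int]) -> bool:
        # Drop only once every pipeline in the partition is archived (assumption).
        # Rows are immutable and partition-local, so the whole partition can go
        # at once without any row-by-row cleanup.
        if partition_id not in archived_partitions:
            return False
        self.partitions.pop(partition_id, None)
        return True
```

For example, storing the same checksum in partitions 100 and 101 and then calling `drop_partition(100, {100})` removes partition 100 while 101 stays intact. Intrinsic long-term data does not fit this model, because it must outlive the partition that created it.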

Ci::JobDefinition does not deduplicate immutable intrinsic (long-term) data such as job names, needs, and sources. This is a good opportunity to reuse the lessons learned from Ci::JobDefinition, applying the same pattern and refactoring steps to a new model that stores immutable intrinsic data.

Proposal

We can introduce a new model, Ci::JobInfo (working name), to store immutable intrinsic data. The data must be immutable so that it can be deduplicated at creation time and never updated afterwards. Mutable data stays in ci_builds.
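Creation-time deduplication usually means content addressing: hash a canonical form of the immutable attributes and reuse an existing row when the hash already exists. The sketch below illustrates that idea in memory. `JobInfoStore`, `find_or_create`, and scoping rows by `(project_id, checksum)` are hypothetical choices for illustration, not a confirmed design for Ci::JobInfo.

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field


def checksum(attributes: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so that equal
    # attribute sets always produce the same digest regardless of key order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class JobInfoStore:
    """Hypothetical stand-in for a Ci::JobInfo table keyed by (project_id, checksum)."""

    rows: dict[tuple[int, str], int] = field(default_factory=dict)
    payloads: dict[int, dict] = field(default_factory=dict)

    def find_or_create(self, project_id: int, attributes: dict) -> int:
        key = (project_id, checksum(attributes))
        info_id = self.rows.get(key)
        if info_id is None:
            info_id = len(self.payloads) + 1
            self.rows[key] = info_id
            # Deep copy on insert: the stored payload must never change later,
            # otherwise every job sharing this row would silently change too.
            self.payloads[info_id] = copy.deepcopy(attributes)
        return info_id
```

Two jobs with `{"name": "rspec", "needs": ["build"]}` and `{"needs": ["build"], "name": "rspec"}` resolve to the same row, while a job named `rubocop` gets a new one. In PostgreSQL the equivalent would rely on a unique index plus `INSERT ... ON CONFLICT DO NOTHING` to stay correct under concurrent job creation; this is why the data must never be updated after insertion.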

Refer to Discussion: Where should all the columns of ci_... (#520538) to see which data currently in ci_builds is a good candidate for deduplication.

We should also look at deduplicating data from other CI tables.

Investigation

  1. Assess whether any columns should move into the new Ci::JobInfo model. Do they need to be indexed? Are they immutable? Could they be part of a jsonb column, like p_ci_job_definitions.config?

  2. Assess the cost of refactoring in terms of complexity and risks.
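The questions in step 1 amount to a simple decision rule per column. The sketch below encodes one possible reading of it; `ColumnAssessment`, `classify`, and the bucket names are hypothetical, and the flags in the example are illustrative guesses rather than results of the investigation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnAssessment:
    name: str
    immutable: bool  # never updated after the job is created
    indexed: bool    # needed for index lookups / WHERE clauses


def classify(columns: list[ColumnAssessment]) -> dict[str, list[str]]:
    """Sort candidate columns into where they could live (hypothetical buckets).

    - mutable columns stay in ci_builds
    - immutable, indexed columns become dedicated columns on the new table
    - immutable, unindexed columns can be packed into a single jsonb payload
    """
    result: dict[str, list[str]] = {
        "ci_builds": [],
        "job_info_columns": [],
        "job_info_jsonb": [],
    }
    for col in columns:
        if not col.immutable:
            result["ci_builds"].append(col.name)
        elif col.indexed:
            result["job_info_columns"].append(col.name)
        else:
            result["job_info_jsonb"].append(col.name)
    return result
```

With illustrative flags, `status` (mutable) stays in ci_builds, `name` (immutable, indexed) becomes a real column, and `needs` (immutable, not indexed) could go into the jsonb payload. Keeping indexed data out of jsonb avoids relying on expression or GIN indexes for hot query paths.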

Expected outcomes

  1. Investigation results outlining why we cannot/should not deduplicate this data, OR

  2. A POC MR with the proposed implementation.

Edited by Leaminn Ma