Spike: Deduplicate ci_build_needs


Problem

The ci_build_needs table stores a record for every needs entry of every job. There is huge potential in deduplicating this data.

Proposal

Evaluate whether needs could be stored in Ci::JobDefinition, but take into consideration that job definitions could be deleted for archived pipelines/partitions.

Long term, we may need a similar model for intrinsic data that is immutable and deduplicatable. The need for this model has already been identified. See Spike: Deduplicate `ci_build_sources` (#565806) for example.
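
To make the deduplication idea concrete, here is a minimal in-memory sketch, not the proposed implementation. It assumes needs can be normalized and keyed by a checksum so that jobs with identical needs share one stored record. `NeedsStore`, `intern`, and the `name`/`artifacts` keys are hypothetical names used for illustration only, and treating the order of needs entries as insignificant is also an assumption that the spike would have to confirm.

```ruby
require 'digest'
require 'json'

# Hypothetical in-memory stand-in for a deduplicated needs table.
# Not an existing GitLab class.
class NeedsStore
  attr_reader :definitions

  def initialize
    @definitions = {} # checksum => normalized needs array
  end

  # Normalizes the needs list, stores it once per unique checksum,
  # and returns the checksum a job row would reference.
  def intern(needs)
    normalized = needs
      .map { |need| need.transform_keys(&:to_s).sort.to_h } # stable key order
      .sort_by { |need| need['name'].to_s }                 # assumption: entry order is irrelevant
    checksum = Digest::SHA256.hexdigest(JSON.generate(normalized))
    @definitions[checksum] ||= normalized
    checksum
  end
end

store = NeedsStore.new
a = store.intern([{ name: 'build', artifacts: true }, { name: 'lint', artifacts: false }])
b = store.intern([{ 'artifacts' => false, 'name' => 'lint' }, { artifacts: true, name: 'build' }])
c = store.intern([{ name: 'test', artifacts: true }])
```

In this sketch `a` and `b` resolve to the same checksum, so two jobs with equivalent needs reference a single stored record, while `c` creates a second one. Whether such records can live in Ci::JobDefinition or need a separate model depends on the deletion concern for archived pipelines/partitions noted above.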

Investigation

  1. Assess whether there are any columns we should move into a new model, Ci::JobInfo. Do they have to be indexed? Are they immutable? Can they be part of a jsonb column like p_ci_job_definitions.config?

  2. Assess the cost of refactoring in terms of complexity and risks.

Expected outcomes

  1. Investigation results outlining why we cannot/should not deduplicate this data, OR

  2. A POC MR with the proposed implementation. We already have a POC for Ci::JobInfo (Draft: POC Deduplicate intrinsic immutable data... (!211540)), so we could build on it for this spike issue.

Edited by Leaminn Ma