Draft: POC: Deduplicate intrinsic data into Ci::JobInfo and more processing data into Ci::JobDefinition

What does this MR do and why?

This is a POC for:

  • Moving immutable, intrinsic data into a new model Ci::JobInfo.
  • Moving more job processing data into Ci::JobDefinition.

Details:

  1. Creates ci_job_infos and the association table ci_job_info_instances.

    • The Ci::JobInfo model mirrors Ci::JobDefinition.
    • Costs/benefits:
      • An association table lets us avoid adding an indexed FK column to ci_builds, but it takes up more space than that column would.
      • However, the extra data stored in both new tables will be less overall than the combined storage savings from the items below.
  2. Moves ci_builds.scheduling_type and ci_builds.name --> ci_job_infos as normalized columns.

    • Attempted moving other columns from ci_builds and these were the results:
    • Costs/benefits:
      • Deduplicates scheduling_type value.
      • We can eventually drop ci_builds.scheduling_type column.
  3. Adds search_vector column to ci_job_infos.

    • This combined with copying over ci_builds.name to ci_job_infos replaces the need for ci_build_names.
    • Costs/benefits:
      • Deduplicates name and search_vector values.
      • We can eventually drop ci_build_names.
  4. Moves ci_build_needs (names only) --> ci_job_infos.config.

    • This is long-term data, currently labeled intrinsic_job_needs. We only need to keep the names because that is all the FE requires to generate the pipeline dependencies graph.
    • Costs/benefits:
      • The names are temporarily duplicated in ci_job_infos for as long as the full data still exists in ci_build_needs.
      • However, we can drop the ci_build_needs data after pipeline archival, leaving only the partial, deduplicated data in ci_job_infos.
  5. Moves ci_build_needs (full data) --> ci_job_definitions.config.

    • Instead of reading job.needs directly, we read needs_attributes from job_definition.config and represent it as a collection of needs using Ci::JobNeeds::Collection. This allows us to work with the data in a readable and consistent way.
    • The only potential blocker is explained in !211540 (comment 2941378694).
    • Costs/benefits:
      • Requires more data processing on the DB side as described in !211540 (comment 2941378694).
      • Deduplicates needs data.
      • We can eventually drop ci_build_needs.
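
To illustrate items 4 and 5, here is a minimal plain-Ruby sketch of how needs_attributes read from job_definition.config could be wrapped in a collection object, and how the names-only subset could be derived from it. This is not the MR's implementation: the Need struct, its fields, the default values, and the names method are assumptions made for illustration. Only the Ci::JobNeeds::Collection name and the needs_attributes key come from the description above.

```ruby
# Hypothetical sketch; the real Ci::JobNeeds::Collection in the MR may differ.
module Ci
  module JobNeeds
    # One need, built from a hash in job_definition.config[:needs_attributes].
    # The field list and defaults are assumptions for this example.
    Need = Struct.new(:name, :artifacts, :optional, keyword_init: true)

    class Collection
      include Enumerable

      # needs_attributes: an array of hashes with symbol or string keys, or nil.
      def initialize(needs_attributes)
        @needs = Array(needs_attributes).map do |attrs|
          attrs = attrs.transform_keys(&:to_sym)

          Need.new(
            name: attrs.fetch(:name),
            artifacts: attrs.fetch(:artifacts, true),
            optional: attrs.fetch(:optional, false)
          )
        end
      end

      # Enumerable builds map, select, to_a, and so on from this method.
      def each(&block)
        @needs.each(&block)
      end

      # Names only: the subset assumed to be kept long term in
      # ci_job_infos.config for the pipeline dependencies graph.
      def names
        map(&:name)
      end
    end
  end
end

# Usage with a hand-written config hash standing in for job_definition.config.
config = {
  needs_attributes: [
    { name: 'build', artifacts: true },
    { 'name' => 'lint', 'optional' => true }
  ]
}

needs = Ci::JobNeeds::Collection.new(config[:needs_attributes])
intrinsic_job_needs = needs.names
optional_names = needs.select(&:optional).map(&:name)
```

Including Enumerable keeps call sites close to how an ActiveRecord association is read (each, map, select), which is one way to get the readable and consistent access that item 5 describes without a ci_build_needs query.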

References

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #577211 (closed)

Edited by Leaminn Ma
