Draft: POC Deduplicate intrinsic data to Ci::JobInfo & dedup more processing data into Ci::JobDefinition
What does this MR do and why?
This is a POC for:
- Moving immutable, intrinsic data into a new model, `Ci::JobInfo`.
- Moving more job processing data into `Ci::JobDefinition`.
Details:
- Creates `ci_job_infos` and the association table `ci_job_info_instances`.
  - The `Ci::JobInfo` model mirrors `Ci::JobDefinition`.
  - Costs/benefits:
    - Having an association table allows us to avoid adding an indexed FK column to `ci_builds`, but it takes up more space than such a column would.
    - However, the extra data we're storing in both tables will be less overall than the combined storage savings from the following items.
- Moves `ci_builds.scheduling_type` and `ci_builds.name` --> `ci_job_infos` as normalized columns (`name` is copied rather than moved; see below).
  - Attempted moving other columns from `ci_builds`, with these results:
    - `allow_failure` --> Cannot move. This field is mutated when the job transitions to failed.
    - `when` --> Cannot move. This field is mutated in `Ci::ProcessBuildService`.
    - `stage_idx` --> Indexed column; impractical to move.
    - `name` --> Indexed column; impractical to move. Decided to keep it in `ci_builds` for now, but still copy its data over to `ci_job_infos` so that we can drop `ci_build_names`. See !211540 (comment 2878147699).
  - Costs/benefits:
    - Deduplicates the `scheduling_type` value.
    - We can eventually drop the `ci_builds.scheduling_type` column.
- Adds a `search_vector` column to `ci_job_infos`.
  - Combined with copying `ci_builds.name` over to `ci_job_infos`, this replaces the need for `ci_build_names`.
  - Costs/benefits:
    - Deduplicates the `name` and `search_vector` values.
    - We can eventually drop `ci_build_names`.
- Moves `ci_build_needs` (names only) --> `ci_job_infos.config`.
  - This is long-term data, currently labeled `intrinsic_job_needs`. We only need to keep the names, because that's all the frontend (FE) requires to generate the Pipeline dependencies graph.
  - Costs/benefits:
    - The data is temporarily, partially duplicated into `ci_job_infos` while it still exists in `ci_build_needs`.
    - However, we can drop the `ci_build_needs` data after pipeline archival, leaving only the partial, deduplicated data in `ci_job_infos`.
- Moves `ci_build_needs` (full data) --> `ci_job_definitions.config`.
  - Instead of reading `job.needs` directly, we read `needs_attributes` from `job_definition.config` and represent it as a collection of needs using `Ci::JobNeeds::Collection`. This allows us to work with the data in a readable and consistent way.
  - The only potential blocker is explained in !211540 (comment 2941378694).
  - Costs/benefits:
    - Requires more data processing on the DB side, as described in !211540 (comment 2941378694).
    - Deduplicates needs data.
    - We can eventually drop `ci_build_needs`.
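To illustrate the deduplication that `ci_job_infos` and `ci_job_info_instances` enable, here is a minimal plain-Ruby sketch that models both tables as in-memory collections. It assumes, purely for illustration, that identical attributes are matched by a checksum of their serialized form; `JobInfoStore` and its methods are hypothetical and are not the implementation in this MR.

```ruby
require "digest"
require "json"

# Hypothetical in-memory stand-ins for ci_job_infos (@infos) and the
# ci_job_info_instances association table (@instances).
class JobInfoStore
  attr_reader :infos, :instances

  def initialize
    @infos = {}      # checksum => { id:, attributes: }
    @instances = []  # rows of { job_id:, job_info_id: }
  end

  # Returns the id of an existing info row with identical attributes, or
  # creates one. Top-level keys are sorted so key order does not matter.
  def find_or_create(attributes)
    canonical = JSON.generate(attributes.sort.to_h)
    checksum = Digest::SHA256.hexdigest(canonical)
    @infos[checksum] ||= { id: @infos.size + 1, attributes: attributes }
    @infos[checksum][:id]
  end

  # Links a job to a (possibly shared) info row instead of storing the
  # attributes on the job itself.
  def link(job_id, attributes)
    info_id = find_or_create(attributes)
    @instances << { job_id: job_id, job_info_id: info_id }
    info_id
  end
end

store = JobInfoStore.new
store.link(1, { name: "rspec", scheduling_type: "stage" })
store.link(2, { scheduling_type: "stage", name: "rspec" })
store.link(3, { name: "lint", scheduling_type: "dag" })
puts "#{store.infos.size} info rows for #{store.instances.size} jobs"
```

Jobs 1 and 2 share one info row even though their attributes were given in a different key order, so two info rows back three jobs. Each job still costs one association row, which is the space trade-off noted under Costs/benefits.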
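The column decisions for `scheduling_type` and `name` amount to splitting each `ci_builds` row into attributes for a shared `ci_job_infos` row and attributes that stay on the build. The sketch below is hypothetical: the constants and `split_build_attributes` are illustrative names, and real rows would be ActiveRecord models rather than hashes.

```ruby
# Hypothetical column groups, based on the results listed in the MR.
# scheduling_type leaves ci_builds entirely; name is copied to ci_job_infos
# but stays on ci_builds because it is indexed.
MOVED_COLUMNS  = %i[scheduling_type].freeze
COPIED_COLUMNS = %i[name].freeze
# allow_failure, when, and stage_idx belong to neither group, so they
# remain only on ci_builds.

# Splits one ci_builds row (as a Hash) into [info_attrs, build_attrs].
def split_build_attributes(build_row)
  info_attrs  = build_row.slice(*(MOVED_COLUMNS + COPIED_COLUMNS))
  build_attrs = build_row.reject { |column, _| MOVED_COLUMNS.include?(column) }
  [info_attrs, build_attrs]
end

row = { name: "rspec", scheduling_type: "dag", allow_failure: false, when: "on_success", stage_idx: 2 }
info, build = split_build_attributes(row)
p info
p build
```

Because `info` holds only immutable values, two jobs with the same name and scheduling type produce equal hashes, which is what makes them deduplicable.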
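To show why a single `search_vector` per `ci_job_infos` row can stand in for `ci_build_names`, the sketch below searches jobs through their shared info rows. It assumes the real column is a PostgreSQL full-text vector; the `search_tokens` helper is a hypothetical stand-in that only lowercases and splits on non-alphanumeric characters, with none of PostgreSQL's stemming or stop-word handling.

```ruby
# Hypothetical stand-in for a ci_job_infos row carrying a name and a
# precomputed search vector (here just a sorted list of tokens).
InfoRow = Struct.new(:id, :name, :search_vector)

# Rough tokenizer: lowercase, then keep runs of letters and digits.
def search_tokens(text)
  text.downcase.scan(/[a-z0-9]+/).uniq.sort
end

# The vector is computed once per distinct name, not once per job.
def build_info_row(id, name)
  InfoRow.new(id, name, search_tokens(name))
end

# Returns ids of jobs whose shared info row contains every query token.
# `instances` mimics ci_job_info_instances rows: { job_id:, job_info_id: }.
def search_jobs(info_rows, instances, query)
  wanted = search_tokens(query)
  matching = info_rows.select { |row| (wanted - row.search_vector).empty? }.map(&:id)
  instances.select { |link| matching.include?(link[:job_info_id]) }.map { |link| link[:job_id] }
end

rows = [build_info_row(1, "rspec unit 1/2"), build_info_row(2, "rubocop")]
links = [{ job_id: 10, job_info_id: 1 }, { job_id: 11, job_info_id: 1 }, { job_id: 12, job_info_id: 2 }]
p search_jobs(rows, links, "RSpec unit")
```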
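For the names-only copy of `ci_build_needs` into `ci_job_infos.config`, the following hypothetical sketch shows that names alone are enough to derive the edges of a dependency graph. The `:needs` key inside `config` and both helper methods are assumptions for illustration, not the actual config schema.

```ruby
# Hypothetical: reduces full need rows (as they might look in
# ci_build_needs) to the names kept in the job info config.
def intrinsic_need_names(need_rows)
  need_rows.map { |need| need.fetch(:name) }.uniq
end

# Builds [needed_job, dependent_job] edges from job infos whose config
# stores only the names of the jobs they need.
def dependency_edges(job_infos)
  job_infos.flat_map do |info|
    info.fetch(:config).fetch(:needs, []).map { |needed| [needed, info.fetch(:name)] }
  end
end

full_rows = [{ name: "build", artifacts: true, optional: false }]
infos = [
  { name: "build", config: {} },
  { name: "test", config: { needs: intrinsic_need_names(full_rows) } },
  { name: "deploy", config: { needs: %w[build test] } }
]
p dependency_edges(infos)
```

Fields such as `artifacts` and `optional` are dropped here because the graph never reads them; in this MR they are instead kept in `ci_job_definitions.config`.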
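Reading `needs_attributes` from `job_definition.config` and wrapping them in a collection could look roughly like the sketch below. `NeedsCollection` and `Need` are hypothetical stand-ins, not the real `Ci::JobNeeds::Collection` API; the `name`, `artifacts`, and `optional` fields and their defaults (`true` and `false`) are assumed from the GitLab CI `needs` keyword.

```ruby
# Hypothetical value object for one need.
Need = Struct.new(:name, :artifacts, :optional, keyword_init: true)

# Hypothetical collection built from the needs_attributes array.
class NeedsCollection
  include Enumerable

  def initialize(needs_attributes)
    @needs = (needs_attributes || []).map do |attrs|
      Need.new(
        name: attrs.fetch(:name),
        artifacts: attrs.fetch(:artifacts, true),  # assumed default
        optional: attrs.fetch(:optional, false)    # assumed default
      )
    end
  end

  def each(&block)
    return enum_for(:each) unless block

    @needs.each(&block)
    self
  end

  def names
    map(&:name)
  end

  def optional
    select(&:optional)
  end
end

config = { needs_attributes: [{ name: "build" }, { name: "lint", artifacts: false, optional: true }] }
needs = NeedsCollection.new(config[:needs_attributes])
p needs.names
p needs.optional.map(&:name)
```

Including `Enumerable` gives callers `map`, `select`, and similar methods for free, which is one way to get the readable, consistent access described above without touching `job.needs`.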
References
- Spike: Deduplicate intrinsic immutable data fro... (#577211 - closed)
- Spike: Deduplicate `ci_build_names` into one of... (#567704 - closed)
- Spike: Deduplicate `ci_build_needs` (#565821)
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #577211 (closed)
Edited by Leaminn Ma