Improve `PipelineProcessService`

Summary

Currently PipelineProcessService is executed multiple times. This results in considerable overhead in the amount of compute that it uses.

We should do the following (these are my rough notes on improving this service; we can probably do a lot to bring it down to around 10% of its current time):

  1. Debug all SQL queries being executed.
  2. De-duplicate the jobs being executed (being done in https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31370).
  3. Make pipeline processing more efficient.
  4. Serialise the updates created by processing individual jobs during `process!`.
  5. Target stage updates at the builds being triggered by `process!`.
  6. Ignore the pipeline update if a build was retried.
  7. Be aware of `when:` so that we update only the builds that are in fact affected.
  8. Remove any potential N+1 queries; try to pre-calculate the status in SQL only, or be clever about discovering which "created" builds should be processed.
  9. We use pipeline.builds to gather the status of prior stages. This seems to be a bug, as we should be using pipeline.statuses (to also include bridges).
  10. Ensure that we always use .find_each.
  11. Ensure that stage updates are sequential across all concurrent runs.
  12. Remove RuboCop offenses.
  13. Remove the deprecated update_retried code, as we no longer need it.
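
To make points 7 to 9 more concrete, here is a minimal plain-Ruby sketch of how "created" jobs could be selected based on the status of all prior-stage jobs (builds and bridges alike) and their `when:` condition. Everything here is an assumption for illustration: `Job`, `prior_status`, `should_enqueue?` and `jobs_to_enqueue` are hypothetical names, and the status rules are heavily simplified compared with what the real service would need.

```ruby
# Hypothetical, simplified stand-in for a CI job (a build or a bridge).
# `when_condition` mirrors the `when:` keyword.
Job = Struct.new(:name, :stage_idx, :status, :when_condition, keyword_init: true)

COMPLETED = %w[success failed canceled skipped].freeze

# Collapse the statuses of all prior-stage jobs into one status.
# Reduced rule set for illustration only, not the real composite status logic.
def prior_status(jobs, stage_idx)
  statuses = jobs.select { |j| j.stage_idx < stage_idx }.map(&:status).uniq
  return "success" if statuses.empty?
  return "running" unless (statuses - COMPLETED).empty?
  return "failed" if statuses.include?("failed")
  return "canceled" if statuses.include?("canceled")

  "success"
end

# Decide whether a job in the "created" state should be enqueued.
def should_enqueue?(job, prior)
  case job.when_condition
  when "on_success" then prior == "success"
  when "on_failure" then prior == "failed"
  when "always"     then %w[success failed canceled].include?(prior)
  else false # e.g. "manual" jobs wait for a user action in this sketch
  end
end

# Return only the created jobs that are actually affected, computing the
# prior status once per stage rather than once per job.
def jobs_to_enqueue(jobs)
  created = jobs.select { |j| j.status == "created" }
  prior_by_stage = created.map(&:stage_idx).uniq.to_h do |idx|
    [idx, prior_status(jobs, idx)]
  end
  created.select { |j| should_enqueue?(j, prior_by_stage[j.stage_idx]) }
end

jobs = [
  Job.new(name: "compile",    stage_idx: 0, status: "success", when_condition: "on_success"),
  Job.new(name: "downstream", stage_idx: 0, status: "failed",  when_condition: "on_success"), # a bridge
  Job.new(name: "test",       stage_idx: 1, status: "created", when_condition: "on_success"),
  Job.new(name: "notify",     stage_idx: 1, status: "created", when_condition: "on_failure"),
  Job.new(name: "cleanup",    stage_idx: 2, status: "created", when_condition: "always")
]

puts jobs_to_enqueue(jobs).map(&:name).inspect # prints ["notify"]
```

In this sketch the failed bridge in the first stage is what makes `notify` eligible and keeps `test` out; a status computed from builds alone would miss it, which is the concern raised in point 9.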

Measurements

A comprehensive explanation of all of the improvements can be found here: #197930 (comment 320405586)

Roll up summary

Max Duration percentile for Pipeline Workers

  • Spikes were reduced for PipelineProcessWorker, and the maximum duration for both the 95th and 90th percentiles is reduced by 50%.

DB Duration for Pipeline Workers

  • Reduced maximum peaks for PipelineProcessWorker, with a 25% decrease in the maximum duration value.

CPUs for Pipeline Workers

  • There is a reduction in spikes; the maximum for PipelineProcessWorker is reduced by 18%.

Average Duration percentile for pipelines.json and pipelines/:id.json

  • ~13% faster for 95th percentile for pipelines.json
  • ~13% slower for 95th percentile, but 16% faster in 90th percentile for pipelines/:id.json

Max Duration percentile for pipelines.json and pipelines/:id.json

  • ~77% faster with ci_composite_status for 95th percentile for pipelines.json
  • ~86% faster with ci_composite_status for 90th percentile for pipelines.json

CPUs for pipelines.json and pipelines/:id.json

  • CPU usage is 11% better for pipelines.json, and 35% better at the maximum value for pipelines.json

Cumulative duration per pipeline_id for PipelineProcessWorker

  • Average 68% faster when ci_atomic_processing was enabled
  • Maximum 75% faster when ci_atomic_processing was enabled

Max Duration percentile for Pipeline Workers

  • ~20% faster average time for PipelineProcessWorker.
  • Spikes were reduced for PipelineProcessWorker, and the maximum duration for both the 95th and 90th percentiles is reduced by 21%.
  • There are no significant changes in PipelineUpdateWorker and StageUpdateWorker.

DB Duration for Pipeline Workers

  • 15% faster average DB time for PipelineProcessWorker.
  • No significant changes in average/maximum database times for the rest of the Pipeline Workers.
  • Increased maximum peak for PipelineProcessWorker, with a 12% increase in the maximum duration value.

CPUs for Pipeline Workers

  • No significant changes in average/maximum CPU usage for Pipeline Workers.

Average duration and DB duration for pipelines.json and pipelines/:id.json

  • Total duration is ~27% faster with ci_atomic_processing for the pipelines.json API
  • DB duration is 20% faster with ci_atomic_processing for the pipelines.json API
  • Total duration is ~32% faster with ci_atomic_processing for the pipelines/:id.json API
  • DB duration is ~28% faster with ci_atomic_processing for the pipelines/:id.json API

Maximum duration and DB duration for pipelines.json and pipelines/:id.json

  • Max duration is 35% faster with ci_atomic_processing for the pipelines.json API
  • Max DB duration is 12% faster with ci_atomic_processing for the pipelines.json API
  • Max duration is 15% faster with ci_atomic_processing for the pipelines/:id.json API
  • Max DB duration is 7% faster with ci_atomic_processing for the pipelines/:id.json API

Average Duration percentile for pipelines.json and pipelines/:id.json

  • ~27% faster for 95th percentile for pipelines.json
  • ~19% faster for 90th percentile for pipelines.json
  • ~25% faster for 50th percentile for pipelines.json
  • ~20% faster for 95th percentile for pipelines/:id.json
  • ~16% faster for 90th percentile for pipelines/:id.json
  • ~41% faster for 50th percentile for pipelines/:id.json

Max Duration percentile for pipelines.json and pipelines/:id.json

  • ~98% faster for 95th percentile for pipelines.json
  • ~93% faster for 90th percentile for pipelines.json
  • ~2% slower for 50th percentile for pipelines.json
  • ~13% faster for 95th percentile for pipelines/:id.json
  • ~1% slower for 90th percentile for pipelines/:id.json
  • ~18% slower for 50th percentile for pipelines/:id.json

CPUs for pipelines.json and pipelines/:id.json

  • CPU usage is ~26% better for pipelines.json, and ~50% better at the maximum value for pipelines.json
  • CPU usage is ~31% better for pipelines/:id.json, and ~17% better at the maximum value for pipelines/:id.json
Edited by Craig Gomes