Reduce redundant pipeline cache expirations in state machine transition

What does this MR do and why?

Context

In gitlab-com/gl-infra/production#21562 (closed) we found that ExpirePipelineCacheService is running at ~400 RPS against the CI primary database (dashboard). We need to reduce this rate.

We know that ExpirePipelineCacheService is currently called in two places when a pipeline status changes: inline at the end of AtomicProcessingService#process!, and again via a worker enqueued from the pipeline model's after_transition state machine hook. These two calls appear to be redundant.

This MR

This MR skips the redundant state machine cache expiry call when the status change originates from AtomicProcessingService, which already handles cache expiration inline. The basic approach is to pass an argument to the transition via set_status that tells the after_transition hook to skip cache expiry.
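
The approach described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this MR: FakePipeline, FakeProcessingService, the skip_cache_expiry keyword, and the counters are all hypothetical stand-ins, and the real state machine hook and worker are replaced by plain method calls.

```ruby
# Hypothetical stand-in for the pipeline model. It only counts how often
# each expiry path would run; it does not touch any cache or database.
class FakePipeline
  attr_reader :status, :enqueued_expirations, :inline_expirations

  def initialize
    @status = :created
    @enqueued_expirations = 0 # expirations requested by the transition hook
    @inline_expirations = 0   # expirations done inline by the service
  end

  # Transition entry point. The keyword argument (an assumed name) is the
  # hint that tells the hook the caller will expire the cache itself.
  def set_status(new_status, skip_cache_expiry: false)
    return false if new_status == @status

    @status = new_status
    after_transition(skip_cache_expiry: skip_cache_expiry)
    true
  end

  # Stand-in for the inline expiry call.
  def expire_cache_inline
    @inline_expirations += 1
  end

  private

  # Stand-in for the after_transition hook that enqueues the expiry worker.
  def after_transition(skip_cache_expiry:)
    return if skip_cache_expiry

    @enqueued_expirations += 1
  end
end

# Hypothetical stand-in for the processing service: it changes the status
# with the skip hint set, then expires the cache inline at the end.
class FakeProcessingService
  def initialize(pipeline)
    @pipeline = pipeline
  end

  def process!(new_status)
    changed = @pipeline.set_status(new_status, skip_cache_expiry: true)
    @pipeline.expire_cache_inline
    changed
  end
end
```

In this sketch, a status change made through the service results in exactly one expiry (the inline one), while a status change made by any other caller still goes through the hook as before.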

It also adds logging so that we can determine the percentage of cache expirations that are skipped versus not skipped on this code path.

This is gated behind a feature flag: ci_skip_redundant_pipeline_cache_expiration
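
As a rough illustration of how the flag gate and the skipped/not-skipped logging could fit together, here is a small sketch. It is an assumption-laden example, not the MR's implementation: ExpiryDecisionLog, its method names, and the log message are hypothetical, and the feature flag check is represented by a plain boolean rather than a real flag lookup.

```ruby
# Hypothetical helper that records one entry per expiry decision so the
# share of skipped expirations can be computed afterwards.
class ExpiryDecisionLog
  attr_reader :entries

  def initialize
    @entries = []
  end

  # Skip only when the flag is on AND the caller already expires inline.
  # With the flag off, behaviour is unchanged: nothing is skipped.
  def decide(flag_enabled:, from_processing_service:)
    skipped = flag_enabled && from_processing_service
    @entries << { message: 'pipeline cache expiry decision', skipped: skipped }
    skipped
  end

  # Percentage of recorded decisions where the hook expiry was skipped.
  def skipped_percentage
    return 0.0 if @entries.empty?

    skipped_count = @entries.count { |entry| entry[:skipped] }
    (100.0 * skipped_count / @entries.size).round(1)
  end
end
```

Recording both outcomes, rather than only the skipped case, is what makes a percentage computable from the logs alone.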

Roll-out issue: #596729

References

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #594454 (closed)

Edited by Leaminn Ma
