AtomicProcessingService - Log when `needs_processing?` does not match `new_collection.processing_jobs.any?`
What does this MR do and why?
Context
The needs_processing? query in AtomicProcessingService is executed at a very high rate. The service runs it twice: once at the start and once at the end. We want to reduce this call rate to avoid potential LWLock contention (#598584).
This MR addresses one of the proposed changes, which is to use the new status_collection we already evaluate for new_alive_jobs and check new_collection.processing_jobs.any? in place of the last needs_processing?.
This new_collection is only evaluated in new_alive_jobs when there were stopped jobs at the beginning of processing. We expect this to be the majority of cases, so the change could nearly eliminate the last needs_processing? query.
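A minimal sketch of the proposed end state, not the actual service code: it reuses the collection when one was loaded and falls back to the query otherwise. `FakePipeline`, `FakeCollection`, and `reprocess_needed?` are hypothetical names used only for illustration, and the assumption that a missing collection is represented by `nil` is mine.

```ruby
# Hypothetical stand-ins; these are not GitLab's actual classes.
FakeCollection = Struct.new(:processing_jobs)

class FakePipeline
  attr_reader :query_count

  def initialize(needs_processing)
    @needs_processing = needs_processing
    @query_count = 0
  end

  # Stands in for the database query whose call rate we want to reduce.
  def needs_processing?
    @query_count += 1
    @needs_processing
  end
end

# Prefers the already-loaded collection over a fresh query.
# Assumption: new_collection is nil when there were no stopped jobs.
def reprocess_needed?(pipeline, new_collection)
  return pipeline.needs_processing? if new_collection.nil?

  new_collection.processing_jobs.any?
end
```

In this sketch the query only runs on the fallback path, which is the minority case if most runs start with stopped jobs.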
This MR
Before we proceed with this change, we'd like to ensure that the two queries, pipeline.needs_processing? and new_collection.processing_jobs.any?, always yield the same result. In theory, they should, because the queries are essentially the same (details in #598584 (comment 3336123520)). However, since this service is critical to CI, we want to take an extra precaution.
So in this MR, we update AtomicProcessingService to log whenever the two queries yield different values. We will monitor it for at least a few days.
We expect no discrepancies except in the rare case where a job transition (e.g. retrying a job) occurs right between the evaluation of @new_collection and the second pipeline.needs_processing?. This would be a very small window. Even if we get a few logs from this scenario, it should still be okay to proceed with the change because both queries happen outside of the lease, and the job transition itself reschedules the PipelineProcessWorker.
These changes are behind the feature flag: ci_atomic_processing_log_check_mismatch. Roll-out: #600063
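The flag-gated comparison could look roughly like the sketch below. It is illustrative only: `log_check_mismatch`, its keyword arguments, and the log message format are hypothetical and are not taken from the MR diff. The flag state is passed in as a plain boolean rather than checked through GitLab's feature flag API.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: logs only when the flag is on and the two results differ.
# Returns true when a mismatch was logged, false otherwise.
def log_check_mismatch(logger:, flag_enabled:, needs_processing:, collection_processing:, pipeline_id:)
  return false unless flag_enabled
  return false if needs_processing == collection_processing

  logger.warn(
    "needs_processing? mismatch pipeline_id=#{pipeline_id} " \
    "needs_processing=#{needs_processing} " \
    "collection_processing=#{collection_processing}"
  )
  true
end

# Usage with an in-memory logger; the values here are made up.
io = StringIO.new
log_check_mismatch(
  logger: Logger.new(io),
  flag_enabled: true,
  needs_processing: true,
  collection_processing: false,
  pipeline_id: 1
)
```

Keeping the comparison side-effect free apart from the log line means the service's behavior stays unchanged while the flag is enabled.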
References
- Preliminary step to resolve part of Investigate and improve query 36880235871038157... (#598584)
- Roll-out issue: [FF] `ci_atomic_processing_log_check_mismatch` ... (#600063)
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #598584