StuckCiJobsWorker: One worker per status
In StuckCiJobsWorker, if any of the select queries fails or times out, execution of the whole job stops.
Instead, we can have StuckCiJobsWorker spin off an additional worker for each build status ('pending', 'running', 'scheduled'). Besides making the job more resilient to timeouts, this makes the code more resilient to other failure scenarios, because a failure while handling one status no longer prevents the other statuses from being processed.
There is a POC here: !64635 (closed)
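The isolation idea can be illustrated with a minimal sketch. This is not taken from the POC or from the merged code: it uses plain Ruby stand-ins instead of Sidekiq, and `DropStatusWorker`, `run_all`, and the `failing` flag are hypothetical names introduced only for illustration. In a real setup the parent worker would presumably enqueue each per-status worker as its own job rather than call it inline.

```ruby
# Hypothetical sketch: plain Ruby stand-ins, not GitLab's classes and not Sidekiq.
STATUSES = %w[pending running scheduled].freeze

# Stand-in for a per-status drop worker.
# `failing: true` simulates a select query that times out.
class DropStatusWorker
  def initialize(status, failing: false)
    @status = status
    @failing = failing
  end

  def perform
    raise "query timeout for #{@status}" if @failing

    "dropped stuck #{@status} builds"
  end
end

# Stand-in for the parent worker: each status is handled in isolation,
# so an error for one status is recorded and the others still run.
def run_all(failing_statuses: [])
  STATUSES.to_h do |status|
    worker = DropStatusWorker.new(status, failing: failing_statuses.include?(status))
    result =
      begin
        worker.perform
      rescue StandardError => e
        "failed: #{e.message}"
      end
    [status, result]
  end
end

results = run_all(failing_statuses: ["pending"])
results.each { |status, result| puts "#{status}: #{result}" }
```

Here the simulated timeout for 'pending' is reported as a failure, while 'running' and 'scheduled' are still processed. With a single monolithic job, the first failing query would have aborted all three.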
Required Merge Requests
- Set up reusable parts for splitting up StuckCiJobsWorker: !69564 (merged) and !69574 (merged)
- Introduce separate DropRunningWorker and DropRunningService: !70233 (merged)
- Introduce separate DropScheduledWorker and DropScheduled service in exactly the same way as DropRunning*: !71229 (merged)
Add-on changes
See other Related merge requests
Observability
Edited by drew stachon