
StuckCiJobsWorker: One worker per status

In StuckCiJobsWorker, if any of the select queries fails or times out, execution of the whole job stops.

Instead, we can have StuckCiJobsWorker spin off an additional worker for each build status ('pending', 'running', 'scheduled'). Besides making the job more resilient to timeouts, this also contains other failures: an error while handling one status no longer prevents the other statuses from being processed.
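The fan-out idea can be sketched roughly as follows. This is plain Ruby with hypothetical stand-in classes (`DropStuckBuildsWorker`, `StuckCiJobsFanOut`, `QueryTimeout`), not the code from the POC. It runs the children inline and rescues their errors so that the isolation is visible in a single process; with real Sidekiq workers the parent would presumably enqueue each child as its own job, and a failure would be confined to that job.

```ruby
# Hypothetical sketch: plain Ruby stand-ins, not GitLab's actual workers.
class QueryTimeout < StandardError; end

# One small worker per build status. The block stands in for that status's
# select-and-drop query.
class DropStuckBuildsWorker
  attr_reader :status

  def initialize(status, &query)
    @status = status
    @query = query
  end

  def perform
    @query.call(@status)
  end
end

# The parent only fans out. A failure in one child is recorded and the
# remaining statuses are still processed.
class StuckCiJobsFanOut
  STATUSES = %w[pending running scheduled].freeze

  def initialize(workers)
    @workers = workers
  end

  # Returns a hash of status => :ok or :failed.
  def perform
    @workers.each_with_object({}) do |worker, results|
      results[worker.status] =
        begin
          worker.perform
          :ok
        rescue StandardError
          :failed
        end
    end
  end
end

# Simulate a timeout in the 'running' query only.
def demo_fan_out
  workers = StuckCiJobsFanOut::STATUSES.map do |status|
    DropStuckBuildsWorker.new(status) do |s|
      raise QueryTimeout, "select for #{s} timed out" if s == 'running'
    end
  end
  StuckCiJobsFanOut.new(workers).perform
end
```

Calling `demo_fan_out` shows the 'pending' and 'scheduled' statuses completing even though the simulated 'running' query times out, which is the behaviour the single-worker design cannot provide.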

There is a POC here: !64635 (closed)

Required Merge Requests

Add-on changes

See other Related merge requests

Observability

  • Database query counts and timings in Kibana
  • StuckCiJobsWorker timeouts in Sentry
  • Ci::StuckBuilds::DropRunningWorker timeouts in Sentry
  • Ci::StuckBuilds::DropScheduledWorker timeouts in Sentry
Edited by drew stachon