Fail builds which are waiting for resource after a certain period
In some cases, pipeline jobs can get stuck in the `waiting_for_resource` status, for example when `.enqueue` fails even after the build has successfully retained a resource.

We should drop `waiting_for_resource` builds after some period. We can use `StuckCiJobsWorker` for this.
- Add a timeout setting, similar to the existing job timeout setting.
- The setting should be named resource timeout, and it should be located under Settings->General->Resource
- When the time a job spends in `waiting_for_resource` exceeds the configured timeout, the job should be terminated.
- The default timeout shall be 1 hour.
- Use `StuckCiJobsWorker` to cancel long-waiting jobs.
- Add a `resource_timeout` column (integer; unit: minutes; minimum: 60 (one hour); maximum: 10080 (one week)).
- The value can be configured in the UI, likely on the settings page.
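
The limits above could be checked roughly as follows. This is a minimal sketch, not the actual implementation: the constant names and `validate_resource_timeout` are hypothetical, and only the numbers (60, 10080, default of 1 hour) come from this proposal.

```ruby
# Hypothetical constants mirroring the limits in this proposal (unit: minutes).
RESOURCE_TIMEOUT_MIN     = 60      # one hour
RESOURCE_TIMEOUT_MAX     = 10_080  # one week
RESOURCE_TIMEOUT_DEFAULT = 60      # default of 1 hour

# Hypothetical helper: returns the timeout in minutes.
# nil falls back to the default; anything outside the range raises ArgumentError.
def validate_resource_timeout(value)
  return RESOURCE_TIMEOUT_DEFAULT if value.nil?

  unless value.is_a?(Integer) && value.between?(RESOURCE_TIMEOUT_MIN, RESOURCE_TIMEOUT_MAX)
    raise ArgumentError,
          "resource_timeout must be an integer between " \
          "#{RESOURCE_TIMEOUT_MIN} and #{RESOURCE_TIMEOUT_MAX} minutes"
  end

  value
end
```

In a real model this would more likely be a numericality validation plus a column default; the plain method is only meant to make the boundaries explicit.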
In `StuckCiJobsWorker`, we search for the target rows with the following query:
SELECT ci_builds.* FROM ci_builds LEFT JOIN project_ci_cd_settings ON project_ci_cd_settings.project_id = ci_builds.project_id WHERE ci_builds.status = 'waiting_for_resource' AND ci_builds.waiting_for_resource_at + (project_ci_cd_settings.resource_timeout * interval '1 minute') < now()
and drops the builds with
- Since we depend on a cron worker whose interval is `0 * * * *` (every hour), timed-out jobs won't be dropped at the exact moment the timeout elapses, but the next time the worker fires.
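
To make the timing concrete, here is a rough sketch of the timeout check and of the earliest hourly run that would pick a build up. Both method names are hypothetical, and it assumes the cron schedule is evaluated in UTC and fires exactly on the hour.

```ruby
# Hypothetical helper: true once the build has waited longer than its timeout.
def resource_timed_out?(waiting_for_resource_at, resource_timeout_minutes, now)
  waiting_for_resource_at + resource_timeout_minutes * 60 < now
end

# Hypothetical helper: first top-of-the-hour tick strictly after the deadline,
# i.e. the earliest hourly worker run that would see the build as timed out.
def earliest_drop_time(waiting_for_resource_at, resource_timeout_minutes)
  deadline = waiting_for_resource_at + resource_timeout_minutes * 60
  top_of_hour = Time.at(deadline.to_i - deadline.to_i % 3600).utc
  top_of_hour + 3600
end

started = Time.utc(2024, 1, 1, 9, 15)
resource_timed_out?(started, 60, Time.utc(2024, 1, 1, 10, 0)) # => false (deadline is 10:15)
earliest_drop_time(started, 60)                               # => 2024-01-01 11:00:00 UTC
```

With a 1 hour timeout, a build can therefore wait close to 2 hours in the worst case before it is dropped.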
- Add a `resource_timeout` config for resource groups
- Use `StuckCiJobsWorker` to drop long-waiting builds after the configured period.
- Publish the feature (Documentation and remove a feature flag)