
Prevent auto-retry AccessDenied error from stopping transition to failed

Tomasz Maczukin requested to merge fix-ci-job-auto-retry into master

What does this MR do?

Fixes a bug where a Gitlab::Access::AccessDeniedError raised in Ci::RetryBuildService prevents a job from transitioning to the failed state.

Are there points in the code the reviewer needs to double check?

Why was this MR needed?

If a job is configured to be auto-retried, job.drop! automatically calls the .retry() method. Usually this is fine. But if something has happened since the original job started that prevents the user from retrying that job (e.g. the user lost access to the project), Ci::RetryBuildService raises a Gitlab::Access::AccessDeniedError exception.
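A minimal, self-contained sketch of this interaction, assuming a simplified job model. `SketchJob`, `retry_job`, `drop_unguarded!`, and `drop_guarded!` are hypothetical names used only for illustration; they are not GitLab's `Ci::Build` or `Ci::RetryBuildService`, and the guarded variant is just one possible way to get the behavior named in the MR title.

```ruby
# Stand-in for Gitlab::Access::AccessDeniedError (hypothetical, simplified).
class AccessDeniedError < StandardError; end

# Hypothetical, minimal model of a CI job; not GitLab's Ci::Build.
class SketchJob
  attr_reader :status, :retried

  def initialize(auto_retry:, user_can_retry:)
    @auto_retry = auto_retry
    @user_can_retry = user_can_retry
    @status = :running
    @retried = false
  end

  # Plays the role of the retry service: refuses when the user lost access.
  def retry_job
    raise AccessDeniedError, "user may not retry this job" unless @user_can_retry

    @retried = true
  end

  # Unguarded drop: the retry error escapes before the job is marked failed.
  # This models the effect of the failed transition being undone.
  def drop_unguarded!
    retry_job if @auto_retry
    @status = :failed
  end

  # Guarded drop: the job is marked failed, and a denied retry is only logged.
  def drop_guarded!
    @status = :failed
    return unless @auto_retry

    begin
      retry_job
    rescue AccessDeniedError => e
      warn "Unable to auto-retry job: #{e.message}"
    end
  end
end
```

With `SketchJob.new(auto_retry: true, user_can_retry: false)`, calling `drop_unguarded!` raises and leaves the job in `:running`, while `drop_guarded!` leaves it in `:failed` without retrying.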

This becomes a problem when such a job is about to be dropped by StuckCiJobsWorker. This worker looks for jobs that we treat as stuck and tries to drop them in batches. If at least one job in a batch is configured to be auto-retried and is in a state that makes the retry service raise Gitlab::Access::AccessDeniedError, the DB transaction rolls back the whole batch. This eventually prevents StuckCiJobsWorker from cleaning up stale and stuck jobs.
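The batch effect can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: `with_transaction` only imitates a DB transaction by restoring a snapshot of a hash, and `auto_retry` and `drop_batch` are hypothetical helpers, not the worker's real code.

```ruby
# Stand-in for Gitlab::Access::AccessDeniedError (hypothetical, simplified).
class AccessDeniedError < StandardError; end

# Imitates a DB transaction: restores the snapshot if the block raises.
def with_transaction(statuses)
  snapshot = statuses.dup
  yield
rescue StandardError
  statuses.replace(snapshot)
  raise
end

# Imitates the auto-retry step; raises for jobs whose user lost access.
def auto_retry(id, retry_denied)
  raise AccessDeniedError, "job #{id}: retry not allowed" if retry_denied.include?(id)
end

# Drops every job in the batch inside one "transaction".
# With guard: false, a single denied retry rolls back the whole batch.
# With guard: true, the error is swallowed and all jobs stay failed.
def drop_batch(statuses, retry_denied:, guard:)
  with_transaction(statuses) do
    statuses.keys.each do |id|
      statuses[id] = :failed
      begin
        auto_retry(id, retry_denied)
      rescue AccessDeniedError
        raise unless guard
      end
    end
  end
end

jobs = { 1 => :running, 2 => :running, 3 => :running }

begin
  drop_batch(jobs, retry_denied: [2], guard: false)
rescue AccessDeniedError
  # The whole batch was rolled back, including job 1.
end
unguarded_result = jobs.dup

drop_batch(jobs, retry_denied: [2], guard: true)
guarded_result = jobs.dup
```

In the unguarded run, job 1 is reverted to `:running` even though nothing was wrong with it, which mirrors how one problematic job can keep a whole batch of stuck jobs from being cleaned up.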

Please see gitlab-com/infrastructure#3866 for a live example of this situation, which happened on GitLab.com.

Screenshots (if relevant)

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Edited by Grzegorz Bizon
