Reject MR early on individual CI job failures
Part of: https://gitlab.com/groups/tezos/-/milestones/6
I just thought of a small optimisation we could apply either to marge-bot or to our CI configuration. If we assign marge to an MR and one job fails early (say `misc_checks`) but a bunch of out-of-order jobs continue to run (the test suite, opam test, etc.), like in this pipeline, then there is really no point in continuing the pipeline.
We could then have marge-bot query not just whether the CI pipeline has terminated successfully, but also whether it contains failed jobs. If it contains failed jobs at any point before termination, marge-bot can just move on to the next MR (and optionally cancel the pipeline).
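As a rough sketch of what that query could look like, here is how a request for only the failed jobs of a pipeline might be built with the Python standard library. The function name and its parameters are hypothetical, the host is a placeholder, and this is not marge-bot's own API client; it only assumes the list-pipeline-jobs endpoint and its `scope` filter.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def failed_jobs_request(api_base, project_id, pipeline_id, token):
    """Build a GET request for the failed jobs of one pipeline.

    All four parameters are hypothetical names chosen for this sketch.
    """
    # A project can be addressed by numeric id or by URL-encoded path.
    project = quote(str(project_id), safe="")
    # `scope[]=failed` restricts the listing to failed jobs.
    # Pagination beyond the first page is not handled here.
    query = urlencode([("scope[]", "failed"), ("per_page", 100)])
    url = f"{api_base}/projects/{project}/pipelines/{pipeline_id}/jobs?{query}"
    return Request(url, headers={"PRIVATE-TOKEN": token})


# Placeholder host, pipeline id and token, for illustration only.
req = failed_jobs_request(
    "https://gitlab.example.com/api/v4", "tezos/tezos", 1234, "secret"
)
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` and decoding the JSON body would then give the list of failed job objects.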
We need to handle:
- retries (but retried jobs are not included in the list of failed jobs by default anyhow)
- `allow_failure` (but IIUC the API for pipeline jobs states for each job whether it is allowed to fail)
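To make the `allow_failure` case concrete, here is a minimal sketch of the filtering step. It assumes each job is a dict with `status` and `allow_failure` fields, as in the jobs API response; the helper name `blocking_failures` and the sample data are made up for illustration.

```python
def blocking_failures(jobs):
    """Return the failed jobs that are not allowed to fail."""
    return [
        job
        for job in jobs
        if job.get("status") == "failed" and not job.get("allow_failure", False)
    ]


# Hand-written sample data, not output from a real pipeline.
sample_jobs = [
    {"name": "misc_checks", "status": "failed", "allow_failure": False},
    {"name": "lint_optional", "status": "failed", "allow_failure": True},
    {"name": "unit_tests", "status": "running", "allow_failure": False},
]
print([job["name"] for job in blocking_failures(sample_jobs)])
```

Only failures that survive this filter should cause marge-bot to give up on the MR early.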
Concretely, it would work like this:
- in `job.py`, in `MergeJob.wait_for_ci_to_pass`, change

    if ci_status not in ("pending", "running"):
        log.warning("Suspicious CI status: %r", ci_status)

to something like:

    if ci_status == "running":
        # Get the failed jobs from [current_pipeline]. Note that through
        # `scope`, you can get only failed jobs:
        # https://docs.gitlab.com/ee/api/jobs.html#list-pipeline-jobs
        failed_jobs = <failed jobs of current_pipeline>
        # A failure only counts if the job does not have `allow_failure` set.
        if any(not job["allow_failure"] for job in failed_jobs):
            # early_failure_info says which jobs were detected as
            # failures (ideally also with links to them).
            raise CannotMerge(f"Early CI failure detected: {early_failure_info}!")
    elif ci_status != "pending":
        log.warning("Suspicious CI status: %r", ci_status)

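
A self-contained sketch of that check, so the behaviour can be tried outside marge-bot. The `CannotMerge` class here is a local stand-in for the exception used in the snippet above, and `check_early_failure` and `fetch_failed_jobs` are hypothetical names. It assumes each failed job is a dict with `name`, `allow_failure` and `web_url` fields.

```python
class CannotMerge(Exception):
    """Local stand-in for the exception raised in the snippet above."""


def check_early_failure(ci_status, fetch_failed_jobs):
    """Raise CannotMerge if a running pipeline already has a blocking failure.

    `fetch_failed_jobs` is a hypothetical zero-argument callable that
    returns the failed jobs of the current pipeline as dicts.
    """
    if ci_status != "running":
        return
    # Ignore failures of jobs that are allowed to fail.
    blocking = [job for job in fetch_failed_jobs() if not job.get("allow_failure")]
    if blocking:
        # Name each failed job and link to it.
        early_failure_info = ", ".join(
            f"{job['name']} ({job['web_url']})" for job in blocking
        )
        raise CannotMerge(f"Early CI failure detected: {early_failure_info}!")


# Hand-written sample job with a placeholder URL.
jobs = [
    {
        "name": "misc_checks",
        "allow_failure": False,
        "web_url": "https://gitlab.example.com/jobs/1",
    }
]
try:
    check_early_failure("running", lambda: jobs)
    message = None
except CannotMerge as err:
    message = str(err)
print(message)
```

Passing the job lookup in as a callable means the API is only queried while the pipeline is running, and it keeps the check easy to test without a GitLab instance.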
Edited by Arvid Jakobsson