Enhance allow_failure to capture failure reasons
Release notes
The allow_failure keyword prevents a failed job from failing the entire pipeline. Previously, allow_failure accepted only a boolean value (true or false). In this release, the keyword also accepts one or more job exit codes, which gives you finer control over when a failed job is prevented from failing the pipeline, based on the job's exit code.
Problem to solve
We've had an internal meeting about how to proceed on #16733 (closed). We feel that the issue is too broad and that both of the proposals mentioned there are complex and would affect the simplicity and reliability of the whole system. We should keep how we mark a build as failed or successful as simple as possible and, if we can, make it even simpler and more reliable. But we should not change it in a way that makes it more brittle and error-prone.
The overall conclusion was to start with something simple that augments allow_failure to accept a list of exit codes for which the job will be marked as passed with warnings. Extra functionality can be built on top of this, like updating retry to use those reasons for its when policy.
Recording: https://www.youtube.com/watch?v=DiicryT-W9M
Intended users
User experience goal
Proposal
Implement exit_codes: integer or array for allow_failure:
test_job:
  script:
    - execute_script_that_will_fail
  # If the script's exit code is 137 or 255, the job is allowed to fail and the pipeline continues to run.
  allow_failure:
    exit_codes: # User-defined exit codes
      - 137
      - 255

# A single exit code can also be given as an integer.
test_job:
  script:
    - execute_script_that_will_fail
  allow_failure:
    exit_codes: 137
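
The proposal allows exit_codes to be either a single integer or an array. A minimal sketch of how a parsed allow_failure value could be validated and normalized is shown below. It is written in Python for illustration only; the function name normalize_allow_failure and its return shape are hypothetical and do not reflect GitLab's actual implementation.

```python
from typing import FrozenSet, Tuple


def normalize_allow_failure(value) -> Tuple[bool, FrozenSet[int]]:
    """Return (allow_any_failure, allowed_exit_codes) for an allow_failure value.

    Hypothetical helper: `value` is assumed to be the already-parsed YAML value.
    """
    # Plain boolean: the behaviour that existed before this proposal.
    if isinstance(value, bool):
        return value, frozenset()

    # Otherwise expect a mapping whose only key is exit_codes.
    if not isinstance(value, dict) or set(value) != {"exit_codes"}:
        raise ValueError("allow_failure must be a boolean or a mapping with exit_codes")

    codes = value["exit_codes"]
    # A single integer is treated as a one-element list.
    if not isinstance(codes, list):
        codes = [codes]

    for code in codes:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(code, bool) or not isinstance(code, int):
            raise ValueError("exit_codes must be an integer or a list of integers")

    return False, frozenset(codes)
```

Normalizing both shapes to a set keeps the later membership check identical whether the user wrote one code or several.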
- Change the API to accept the exit code from the Runner
- Change the Runner to send the exit code back to Rails
- Implement YAML syntax changes
- Use the exit code to flag the job as allowed to fail.
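
The last step above, using the reported exit code to flag the job as allowed to fail, could look roughly like the following sketch. It is Python for illustration only, the names job_status and pipeline_continues are hypothetical, and it assumes that exit code 0 means success and that the allowed exit codes have already been read from the job's configuration.

```python
from typing import Iterable


def job_status(
    exit_code: int,
    allowed_exit_codes: Iterable[int] = (),
    allow_any_failure: bool = False,
) -> str:
    """Map the exit code reported by the Runner to a job status (hypothetical)."""
    # Assumption: a zero exit code means the script succeeded.
    if exit_code == 0:
        return "passed"
    # allow_failure: true, or the exit code is one of the listed exit_codes.
    if allow_any_failure or exit_code in set(allowed_exit_codes):
        return "passed with warnings"
    return "failed"


def pipeline_continues(status: str) -> bool:
    """Only a real failure stops the pipeline in this sketch."""
    return status != "failed"


if __name__ == "__main__":
    # The exit codes 137 and 255 come from the example configuration.
    for code in (0, 137, 1):
        status = job_status(code, allowed_exit_codes=[137, 255])
        print(code, status, pipeline_continues(status))
```

With exit_codes set to 137 and 255, a job exiting with 137 is marked as passed with warnings, while a job exiting with 1 still fails the pipeline.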