Enhance allow_failure to capture failure reasons
The allow_failure keyword prevents a failed job from failing the entire pipeline. Previously, allow_failure accepted only a boolean value (true or false). In this release, allow_failure also accepts a list of job exit codes, which gives you finer control over when a failed job is allowed to fail without failing the pipeline.
Problem to solve
We've had an internal meeting about how to proceed on #16733. We feel the issue is too broad, and both of the proposals mentioned there are complex and would affect the simplicity and reliability of the whole system. We should keep the way we mark a build as failed or successful as simple as possible, and make it even simpler and more reliable if we can. We should not change it in a way that makes it more brittle and error-prone.
The overall conclusion was to start with something simple that augments
allow_failure to accept a list of exit codes for which the job will be marked as
passed with warnings. Extra functionality can be built on top of this, like updating
retry to use those reasons for its
User experience goal
exit_codes: integer or array for
test_job:
  script:
    - execute_script_that_will_fail
  # If the script exit code is 137 or 255, the job is allowed to fail and the pipeline continues to run.
  allow_failure:
    exit_codes: # User-defined exit codes
      - 137
      - 255

test_job:
  script:
    - execute_script_that_will_fail
  allow_failure:
    exit_codes: 137
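The two forms above (a single integer or an array) could be normalized into one set of codes before matching. The following is a minimal Python sketch of that idea, not the actual GitLab implementation; the function names allowed_exit_codes and is_failure_allowed are hypothetical, and the parsed YAML is assumed to arrive as a boolean or a dictionary.

```python
from typing import Union

# Assumption: the parsed YAML value of allow_failure is either a boolean
# or a mapping such as {"exit_codes": 137} / {"exit_codes": [137, 255]}.
AllowFailure = Union[bool, dict]


def allowed_exit_codes(allow_failure: AllowFailure) -> frozenset:
    """Return the exit codes listed under allow_failure:exit_codes.

    A plain boolean carries no exit codes, so it yields an empty set.
    """
    if isinstance(allow_failure, bool):
        return frozenset()
    codes = allow_failure.get("exit_codes", [])
    # A single integer (exit_codes: 137) is treated as a one-element list.
    if isinstance(codes, int):
        codes = [codes]
    return frozenset(codes)


def is_failure_allowed(allow_failure: AllowFailure, exit_code: int) -> bool:
    """Decide whether a failed job with this exit code may fail
    without failing the pipeline."""
    # The boolean form keeps its existing meaning.
    if isinstance(allow_failure, bool):
        return allow_failure
    return exit_code in allowed_exit_codes(allow_failure)
```

With this sketch, is_failure_allowed({"exit_codes": [137, 255]}, 255) is true, while any exit code outside the list fails the job and the pipeline as before.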
- Change the API to accept the exit code from the Runner
- Change the Runner to send the exit code back to Rails
- Implement YAML syntax changes
- Use the exit code to flag the job as allowed to fail.
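The steps above can be sketched end to end as follows. This is an illustrative Python model only: the real Runner is a separate service and the API is part of Rails, and the payload fields (state, exit_code), the Job class, and the function names here are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    # Exit codes from allow_failure:exit_codes, assumed already parsed from YAML.
    allowed_exit_codes: frozenset = field(default_factory=frozenset)
    exit_code: Optional[int] = None
    status: str = "running"
    allow_failure: bool = False


def runner_payload(exit_code: int) -> dict:
    """Hypothetical body the Runner sends back when a job finishes."""
    state = "success" if exit_code == 0 else "failed"
    return {"state": state, "exit_code": exit_code}


def update_job(job: Job, payload: dict) -> Job:
    """Hypothetical API handler: store the exit code, then flag the job."""
    job.exit_code = payload.get("exit_code")
    job.status = payload["state"]
    if job.status == "failed":
        # The job is allowed to fail only if its exit code is in the list.
        job.allow_failure = job.exit_code in job.allowed_exit_codes
    return job
```

For example, update_job(Job(allowed_exit_codes=frozenset({137, 255})), runner_payload(137)) leaves the job failed but flagged as allowed to fail, so the pipeline would continue.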