Smarter KAS and Gitaly components version update pipeline
Context
We have a pipeline to regularly update KAS and Gitaly versions (e.g.). It creates a new MR, approves it, and waits for the MR pipeline to succeed before merging. Currently, it sometimes fails because the MR pipeline fails. (I haven't checked whether any of this logic is already implemented.) I suggest making the logic smarter so that it performs some retries around the MR pipeline:
Proposal
Previous proposal
- If the pipeline is older than 18 hours:
  - Successful pipeline: trigger a new pipeline (otherwise danger-bot would fail).
  - Failed pipeline: raise an error.
- Otherwise (the pipeline is newer than 18 hours, or another reasonable threshold such as 8 hours):
  - If the pipeline is still running, wait.
  - If the pipeline failed, retry the failed jobs.
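The previous proposal is a small decision tree over the pipeline's status and age. A minimal sketch, assuming GitLab pipeline status strings (`success`, `failed`, `running`, ...) and an age in hours. `Action` and `next_action` are hypothetical names, not release-tools code. Merging a recent green pipeline and erroring on statuses the proposal doesn't mention (e.g. `canceled`) are assumptions.

```python
from enum import Enum


# Hypothetical action names; release-tools' real implementation may differ.
class Action(Enum):
    MERGE = "merge"
    WAIT = "wait"
    RETRY_FAILED_JOBS = "retry_failed_jobs"
    TRIGGER_NEW_PIPELINE = "trigger_new_pipeline"
    ERROR = "error"


# Threshold from the previous proposal (8 hours was floated as an alternative).
STALE_AFTER_HOURS = 18

# GitLab statuses for a pipeline that has not finished yet.
IN_PROGRESS = {"created", "waiting_for_resource", "preparing", "pending", "running"}


def next_action(status: str, age_hours: float) -> Action:
    """Map the MR pipeline's status and age (in hours) to the next step."""
    stale = age_hours > STALE_AFTER_HOURS
    if status == "success":
        # A stale green pipeline would make danger-bot fail, so rerun it.
        # Assumption: a recent green pipeline is merged, as in the current flow.
        return Action.TRIGGER_NEW_PIPELINE if stale else Action.MERGE
    if status == "failed":
        return Action.ERROR if stale else Action.RETRY_FAILED_JOBS
    if status in IN_PROGRESS:
        # Still running: keep waiting.
        return Action.WAIT
    # Assumption: statuses the proposal doesn't cover (canceled, skipped, ...) are errors.
    return Action.ERROR
```

Keeping the decision separate from the API calls means every branch of the tree can be checked without creating real MRs or pipelines.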
- Start an MR pipeline if one hasn't started automatically (example run of components:update_gitaly where this occurred: https://ops.gitlab.net/gitlab-org/release/tools/-/jobs/18550767).
- Retry failed jobs in the latest MR pipeline.
  - Retry failed jobs up to a limit (3?) so that we don't keep retrying when retries will not result in success.
- Start a new pipeline if the latest pipeline was created more than 8 hours ago.
- Bonus: Leave a comment on the MR when an operation is performed (for example, when failed jobs are retried or a new pipeline is started).
- Notify the component team when limits are reached and release-tools will not retry anymore.
- Remove the feature flag.
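The items above can be combined into one decision function. A minimal sketch, assuming `status` is `None` when no MR pipeline exists and that release-tools tracks a hypothetical `retries_used` counter per MR. `Action`, `Decision` and `decide` are illustrative names only. Checking the 8-hour age before retrying failed jobs, and notifying the team on unexpected statuses, are assumptions; the list doesn't fix that order.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Hypothetical names throughout; they are not release-tools' real API.
MAX_RETRIES = 3  # the "(3?)" limit from the proposal
STALE_AFTER_HOURS = 8  # start a fresh pipeline once the latest one is older than this
IN_PROGRESS = {"created", "waiting_for_resource", "preparing", "pending", "running"}


class Action(Enum):
    START_PIPELINE = "start_pipeline"
    WAIT = "wait"
    MERGE = "merge"
    RETRY_FAILED_JOBS = "retry_failed_jobs"
    NOTIFY_TEAM = "notify_team"


@dataclass
class Decision:
    action: Action
    mr_comment: Optional[str] = None  # text to post on the MR, if any (bonus item)


def decide(status: Optional[str], age_hours: float, retries_used: int) -> Decision:
    """Pick the next step for the latest MR pipeline.

    `status` is a GitLab pipeline status, or None if no MR pipeline started.
    `retries_used` counts earlier "retry failed jobs" attempts on this MR.
    """
    if status is None:
        return Decision(Action.START_PIPELINE,
                        "Started an MR pipeline because none was created automatically.")
    if age_hours > STALE_AFTER_HOURS:
        # Assumption: a stale pipeline is replaced before any retry is attempted.
        return Decision(Action.START_PIPELINE,
                        f"Started a new pipeline: the latest one is older than "
                        f"{STALE_AFTER_HOURS} hours.")
    if status in IN_PROGRESS:
        return Decision(Action.WAIT)
    if status == "success":
        return Decision(Action.MERGE)
    if status == "failed" and retries_used < MAX_RETRIES:
        return Decision(Action.RETRY_FAILED_JOBS,
                        f"Retried failed jobs (attempt {retries_used + 1} of {MAX_RETRIES}).")
    # Retry limit reached (or an unexpected status such as "canceled"):
    # stop retrying and hand over to the component team.
    return Decision(Action.NOTIFY_TEAM)
```

For example, `decide("failed", 1, 0)` asks for a retry with the comment "Retried failed jobs (attempt 1 of 3).", while `decide("failed", 1, 3)` stops and notifies the team.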
Edited by Reuben Pereira