Retry component update (Gitaly, KAS) MR pipelines
What does this MR do and why?
Content
Add pipeline retry functionality for component update merge requests (Gitaly and KAS). This change is behind a feature flag called retry_pipeline_service. I'll create the feature flag after the MR has been merged and I'm ready to test.
This MR automates what release managers (RMs) have been doing manually. It checks the latest pipeline in a Gitaly or KAS MR and either retries the failed jobs or retries the whole pipeline.
- Implements RetryMrPipelineService to automatically retry failed jobs or create new MR pipelines. This class:
  - Returns without doing anything if retry_pipeline_service feature flag is not turned on
  - Returns without doing anything if MR is not open
  - Returns without doing anything if latest MR pipeline has no failed jobs
  - Retries pipeline if latest pipeline is older than 8 hours (GitLab Rails MRs cannot be merged with a pipeline older than 8 hours)
  - Retries pipeline if jobs have already been retried twice (so they have run 3 times)
  - Retries failed jobs if none of the above applied
- Create PipelineFailureService to detect and analyze pipeline failures. This class provides RetryMrPipelineService with information like failed jobs, number of retries, etc.
- Refactor Components::Updater to use the new retry service
  - When a component update MR already exists, the Updater class will now call RetryMrPipelineService to retry the pipeline if required.
  - If RetryMrPipelineService retries the pipeline/jobs, Updater will exit since there is no point in attempting to set MWPS while CI is still running.
  - If RetryMrPipelineService did nothing, Updater will continue execution and attempt to set MWPS on the MR.
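For reviewers, the decision order in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: PipelineState, retry_action, updater_next_step, and the two constants are hypothetical names, and the sketch assumes the checks are evaluated in the order they are listed.

```ruby
# Hypothetical sketch of the decision order described above.
# The struct, methods, and constants are illustrative names only.
PipelineState = Struct.new(
  :feature_flag_enabled, # is retry_pipeline_service turned on?
  :mr_open,              # is the merge request still open?
  :failed_job_count,     # number of failed jobs in the latest MR pipeline
  :pipeline_age_hours,   # age of the latest MR pipeline, in hours
  :max_job_retries,      # highest retry count among the failed jobs
  keyword_init: true
)

MAX_PIPELINE_AGE_HOURS = 8 # MRs cannot be merged with an older pipeline
MAX_JOB_RETRIES = 2        # two retries means the job has run three times

# Returns :none, :retry_pipeline, or :retry_failed_jobs.
def retry_action(state)
  return :none unless state.feature_flag_enabled
  return :none unless state.mr_open
  return :none if state.failed_job_count.zero?
  return :retry_pipeline if state.pipeline_age_hours > MAX_PIPELINE_AGE_HOURS
  return :retry_pipeline if state.max_job_retries >= MAX_JOB_RETRIES

  :retry_failed_jobs
end

# Updater side: exit if anything was retried, otherwise go on to set MWPS.
def updater_next_step(action)
  action == :none ? :set_mwps : :exit
end
```

Keeping the decision as a pure function over a small state object makes each branch easy to unit test without stubbing the GitLab API.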
On a related note, Siddharth has opened an MR to notify component teams when an MR is in an unmergeable state (for example due to merge conflicts): !4110 (merged).
Demo of this MR's functionality: https://youtu.be/HTZOZOIk-so?t=37
gitlab-com/gl-infra/delivery#20867
Tests
Test run (TEST=true) on a KAS MR
➜ release-tools git:(rp/retry-failed-pipelines) ✗ TEST=true op run --env-file=".env.read" -- bundle exec rake 'components:update_kas'
2025-05-05 16:59:56.936245 I [dry-run] ReleaseTools::Services::UpdateComponentService -- Finding the current component version -- {component: "gitlab-agent", branch: "master"}
2025-05-05 16:59:57.532999 D [dry-run] ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 16:59:57 +0530] 200 "GET https://gitlab.com/api/v4/projects/gitlab-org%2Fgitlab/repository/files/GITLAB_KAS_VERSION/raw" 41
2025-05-05 16:59:59.285544 D [dry-run] ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 16:59:59 +0530] 200 "GET https://gitlab.com/api/v4/projects/gitlab-org%2Fsecurity%2Fcluster-integration%2Fgitlab-agent/repository/commits" -
2025-05-05 17:00:00.584509 D [dry-run] ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:00:00 +0530] 200 "GET https://gitlab.com/api/v4/projects/gitlab-org%2Fsecurity%2Fcluster-integration%2Fgitlab-agent/repository/commits/6276aded2a69e5b41771a8955983fcbca1a323d2" -
2025-05-05 17:00:02.241944 D [dry-run] ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:00:02 +0530] 200 "GET https://gitlab.com/api/v4/projects/gitlab-org%2Fgitlab/merge_requests" -
2025-05-05 17:00:02.242269 I [dry-run] ReleaseTools::Tasks::Components::UpdateKas -- Found existing merge request -- {merge_request: "https://gitlab.com/gitlab-org/gitlab/-/merge_requests/190166", mwps: false, merge_status: "ci_must_pass"}
2025-05-05 17:00:03.147258 D [dry-run] ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:00:03 +0530] 200 "GET https://gitlab.com/api/v4/projects/278964/pipelines" -
2025-05-05 17:00:03.759921 D [dry-run] ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:00:03 +0530] 200 "GET https://gitlab.com/api/v4/projects/278964/pipelines/1800398399/jobs" -
2025-05-05 17:00:04.618101 D [dry-run] ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:00:04 +0530] 200 "GET https://gitlab.com/api/v4/projects/278964/pipelines/1800398399/bridges" 2
2025-05-05 17:00:06.687129 D [dry-run] ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:00:06 +0530] 200 "GET https://gitlab.com/api/v4/projects/278964/pipelines/1800398399/jobs" -
2025-05-05 17:00:06.688885 I [dry-run] ReleaseTools::Services::RetryMrPipelineService -- Retrying failed job -- {failed_job: "https://gitlab.com/gitlab-org/gitlab/-/jobs/9927711094"}
2025-05-05 17:00:06.688906 I [dry-run] ReleaseTools::Tasks::Components::UpdateKas -- The MR pipeline or job(s) were retried
Actual run (without TEST=true) on the same KAS MR
➜ release-tools git:(rp/retry-failed-pipelines) ✗ op run --env-file=".env.write" -- bundle exec rake 'components:update_kas'
2025-05-05 17:01:54.110084 I ReleaseTools::Services::UpdateComponentService -- Finding the current component version -- {component: "gitlab-agent", branch: "master"}
2025-05-05 17:01:54.762484 D ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:01:54 +0530] 200 "GET https://gitlab.com/api/v4/projects/gitlab-org%2Fgitlab/repository/files/GITLAB_KAS_VERSION/raw" 41
2025-05-05 17:01:55.611271 D ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:01:55 +0530] 200 "GET https://gitlab.com/api/v4/projects/gitlab-org%2Fsecurity%2Fcluster-integration%2Fgitlab-agent/repository/commits" -
2025-05-05 17:01:56.195403 D ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:01:56 +0530] 200 "GET https://gitlab.com/api/v4/projects/gitlab-org%2Fsecurity%2Fcluster-integration%2Fgitlab-agent/repository/commits/6276aded2a69e5b41771a8955983fcbca1a323d2" -
2025-05-05 17:01:56.900448 D ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:01:56 +0530] 200 "GET https://gitlab.com/api/v4/projects/gitlab-org%2Fgitlab/merge_requests" -
2025-05-05 17:01:56.900891 I ReleaseTools::Tasks::Components::UpdateKas -- Found existing merge request -- {merge_request: "https://gitlab.com/gitlab-org/gitlab/-/merge_requests/190166", mwps: false, merge_status: "ci_must_pass"}
2025-05-05 17:01:57.340283 D ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:01:57 +0530] 200 "GET https://gitlab.com/api/v4/projects/278964/pipelines" -
2025-05-05 17:01:58.143424 D ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:01:58 +0530] 200 "GET https://gitlab.com/api/v4/projects/278964/pipelines/1800398399/jobs" -
2025-05-05 17:01:58.633581 D ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:01:58 +0530] 200 "GET https://gitlab.com/api/v4/projects/278964/pipelines/1800398399/bridges" 2
2025-05-05 17:01:59.631605 D ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:01:59 +0530] 200 "GET https://gitlab.com/api/v4/projects/278964/pipelines/1800398399/jobs" -
2025-05-05 17:02:01.622762 D ReleaseTools::GitlabClient -- [HTTParty] [2025-05-05 17:02:01 +0530] 201 "POST https://gitlab.com/api/v4/projects/278964/jobs/9927711094/retry" 3115
2025-05-05 17:02:01.623038 I ReleaseTools::Services::RetryMrPipelineService -- Retried job -- {new_job: "https://gitlab.com/gitlab-org/gitlab/-/jobs/9929016910"}
2025-05-05 17:02:01.623053 I ReleaseTools::Tasks::Components::UpdateKas -- The MR pipeline or job(s) were retried
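The difference between the two runs above (no POST and a "Retrying failed job" log line in dry-run, versus a POST to the job retry endpoint and a "Retried job" log line otherwise) can be sketched like this. It is an illustration under assumptions, not the code in this MR: FakeClient, retry_failed_job, and the example.com URLs are hypothetical, and the sketch assumes dry-run mode simply skips the write call.

```ruby
# Hypothetical sketch of dry-run gating; names are illustrative only.
class FakeClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Stands in for POST /projects/:id/jobs/:job_id/retry.
  # Records the call and returns a made-up new job payload.
  def retry_job(project_id, job_id)
    @calls << [project_id, job_id]
    { 'web_url' => "https://gitlab.example.com/jobs/#{job_id + 1}" }
  end
end

# Returns the fields that would be logged for this job.
def retry_failed_job(client, project_id, job, dry_run:)
  # In dry-run mode, report the intent and make no write request.
  return { message: 'Retrying failed job', failed_job: job['web_url'] } if dry_run

  new_job = client.retry_job(project_id, job['id'])
  { message: 'Retried job', new_job: new_job['web_url'] }
end
```

Called with dry_run: true, the client records no calls; with dry_run: false, it records exactly one retry for the failed job.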
Author Check-list
- Has documentation been updated?