Scheduled GitLab Runner installation test workflow
Summary
The incident was triggered by a subset of the helper image tags going missing. This caused CI/CD pipeline failures across GitLab repositories and for anyone trying to pull registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v17.8.0, an image that GitLab Runner 17.8.0 uses to run every job.
We do not currently have a way to detect when a bad release breaks GitLab Runner installation. This remediation action adds a detection mechanism: a scheduled installation test that runs against production sources. A failure of the test workflow will alert the relevant on-call engineer.
Related Incident(s)
- Incident: 2025-01-16: GitLab-runner image v17.8.0 not found (gitlab-com/gl-infra/production#19129 - closed) • Sarah Walker
- Review: Incident Review: GitLab Runner Helper v17.8 Ima... (gitlab-com/gl-infra/production#19131 - closed) • Arran Walker, Romuald Atchadé
Desired Outcome/Acceptance Criteria
- Once every 8 hours, a scheduled workflow runs a fresh GitLab Runner installation for each distribution method (deb, rpm, and image) using the current production sources.
- A failure of the scheduled installation test will alert the relevant on-call engineer.
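The acceptance criteria describe a loop: every 8 hours, install from each distribution method and page on any failure. The scheduling itself would typically be a cron-style pipeline schedule (for example `0 */8 * * *`, which fires at 00:00, 08:00, and 16:00). The Python below is a minimal sketch of the per-run logic only, under stated assumptions. `install` and `page` are hypothetical hooks standing in for the real installation steps and the alerting integration, and none of these names come from the actual workflow.

```python
from dataclasses import dataclass
from typing import Callable, List

# Distribution methods named in the acceptance criteria.
DISTRIBUTION_METHODS = ["deb", "rpm", "image"]


@dataclass
class InstallResult:
    method: str
    ok: bool
    detail: str = ""


def run_installation_tests(
    methods: List[str],
    install: Callable[[str], None],
) -> List[InstallResult]:
    """Run one fresh installation per distribution method.

    `install` is a hypothetical hook that installs GitLab Runner from the
    production sources for the given method and raises on any failure.
    Every method is attempted even if an earlier one fails, so a single run
    reports all broken methods at once.
    """
    results: List[InstallResult] = []
    for method in methods:
        try:
            install(method)
            results.append(InstallResult(method, ok=True))
        except Exception as exc:  # any error counts as a broken installation
            results.append(InstallResult(method, ok=False, detail=str(exc)))
    return results


def alert_if_failed(
    results: List[InstallResult],
    page: Callable[[str], None],
) -> bool:
    """Send one alert summarising every failed method; return True if paged.

    `page` is a hypothetical hook for the on-call alerting integration.
    """
    failures = [r for r in results if not r.ok]
    if failures:
        summary = "; ".join(f"{r.method}: {r.detail}" for r in failures)
        page(f"GitLab Runner installation test failed ({summary})")
    return bool(failures)


if __name__ == "__main__":
    # Simulated run in which only the image installation breaks, similar
    # to a missing helper image tag.
    def simulated_install(method: str) -> None:
        if method == "image":
            raise RuntimeError("helper image tag not found")

    outcome = run_installation_tests(DISTRIBUTION_METHODS, simulated_install)
    alert_if_failed(outcome, print)
```

Collecting all results before alerting, rather than stopping at the first failure, keeps one page per scheduled run while still telling the on-call engineer which distribution methods are affected.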
Associated Services
Service: CI Runners (in GitLab.com / GitLab Infrastructure Team / Production Engineering)
Corrective Action Issue Checklist
- Link the incident(s) this corrective action arose from
- Give context for what problem this corrective action is trying to prevent from recurring
- Assign a severity label (this is the highest severity of the related incidents; defaults to 'severity::4')
- Assign a priority (defaults to 'Production Engineering::P4' but should match the severity of the related incident)
- Assign a service label
- Assign a team label
Edited by Joe Burnett