GitLab Runner v13.10.0-rc1 deploy
Production Change
Change Summary
Deploy GitLab Runner v13.10.0-rc1 to prmX, gsrmX, gdsrmX, and srmX as part of the release process. See the changelog for the full list of changes (78 commits in total).
Change Details
- Services Impacted - Service::CI Runners
- Change Technician - @steveazz
- Change Criticality - C4
- Change Type - change::scheduled
- Change Reviewer - @ggeorgiev_gitlab
- Due Date - 2021-03-10 0530
- Time tracking - 16 hours (including rollback)
- Downtime Component - No downtime
Detailed steps for the change
Pre-Change Steps - steps to be completed before execution of the change
Estimated Time to Complete (mins) - 30min
Change Steps - steps to take to execute the change
Estimated Time to Complete (mins) - 8 hours
- On 2021-03-10 0530 start executing deploy v13.10.0-rc1 from gitlab-org/gitlab-runner#27638 (closed) to upgrade prmX
  - at RC1 release day: to prmX runners
    - make sure it's not inside of the PCL time window.
    - go to your local chef-repo working directory and execute:
          knife ssh -afqdn 'roles:gitlab-runner-prm' -- 'sudo -i /root/runner_upgrade.sh stop_chef'
          knife ssh -afqdn 'roles:gitlab-runner-prm' -- 'sudo -i systemctl is-active chef-client'
          git checkout master && git pull
          git checkout -b update-prm-runners-to-13-10-0-rc1
    - update version:
          $EDITOR roles/gitlab-runner-prm.json
      In the role definition prepare the override_attributes entry. It should be placed at the top of the file:
          "override_attributes": {
            "cookbook-gitlab-runner": {
              "gitlab-runner": {
                "repository": "unstable",
                "version": "13.10.0-rc1"
              }
            }
          },
    - commit the change:
          git add roles/gitlab-runner-prm.json && git commit -m "Update prmX runners to v13.10.0-rc1"
    - push the branch:
          git push -u origin update-prm-runners-to-13-10-0-rc1
    - after pushing the branch, create the chef-repo MR and get it merged
    - check that the production_dry_run job tries to update only the changed role
    - start the manual "apply to prod" job
    - after the job is finished execute:
          knife ssh -C 1 -afqdn 'roles:gitlab-runner-prm' -- 'sudo -i /root/runner_upgrade.sh' &
          time wait
- On 2021-03-15 start executing deploy v13.10.0-rc1 from gitlab-org/gitlab-runner#27638 (closed) to upgrade gdsrmX, gsrmX
  - make sure it's not inside of the PCL time window.
  - go to your local chef-repo working directory and execute:
        knife ssh -afqdn 'roles:gitlab-runner-gsrm OR roles:org-ci-base-runner' -- 'sudo -i /root/runner_upgrade.sh stop_chef'
        knife ssh -afqdn 'roles:gitlab-runner-gsrm OR roles:org-ci-base-runner' -- 'sudo -i systemctl is-active chef-client'
        git checkout master && git pull
        git checkout -b update-runners-to-13-10-0-rc1
  - update version for gsrm/srm:
        $EDITOR roles/gitlab-runner-gsrm.json
    In the role definition prepare the gitlab-runner entry:
        "override_attributes": {
          "cookbook-gitlab-runner": {
            "gitlab-runner": {
              "repository": "unstable",
              "version": "13.10.0-rc1"
            }
          }
        },
  - update version for org-ci:
        $EDITOR roles/org-ci-base-runner.json
    In the role definition prepare the gitlab-runner entry:
        "cookbook-gitlab-runner": {
          "gitlab-runner": {
            "repository": "unstable",
            "version": "13.10.0-rc1"
          }
        }
  - commit the change:
        git add roles/gitlab-runner-gsrm.json roles/org-ci-base-runner.json && git commit -m "Update runners to v13.10.0-rc1"
  - push the branch:
        git push -u origin update-runners-to-13-10-0-rc1
  - after pushing the branch, create the chef-repo MR and get it merged
  - check that the production_dry_run job tries to update only the changed roles
  - start the manual "apply to prod" job
  - after the job is finished execute (we're not touching prmX, they are already updated):
        knife ssh -C1 -afqdn 'roles:gitlab-runner-gsrm' -- 'sudo -i /root/runner_upgrade.sh' &
        knife ssh -C1 -afqdn 'roles:org-ci-base-runner' -- 'sudo -i /root/runner_upgrade.sh' &
        time wait
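Two of the manual checks in the change steps can be made less error-prone with small shell helpers. The sketch below is illustrative only: the function names are hypothetical, it assumes knife ssh prints one line per host in the form "<host> <output>", and it assumes the role file formats the version key exactly as shown in the steps above.

```shell
# Hypothetical helper: reads "<host> <state>" lines on stdin (the assumed
# shape of the knife ssh output for "systemctl is-active chef-client") and
# prints every host where chef-client is still reported as active.
hosts_with_active_chef() {
  awk '$2 == "active" { print $1 }'
}

# Hypothetical helper: succeeds only if the role file pins the expected
# runner version. Uses a fixed-string match, so the dots in the version
# are not treated as regex wildcards.
role_pins_version() {
  file=$1
  version=$2
  grep -qF "\"version\": \"$version\"" "$file"
}

# Example usage (commented out so the sketch can be sourced safely):
# knife ssh -afqdn 'roles:gitlab-runner-prm' -- 'sudo -i systemctl is-active chef-client' \
#   | hosts_with_active_chef
# role_pins_version roles/gitlab-runner-prm.json 13.10.0-rc1 || echo "version not updated"
```

An empty result from the first helper suggests chef-client was stopped everywhere; any printed host would need another look before the role change is merged.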
Rollback
Rollback steps - steps to be taken in the event of a need to rollback this change
Estimated Time to Complete (mins) - 8 hours
- make sure it's not inside of the PCL time window.
- go to your local chef-repo working directory and execute:
      knife ssh -afqdn 'roles:gitlab-runner-gsrm OR roles:org-ci-base-runner OR roles:gitlab-runner-prm' -- 'sudo -i /root/runner_upgrade.sh stop_chef'
      knife ssh -afqdn 'roles:gitlab-runner-gsrm OR roles:org-ci-base-runner OR roles:gitlab-runner-prm' -- 'sudo -i systemctl is-active chef-client'
      git checkout master && git pull
      git checkout -b rollback-update-runners-to-13-10-0-rc1
- update version for gsrm:
      $EDITOR roles/gitlab-runner-gsrm.json
  In the role definition prepare the gitlab-runner entry:
      "override_attributes": {
        "cookbook-gitlab-runner": {
          "gitlab-runner": {
            "repository": "unstable",
            "version": "13.9.0-rc1"
          }
        }
      }
- update version for prm:
      $EDITOR roles/gitlab-runner-prm.json
  In the role definition prepare the gitlab-runner entry:
      "override_attributes": {
        "cookbook-gitlab-runner": {
          "gitlab-runner": {
            "repository": "unstable",
            "version": "13.9.0-rc1"
          }
        }
      }
- update version for org-ci:
      $EDITOR roles/org-ci-base-runner.json
  In the role definition prepare the gitlab-runner entry:
      "cookbook-gitlab-runner": {
        "gitlab-runner": {
          "repository": "unstable",
          "version": "13.9.0-rc1"
        }
      }
- commit the change:
      git add roles/gitlab-runner-prm.json roles/gitlab-runner-gsrm.json roles/org-ci-base-runner.json && git commit -m "Roll back runners to v13.9.0-rc1"
- push the branch:
      git push -u origin rollback-update-runners-to-13-10-0-rc1
- after pushing the branch, create the chef-repo MR and get it merged
- check that the production_dry_run job tries to update only the changed roles
- start the manual "apply to prod" job
- after the job is finished execute:
      knife ssh -C1 -afqdn 'roles:gitlab-runner-prm' -- 'sudo -i /root/runner_upgrade.sh' &
      knife ssh -C1 -afqdn 'roles:gitlab-runner-gsrm' -- 'sudo -i /root/runner_upgrade.sh' &
      knife ssh -C1 -afqdn 'roles:org-ci-base-runner' -- 'sudo -i /root/runner_upgrade.sh' &
      time wait
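The final step in the change and rollback procedures backgrounds several knife ssh runs and then calls wait with no operands. A bare wait returns zero regardless of how the background jobs exit, so a failed upgrade on one role could go unnoticed. The sketch below shows one way to surface such failures by waiting on each PID individually. The function name is hypothetical, and it assumes knife ssh exits non-zero when a remote command fails, which should be confirmed for the installed Chef version.

```shell
# Hypothetical helper: run each argument as a background command, then
# wait on every PID separately so that a single failing run makes the
# whole call return non-zero.
run_all_and_check() {
  pids=""
  for cmd in "$@"; do
    sh -c "$cmd" &
    pids="$pids $!"
  done
  status=0
  for pid in $pids; do
    # "wait PID" returns that job's exit status, unlike a bare "wait".
    wait "$pid" || status=1
  done
  return "$status"
}

# Example usage with the commands from the steps above (commented out so
# the sketch can be sourced without touching production):
# run_all_and_check \
#   "knife ssh -C1 -afqdn 'roles:gitlab-runner-prm' -- 'sudo -i /root/runner_upgrade.sh'" \
#   "knife ssh -C1 -afqdn 'roles:gitlab-runner-gsrm' -- 'sudo -i /root/runner_upgrade.sh'" \
#   "knife ssh -C1 -afqdn 'roles:org-ci-base-runner' -- 'sudo -i /root/runner_upgrade.sh'" \
#   || echo "at least one upgrade run failed"
```

The commands still run in parallel, as in the original step; only the way their results are collected changes.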
Monitoring
Key metrics to observe
- Metric: Apdex
- Location: https://dashboards.gitlab.net/d/ci-runners-main/ci-runners-overview?viewPanel=79474957&orgId=1&var-PROMETHEUS_DS=Global&var-environment=gprd&var-stage=main&var-sigma=2
- What changes to this metric should prompt a rollback: A drop in the apdex score
- Metric: Runner system failures
- Location: https://dashboards.gitlab.net/d/000000159/ci?viewPanel=82&orgId=1&var-shard=All&var-runner_type=All&var-runner_managers=All&var-gitlab_env=gprd&var-gl_monitor_fqdn=All&var-has_minutes=yes&var-runner_job_failure_reason=All&var-jobs_running_for_project=0&var-runner_request_endpoint_status=All
- What changes to this metric should prompt a rollback: Increase in runner_system_failure
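The rollback triggers above are qualitative ("a drop", "an increase"). As a rough illustration of how they could be turned into an explicit comparison against a pre-change baseline, the sketch below uses two hypothetical helpers. The tolerance values and the example numbers are assumptions, not thresholds defined by this change, and the values themselves would still be read from the dashboards by hand.

```shell
# Hypothetical helper: succeeds (rollback advised) when the current apdex
# is below the baseline by more than the tolerance (default 0.01, an
# assumed value).
apdex_dropped() {
  baseline=$1
  current=$2
  tolerance=${3:-0.01}
  awk -v b="$baseline" -v c="$current" -v t="$tolerance" \
    'BEGIN { exit !((b - c) > t) }'
}

# Hypothetical helper: succeeds (rollback advised) when the current
# runner_system_failure rate exceeds the baseline by more than the given
# factor (default 1.5, an assumed value).
failures_increased() {
  baseline=$1
  current=$2
  factor=${3:-1.5}
  awk -v b="$baseline" -v c="$current" -v f="$factor" \
    'BEGIN { exit !(c > b * f) }'
}

# Example usage with made-up numbers:
# apdex_dropped 0.995 0.950 && echo "apdex dropped: consider rollback"
# failures_increased 2 10 && echo "system failures up: consider rollback"
```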
Summary of infrastructure changes
[ ] Does this change introduce new compute instances?
[ ] Does this change re-size any existing compute instances?
[ ] Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc?
Summary of the above
Changes checklist
- This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. change::unscheduled, change::scheduled) based on the Change Management Criticalities.
- This issue has the change technician as the assignee.
- Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
- Necessary approvals have been completed based on the Change Management Workflow.
- Change has been tested in staging and results noted in a comment on this issue.
- A dry-run has been conducted and results noted in a comment on this issue.
- SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
- There are currently no active incidents.