GitLab Runner v13.10.0-rc1 deploy

Production Change

Change Summary

Deploy GitLab Runner v13.10.0-rc1 to prmX, gsrmX, gdsrmX, and srmX as part of the release process. The changelog can be found here; the release contains a total of 78 commits.

Change Details

  1. Services Impacted - ServiceCI Runners
  2. Change Technician - @steveazz
  3. Change Criticality - C4
  4. Change Type - changescheduled
  5. Change Reviewer - @ggeorgiev_gitlab
  6. Due Date - 2021-03-10 0530
  7. Time tracking - 16 hours (including rollback)
  8. Downtime Component - No downtime

Detailed steps for the change

Pre-Change Steps - steps to be completed before execution of the change

Estimated Time to Complete (mins) - 30min

Change Steps - steps to take to execute the change

Estimated Time to Complete (mins) - 8 hours

  • On 2021-03-10 0530 start executing deploy v13.10.0-rc1 from gitlab-org/gitlab-runner#27638 (closed) to upgrade prmX
    • on RC1 release day: deploy to prmX runners

      • make sure it's not inside of the PCL time window.

      • go to your local chef-repo working directory and execute:

        knife ssh -afqdn 'roles:gitlab-runner-prm' -- 'sudo -i /root/runner_upgrade.sh stop_chef'
        knife ssh -afqdn 'roles:gitlab-runner-prm' -- 'sudo -i systemctl is-active chef-client'
        git checkout master && git pull
        git checkout -b update-prm-runners-to-13-10-0-rc1
      • update version

        $EDITOR roles/gitlab-runner-prm.json

        In the role definition prepare the override_attributes entry. It should be placed at the top of the file:

        "override_attributes": {
          "cookbook-gitlab-runner": {
            "gitlab-runner": {
                "repository": "unstable",
                "version": "13.10.0-rc1"
            }
          }
        },
      • git add roles/gitlab-runner-prm.json && git commit -m "Update prmX runners to v13.10.0-rc1"

      • git push -u origin update-prm-runners-to-13-10-0-rc1

      • after pushing the branch, create the chef-repo MR and get it merged

      • check that the production_dry_run job tries to update only the changed role

      • start the manual apply to prod job

      • after the job is finished execute:

        knife ssh -C 1 -afqdn 'roles:gitlab-runner-prm' -- 'sudo -i /root/runner_upgrade.sh' &
        time wait
  • On 2021-03-15 start executing deploy v13.10.0-rc1 from gitlab-org/gitlab-runner#27638 (closed) to upgrade gdsrmX, gsrmX
    • make sure it's not inside of the PCL time window.

    • go to your local chef-repo working directory and execute:

      knife ssh -afqdn 'roles:gitlab-runner-gsrm OR roles:org-ci-base-runner' -- 'sudo -i /root/runner_upgrade.sh stop_chef'
      knife ssh -afqdn 'roles:gitlab-runner-gsrm OR roles:org-ci-base-runner' -- 'sudo -i systemctl is-active chef-client'
      git checkout master && git pull
      git checkout -b update-runners-to-13-10-0-rc1
    • update version for gsrm/srm

      $EDITOR roles/gitlab-runner-gsrm.json

      In the role definition prepare the gitlab-runner entry:

          "override_attributes": {
            "cookbook-gitlab-runner": {
              "gitlab-runner": {
                  "repository": "unstable",
                  "version": "13.10.0-rc1"
              }
            }
          },
    • update version for org-ci

      $EDITOR roles/org-ci-base-runner.json

      In the role definition prepare the gitlab-runner entry:

      "cookbook-gitlab-runner": {
        "gitlab-runner": {
          "repository": "unstable",
          "version": "13.10.0-rc1"
        }
      }
    • git add roles/gitlab-runner-gsrm.json roles/org-ci-base-runner.json && git commit -m "Update runners to v13.10.0-rc1"

    • git push -u origin update-runners-to-13-10-0-rc1

    • after pushing the branch, create the chef-repo MR and get it merged

    • check that the production_dry_run job tries to update only the changed roles

    • start the manual apply to prod job

    • after the job is finished execute (we're not touching prmX - they are already updated):

      knife ssh -C1 -afqdn 'roles:gitlab-runner-gsrm' -- 'sudo -i /root/runner_upgrade.sh' &
      knife ssh -C1 -afqdn 'roles:org-ci-base-runner' -- 'sudo -i /root/runner_upgrade.sh' &
      time wait
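
The `systemctl is-active chef-client` check in the steps above prints one state per node, which is easy to misread across many hosts. A minimal sketch of a filter for that output, assuming each line printed by knife ssh has the form `<host> <state>`; the function name `all_chef_stopped` is hypothetical and is not part of `runner_upgrade.sh`:

```shell
# Hypothetical helper: reads "<host> <state>" lines on stdin and fails
# if any node still reports chef-client as active or activating.
# Assumption: knife ssh prefixes each output line with the node name.
all_chef_stopped() {
  awk '
    BEGIN { bad = 0 }
    # The last field is the state printed by "systemctl is-active".
    $NF == "active" || $NF == "activating" {
      print "chef-client still running on " $1
      bad = 1
    }
    END { exit bad }
  '
}
```

It could be used by piping the existing check into it, for example `knife ssh -afqdn 'roles:gitlab-runner-gsrm OR roles:org-ci-base-runner' -- 'sudo -i systemctl is-active chef-client' | all_chef_stopped`, and continuing only if it exits with status 0.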

Rollback

Rollback steps - steps to be taken in the event of a need to rollback this change

Estimated Time to Complete (mins) - 8 hours

  • make sure it's not inside of the PCL time window.

  • go to your local chef-repo working directory and execute:

    knife ssh -afqdn 'roles:gitlab-runner-gsrm OR roles:org-ci-base-runner OR roles:gitlab-runner-prm' -- 'sudo -i /root/runner_upgrade.sh stop_chef'
    knife ssh -afqdn 'roles:gitlab-runner-gsrm OR roles:org-ci-base-runner OR roles:gitlab-runner-prm' -- 'sudo -i systemctl is-active chef-client'
    git checkout master && git pull
    git checkout -b rollback-update-runners-to-13-10-0-rc1
  • update version for gsrm

    $EDITOR roles/gitlab-runner-gsrm.json

    In the role definition prepare the gitlab-runner entry:

    "override_attributes": {
      "cookbook-gitlab-runner": {
        "gitlab-runner": {
          "repository": "unstable",
          "version": "13.9.0-rc1"
        }
      }
    }
  • update version for prm

    $EDITOR roles/gitlab-runner-prm.json

    In the role definition prepare the gitlab-runner entry:

    "override_attributes": {
      "cookbook-gitlab-runner": {
        "gitlab-runner": {
          "repository": "unstable",
          "version": "13.9.0-rc1"
        }
      }
    }
  • update version for org-ci

    $EDITOR roles/org-ci-base-runner.json

    In the role definition prepare the gitlab-runner entry:

    "cookbook-gitlab-runner": {
      "gitlab-runner": {
        "repository": "unstable",
        "version": "13.9.0-rc1"
      }
    }
  • git add roles/gitlab-runner-prm.json roles/gitlab-runner-gsrm.json roles/org-ci-base-runner.json && git commit -m "Roll back runners to v13.9.0-rc1"

  • git push -u origin rollback-update-runners-to-13-10-0-rc1

  • after pushing the branch, create the chef-repo MR and get it merged

  • check that the production_dry_run job tries to update only the changed roles

  • start the manual apply to prod job

  • after the job is finished execute:

    knife ssh -C1 -afqdn 'roles:gitlab-runner-prm' -- 'sudo -i /root/runner_upgrade.sh' &
    knife ssh -C1 -afqdn 'roles:gitlab-runner-gsrm' -- 'sudo -i /root/runner_upgrade.sh' &
    knife ssh -C1 -afqdn 'roles:org-ci-base-runner' -- 'sudo -i /root/runner_upgrade.sh' &
    time wait
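
Before opening the rollback MR, it can help to confirm locally that the branch touches only the intended role files, as a complement to the production_dry_run check. A minimal sketch, assuming the changed paths are supplied one per line on stdin (for example from `git diff --name-only master`); `only_expected_files` is a hypothetical helper name:

```shell
# Hypothetical helper: arguments are the allowed paths; stdin is the
# list of changed paths, one per line. Returns non-zero if any changed
# path is not in the allowed list.
only_expected_files() {
  status=0
  while IFS= read -r path; do
    # Skip blank lines.
    if [ -z "$path" ]; then
      continue
    fi
    ok=0
    for allowed in "$@"; do
      if [ "$path" = "$allowed" ]; then
        ok=1
      fi
    done
    if [ "$ok" -eq 0 ]; then
      echo "unexpected change: $path" >&2
      status=1
    fi
  done
  return "$status"
}
```

For the rollback branch this might be run as `git diff --name-only master | only_expected_files roles/gitlab-runner-prm.json roles/gitlab-runner-gsrm.json roles/org-ci-base-runner.json`.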

Monitoring

Key metrics to observe

Summary of infrastructure changes

  • [ ] Does this change introduce new compute instances?
  • [ ] Does this change re-size any existing compute instances?
  • [ ] Does this change introduce any additional usage of tooling like Elastic Search, CDNs, Cloudflare, etc?

Summary of the above

Changes checklist

  • This issue has a criticality label (e.g. C1, C2, C3, C4) and a change-type label (e.g. changeunscheduled, changescheduled) based on the Change Management Criticalities.
  • This issue has the change technician as the assignee.
  • Pre-Change, Change, Post-Change, and Rollback steps have been filled out and reviewed.
  • Necessary approvals have been completed based on the Change Management Workflow.
  • Change has been tested in staging and results noted in a comment on this issue.
  • A dry-run has been conducted and results noted in a comment on this issue.
  • SRE on-call has been informed prior to change being rolled out. (In #production channel, mention @sre-oncall and this issue and await their acknowledgement.)
  • There are currently no active incidents.