# WIP: Upgrade To Helm 3
**Production Change - Criticality 4**

| Change Objective | Describe the objective of the change |
|---|---|
| Change Type | ConfigurationChange \| Operation |
| Services Impacted | kubernetes |
| Change Team Members | @skarbek |
| Change Criticality | C4 |
| Change Reviewer or tested in staging | @jarv |
| Dry-run output | Changes are completed by scripts which have dry-run capabilities by default, plus validation steps during finalization |
| Due Date | Date and time in UTC timezone for the execution of the change, if possible add the local timezone of the engineer executing the change |
| Time tracking | 1hr |
## Summary
Upgrade our use of Helm version 2 to Helm version 3 in the tooling used for our Kubernetes components. A breakdown and discussion of the procedure below can be found at delivery#670 (comment 321392386).
## Detailed steps for the change
- [ ] Backup helm version 2 release objects
  - Copy the backup script https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/common/snippets/15 onto the console server:
    - Staging: `console-01-sv-gstg.c.gitlab-staging-1.internal`
    - Production: `console-01-sv-gprd.c.gitlab-production.internal`
  - [ ] Backup Staging: `DRY_RUN=false ./backup-k8s.sh`
  - [ ] Backup Production: `DRY_RUN=false ./backup-k8s.sh`
  - [ ] Validate backup data existence in Staging
  - [ ] Validate backup data existence in Production
  - Data will be in the form of Kubernetes objects in YAML format, located in a directory called `backup` relative to where the script was executed; see the sketch below.
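  The real script lives in the snippet linked above; as a hedged sketch of what it plausibly does: Helm 2's Tiller stores release state as ConfigMaps in `kube-system` labeled `OWNER=TILLER`, so a backup amounts to dumping those objects to YAML:

  ```shell
  #!/usr/bin/env bash
  # Hedged sketch of backup-k8s.sh -- the canonical script is in the ops snippet.
  # Helm 2 (Tiller) keeps release state in kube-system ConfigMaps labeled OWNER=TILLER.
  set -euo pipefail

  DRY_RUN="${DRY_RUN:-true}"   # dry-run by default, matching the described behavior
  mkdir -p backup

  kubectl get configmaps -n kube-system -l OWNER=TILLER \
    -o custom-columns=NAME:.metadata.name --no-headers |
  while read -r name; do
    if [ "${DRY_RUN}" = "true" ]; then
      echo "would back up kube-system/${name} to backup/${name}.yaml"
    else
      kubectl get configmap -n kube-system "${name}" -o yaml > "backup/${name}.yaml"
    fi
  done
  ```

  To validate, confirm each release revision produced a non-empty object dump:

  ```shell
  ls -l backup/                    # expect one file per Tiller release revision
  grep -L 'kind:' backup/*.yaml    # any file listed here is not a valid object dump
  ```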
- [ ] Perform the helm upgrade
  - Trigger a pipeline in project: https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/common/pipelines
  - Use variables:
    - `UPGRADE_HELM_3: true`
    - `ENVIRONMENT: <gstg|pre|gprd>`
    - `DRY_RUN: false`
  - The scripts utilized in the CI job will perform the validation for us
  - [ ] Upgrade complete in Staging
  - [ ] Upgrade complete in Production
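  The `UPGRADE_HELM_3` job is plausibly a wrapper around the official `helm-2to3` plugin (an assumption; the real commands live in the common repo's scripts). Conceptually, per release:

  ```shell
  # Hedged sketch: convert one Helm 2 release to Helm 3 with the 2to3 plugin.
  # RELEASE_NAME is illustrative; the CI job would iterate over the real releases.
  helm3 plugin install https://github.com/helm/helm-2to3
  helm3 2to3 convert --dry-run "${RELEASE_NAME}"   # what DRY_RUN=true would show
  helm3 2to3 convert "${RELEASE_NAME}"             # the actual conversion
  ```

  By default `2to3 convert` copies release state from Tiller's ConfigMaps into Helm 3 Secrets and leaves the v2 data in place; that is what keeps the rollback path below open until the cleanup step runs.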
- [ ] Perform a tooling upgrade
  - This is our first break point: as soon as we perform this, there will be a discrepancy between the latest version 2 release and the version 3 release. We can still reverse course if necessary, though additional steps need to be taken, as noted in the rollback steps below.
  - Merge the following MRs, and ensure pipelines run through to Production without issue:
    - [ ] gitlab-helmfiles - TODO
    - [ ] logging - https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/logging/-/merge_requests/18
    - [ ] plantuml - gitlab-com/gl-infra/k8s-workloads/plantuml!30
    - [ ] monitoring - https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/monitoring/-/merge_requests/73
    - [ ] gitlab-com - TODO
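  After each MR merges and its pipeline deploys, a hedged way to spot-check that a release is now owned by Helm 3 rather than Tiller (the `monitoring` namespace is illustrative):

  ```shell
  # Helm 3 stores releases as Secrets labeled owner=helm in the release namespace...
  kubectl get secrets -n monitoring -l owner=helm
  # ...while any remaining Helm 2 releases are still Tiller ConfigMaps in kube-system:
  kubectl get configmaps -n kube-system -l OWNER=TILLER
  ```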
- [ ] Clean up
  - This is our final break point. Everything above serves as our validation; this step will remove the helm version 2 releases.
  - Trigger a pipeline in project: https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/common/pipelines
  - Use variables:
    - `CLEANUP_HELM_2: true`
    - `ENVIRONMENT: <gstg|pre|gprd>`
    - `DRY_RUN: false`
  - The scripts utilized in the CI job will perform the validation for us
  - [ ] Cleanup complete in Staging
  - [ ] Cleanup complete in Production
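  The `CLEANUP_HELM_2` job plausibly wraps the plugin's cleanup command (again an assumption; check the common repo's scripts for the real implementation):

  ```shell
  # Hedged sketch: remove Helm 2 release data and Tiller itself.
  helm3 2to3 cleanup --dry-run   # list what would be removed
  helm3 2to3 cleanup             # destructive: deletes the v2 ConfigMaps
  ```

  Once this has run, the rollback path below depends entirely on the `backup/` directory produced in the first step.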
- [ ] Remove Backups
  - If we've cleaned up, there's no need to keep these backups; simply delete the backup directory local to the backup script, as shown below.
  - [ ] Backups removed on Staging
  - [ ] Backups removed on Production
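  The deletion itself, on each console node:

  ```shell
  # Run from the directory where backup-k8s.sh was executed:
  rm -rf ./backup
  ```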
## Rollback steps
- If we've not crossed any break points, simply delete any helm 3 configurations. Helm 3 stores release state in Secrets named `sh.helm.release.v1.*`; note that with `--all-namespaces` the secret name is in the second column, and deletion must target each secret's namespace:

  ```shell
  kubectl get secrets --all-namespaces |
    awk '$2 ~ /^sh\.helm\./ {print $1, $2}' |
    while read -r ns name; do kubectl delete secret -n "${ns}" "${name}"; done
  ```

- And then you are done; do not proceed any further.
- If we've gone beyond our first break point noted above, then we must perform a `helm rollback` using version 3 of helm. This will align the version-3-converted release with our version 2 backups. A sketch of the full sequence follows below.
  - Install helm version 3.1.2 onto the console node, and then run `helm rollback <RELEASE_NAME>`
  - Restore the v2 kubernetes objects as necessary from our backups: `kubectl create -f <FILENAME>`
  - Revert any changes performed on each project's tooling by reverting the above MRs
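  A hedged end-to-end sketch of that rollback on the console node (the release name is illustrative; take real names from `helm ls`):

  ```shell
  # Install helm 3.1.2 from the official release tarball
  curl -fsSL https://get.helm.sh/helm-v3.1.2-linux-amd64.tar.gz | tar -xz
  export PATH="${PWD}/linux-amd64:${PATH}"

  helm ls --all-namespaces       # find converted releases and their revisions
  helm rollback my-release       # no revision given: roll back to the previous one
  kubectl create -f backup/<FILENAME>.yaml   # restore v2 objects as needed
  ```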
## Changes checklist

- [ ] Detailed steps and rollback steps have been filled prior to commencing work
- [ ] SRE on-call has been informed prior to change being rolled out
- [ ] There are currently no open issues labeled as ~ServiceMonitoring with severities of ~S1 or ~S2