Update the K8s documentation to capture lessons learnt from the 1.16 upgrade
Update https://ops.gitlab.net/gitlab-com/runbooks/-/blob/master/docs/uncategorized/k8s-cluster-upgrade.md
so that it covers the following:
- Scour the entire Kubernetes changelog before upgrading, not just the "Urgent Upgrade Notes" section (https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.16.md#urgent-upgrade-notes-1), to make sure the upgrade won't break anything (see https://gitlab.com/gitlab-com/gl-infra/delivery/-/issues/1116#note_396672512 for context).
- Make sure we check every Kubernetes dashboard to confirm that nothing is broken during gstg upgrades.
- Document a recommended approach for upgrading quickly as well as safely (full context in https://gitlab.com/gitlab-com/gl-infra/delivery/-/issues/1116#note_396672512). We should either upgrade the masters in one step and then upgrade all node pools at once (instead of one at a time), which drastically reduces the upgrade time, and/or replace in-place upgrades with parallel creation of entire new node pools followed by a bulk migration over, which is likely the most time-efficient method of all.
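
As a starting point for the runbook, the "masters first, then all node pools at once" ordering could be sketched roughly as below. This is only a sketch under assumptions: it assumes the cluster is on GKE and driven through `gcloud container clusters upgrade` (this issue only mentions masters and node pools), that the zone or region comes from the local gcloud configuration, and that concurrent node-pool upgrades are actually accepted for the cluster, which needs to be verified first. The cluster name, version, and pool names in the usage example are hypothetical placeholders.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_plan(cluster, version, node_pools):
    """Return upgrade phases. Phases run in order; commands in a phase run together."""
    # Assumption: GKE managed via gcloud, location taken from gcloud config.
    base = ["gcloud", "container", "clusters", "upgrade", cluster,
            "--cluster-version", version, "--quiet"]
    master_phase = [base + ["--master"]]
    pool_phase = [base + ["--node-pool", pool] for pool in node_pools]
    return [master_phase, pool_phase]


def run_plan(plan, dry_run=True):
    """Print the plan (default) or execute it, one phase at a time."""
    for phase in plan:
        if dry_run:
            for cmd in phase:
                print(shlex.join(cmd))
            continue
        if not phase:
            continue
        # Start every command of the phase at once and wait for all of them.
        with ThreadPoolExecutor(max_workers=len(phase)) as executor:
            # list() forces evaluation so a failed command raises here.
            list(executor.map(lambda cmd: subprocess.run(cmd, check=True), phase))


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    plan = build_plan("example-cluster", "1.16.x-gke.y", ["pool-a", "pool-b"])
    run_plan(plan, dry_run=True)
```

With `dry_run=True` the script only prints the commands it would run, so the ordering can be reviewed before anything touches a cluster. The alternative of creating new node pools in parallel and migrating over is not covered by this sketch.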