Handle GKE Upgrade to 1.19
As part of #1137 (closed) we have GKE auto-upgrades enabled on all of our non-production and some of our production clusters.
As discovered today, our non-production and production clusters have auto-upgraded to Kubernetes 1.19.
Update 2021-05-21: only the gprd `gprd-us-east1-b` cluster is pending upgrade to 1.19 now.
```
$ gcloud --project gitlab-staging-1 container clusters list
NAME             LOCATION    MASTER_VERSION   MASTER_IP       MACHINE_TYPE     NODE_VERSION       NUM_NODES  STATUS
gstg-gitlab-gke  us-east1    1.19.9-gke.1400  34.73.144.43    n1-standard-4    1.18.17-gke.100 *  26         RUNNING
gstg-us-east1-b  us-east1-b  1.19.9-gke.1400  34.74.13.203    custom-16-20480  1.18.17-gke.100 *  9          RUNNING
gstg-us-east1-c  us-east1-c  1.19.9-gke.1400  35.237.127.243  custom-16-20480  1.18.17-gke.100 *  9          RUNNING
gstg-us-east1-d  us-east1-d  1.19.9-gke.1400  35.229.107.91   n1-standard-4    1.18.17-gke.100 *  9          RUNNING
```
```
$ gcloud --project gitlab-pre container clusters list
NAME            LOCATION  MASTER_VERSION   MASTER_IP       MACHINE_TYPE   NODE_VERSION       NUM_NODES  STATUS
pre-gitlab-gke  us-east1  1.19.9-gke.1400  104.196.63.202  n1-standard-4  1.18.17-gke.100 *  13         RECONCILING
```
```
$ gcloud --project gitlab-production container clusters list
NAME             LOCATION    MASTER_VERSION   MASTER_IP      MACHINE_TYPE     NODE_VERSION        NUM_NODES  STATUS
gprd-gitlab-gke  us-east1    1.19.9-gke.1400  35.243.230.38  c2-standard-4    1.18.16-gke.502 *   106        RUNNING
gprd-us-east1-b  us-east1-b  1.18.17-gke.100  35.185.25.234  custom-16-20480  1.18.12-gke.1206 *  86         RUNNING
gprd-us-east1-c  us-east1-c  1.19.9-gke.1400  34.75.253.130  custom-16-20480  1.18.12-gke.1206 *  85         RUNNING
gprd-us-east1-d  us-east1-d  1.19.9-gke.1400  34.73.149.139  custom-16-20480  1.18.16-gke.2100 *  83         RUNNING
```
The good news is this doesn't seem to have caused any issues at all, and it wasn't even detected until I noticed some strange changes in one of our k8s-workloads/gitlab-com pipelines, caused by our chart gracefully supporting the change in the Ingress spec: gitlab-com/gl-infra/k8s-workloads/gitlab-com!860 (merged)
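For background, 1.19 is the release where the Ingress API goes GA as `networking.k8s.io/v1`, with the older `networking.k8s.io/v1beta1` / `extensions/v1beta1` versions deprecated. A minimal sketch of how to check what an upgraded cluster serves and what our objects currently use, assuming kubectl is already pointed at the right cluster context:

```shell
# Show which Ingress API versions the upgraded cluster serves; on 1.19 the GA
# networking.k8s.io/v1 group should be listed alongside the deprecated v1beta1.
kubectl api-versions | grep networking.k8s.io

# Spot-check existing Ingress objects across all namespaces; newer kubectl
# versions also print deprecation warnings when a deprecated API version is hit.
kubectl get ingress --all-namespaces
```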
Things we need to do
- Review the 1.19 Kubernetes upgrade notes for anything we need to be aware of, troubleshoot, work around, or configure.
- Open MRs to upgrade our tooling repos to use kubectl for 1.19, plus anything else we need to consider with regards to tooling versions.
- Reach out to Google and see if we can figure out an appropriate Terraform configuration to stop auto-upgrades across Kubernetes minor versions altogether. I suspect this isn't possible.
- Review gitlab-com/gitlab-helmfiles and gitlab-com/tanka-deployments to see if they use the old version of the Ingress spec. If so, open issues to get them changed.
- GKE clusters gprd-us-east1-b and gprd-us-east1-c don't have auto-upgrades enabled, so they will sit on 1.18. To avoid running multiple Kubernetes versions for too long, let's enable auto-upgrades on them (see the gcloud sketch below). This will also close out #1137 (closed).
- Get some automation/alerting in place for the brand new UpgradeAvailableEvent so that we are made aware when new versions are available. Ideally this would give us a heads-up when we are about to be upgraded across a minor version, but it looks like we don't get any specific alert when a version becomes default and we are upgraded. At any rate, as soon as we see a new minor version available in our channel via an UpgradeAvailableEvent, we should plan for the upgrade anyway (which gives us 2-4 weeks). See the note and the gcloud sketch below.
> When a new version becomes available on a release channel, GKE sends an UpgradeAvailableEvent notification to clusters on that release channel to inform the clusters that a new version is now available. This notification provides one week of advance notice for patch versions and at least 2-4 weeks for minor versions (depending on the channel). For more information, see What versions are available in a channel.
>
> Note: GKE does not send a notification when the release channel available version becomes default.
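A minimal sketch of the gcloud side of the last two items above, assuming we keep doing this by hand until the Terraform question is settled; the node pool name and Pub/Sub topic below are placeholders, and the notification flag may still require the beta track depending on the installed gcloud version:

```shell
# Enable node auto-upgrade on the node pools of the zonal gprd clusters that are
# still opted out (NODE_POOL_NAME is a placeholder -- list the pools first).
gcloud --project gitlab-production container node-pools list \
  --cluster gprd-us-east1-b --zone us-east1-b

gcloud --project gitlab-production container node-pools update NODE_POOL_NAME \
  --cluster gprd-us-east1-b --zone us-east1-b \
  --enable-autoupgrade

# Publish cluster upgrade notifications (including UpgradeAvailableEvent) to a
# Pub/Sub topic we can alert on; the topic name is a placeholder, and this may
# need `gcloud beta` on older gcloud versions.
gcloud --project gitlab-production container clusters update gprd-us-east1-b \
  --zone us-east1-b \
  --notification-config=pubsub=ENABLED,pubsub-topic=projects/gitlab-production/topics/gke-cluster-notifications
```

The same commands would need repeating for gprd-us-east1-c, and the Pub/Sub topic then becomes the input for whatever alerting we wire up.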