Upgrade all GKE clusters to 1.18

We need to upgrade all our GKE clusters to Kubernetes 1.18 (minimum release v1.18.6-gke.6300). It was highlighted at gitlab-org/charts/gitlab#2440 (comment 486117933) that we have incorrect sysctl settings on our nodes, potentially leading to issues.
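To check whether a given master or node pool already meets the minimum release, the version strings need to be compared numerically rather than as text. The following is a minimal sketch, assuming versions follow the `1.18.6-gke.6300` shape; the helper names are hypothetical and not part of any existing tooling.

```python
import re

# Minimum release named in this issue.
MINIMUM = "v1.18.6-gke.6300"


def parse_gke_version(version):
    """Turn 'v1.18.6-gke.6300' into (1, 18, 6, 6300) so it compares numerically."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)-gke\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised GKE version: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version, minimum=MINIMUM):
    """Return True if `version` is at or above `minimum`."""
    return parse_gke_version(version) >= parse_gke_version(minimum)
```

Comparing tuples of integers avoids the trap where a plain string comparison would rank `1.18.12` below `1.18.6`.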

Looking at the upgrade notes at https://v1-18.docs.kubernetes.io/docs/setup/release/notes/#urgent-upgrade-notes, I'll highlight the important things that I think affect us:

kube-apiserver:
the following deprecated APIs can no longer be served:

  • All resources under apps/v1beta1 and apps/v1beta2 - use apps/v1 instead
  • daemonsets, deployments, replicasets resources under extensions/v1beta1 - use apps/v1 instead
  • networkpolicies resources under extensions/v1beta1 - use networking.k8s.io/v1 instead
  • podsecuritypolicies resources under extensions/v1beta1 - use policy/v1beta1 instead
    (#85903, @liggitt) [SIG API Machinery, Apps, Cluster Lifecycle, Instrumentation and Testing]

We need to audit everything in the GitLab chart (and all other services we deploy) to make sure we aren't using any of these removed API versions.

resource metrics endpoint /metrics/resource/v1alpha1 as well as all metrics under this endpoint have been deprecated. Please convert to the following metrics emitted by endpoint /metrics/resource:

  • scrape_error --> scrape_error
  • node_cpu_usage_seconds_total --> node_cpu_usage_seconds
  • node_memory_working_set_bytes --> node_memory_working_set_bytes
  • container_cpu_usage_seconds_total --> container_cpu_usage_seconds
  • container_memory_working_set_bytes --> container_memory_working_set_bytes
    (#86282, @RainbowMango) [SIG Node]

We need to confirm we don't rely on any of these metrics or on the deprecated endpoint.
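A quick first pass could be a text search over our scrape configs, alert rules, and dashboards. The sketch below assumes those files can be read as plain text; the function name is hypothetical. It looks for the deprecated endpoint path and for the two metric names that actually change in the list above. Note that a name hit is not conclusive, since the same names may also be exposed by other endpoints (for example, cAdvisor also exposes `container_cpu_usage_seconds_total`), so each hit needs a manual look.

```python
import re

DEPRECATED_ENDPOINT = "/metrics/resource/v1alpha1"

# Only the names that change, taken from the release-note list above.
RENAMED = {
    "node_cpu_usage_seconds_total": "node_cpu_usage_seconds",
    "container_cpu_usage_seconds_total": "container_cpu_usage_seconds",
}


def find_metric_references(text):
    """Return (line number, match) pairs for the deprecated endpoint or renamed metrics."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if DEPRECATED_ENDPOINT in line:
            hits.append((lineno, DEPRECATED_ENDPOINT))
        for old in RENAMED:
            # Word boundaries avoid matching longer metric names that merely contain this one.
            if re.search(rf"\b{re.escape(old)}\b", line):
                hits.append((lineno, old))
    return hits
```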

Ingress:
spec.ingressClassName replaces the deprecated kubernetes.io/ingress.class annotation, and allows associating an Ingress object with a particular controller.
path definitions added a pathType field to allow indicating how the specified path should be matched against incoming requests. Valid values are Exact, Prefix, and ImplementationSpecific (#88587, @cmluciano) [SIG Apps, Cluster Lifecycle and Network]

We should check all Ingress objects to make sure they are unaffected. I think this should be fine, since the changes look backwards compatible, but it is worth investigating and confirming.
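To make that check concrete, here is a minimal sketch that reviews the parsed JSON from `kubectl get ingress --all-namespaces -o json`. It reports Ingress objects that still rely on the `kubernetes.io/ingress.class` annotation without `spec.ingressClassName`, and paths that have no `pathType`. The function name and the report wording are hypothetical.

```python
LEGACY_ANNOTATION = "kubernetes.io/ingress.class"


def review_ingresses(ingress_list):
    """Return (namespace/name, finding) pairs for a parsed `kubectl ... -o json` List."""
    report = []
    for item in ingress_list.get("items", []):
        meta = item.get("metadata", {})
        spec = item.get("spec", {})
        name = f'{meta.get("namespace", "default")}/{meta.get("name", "?")}'
        annotations = meta.get("annotations") or {}
        # The annotation is deprecated in favour of spec.ingressClassName.
        if LEGACY_ANNOTATION in annotations and "ingressClassName" not in spec:
            report.append((name, "uses legacy ingress.class annotation"))
        for rule in spec.get("rules") or []:
            for path in (rule.get("http") or {}).get("paths") or []:
                if "pathType" not in path:
                    report.append((name, f'path {path.get("path", "/")} has no pathType'))
    return report
```

To use it, load the kubectl output with `json.load` and pass the resulting dict in. The findings are a list of things to review, not necessarily things that will break.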

Metrics changes are also documented at https://v1-18.docs.kubernetes.io/docs/setup/release/notes/#metrics, which we should review.

Checklist

Pre-upgrade checks for 1.18

Upgrade

  • ops master(s) upgraded
  • ops nodes upgraded
  • pre master(s) upgraded
  • pre nodes upgraded
  • gstg master(s) upgraded
  • gstg nodes upgraded
  • gprd master(s) upgraded
  • gprd nodes upgraded
  • org-ci master(s) upgraded
  • org-ci nodes upgraded
Edited by Graeme Gillies