Upgrade all GKE clusters to 1.18
We need to upgrade all our GKE clusters to Kubernetes 1.18 (minimum release v1.18.6-gke.6300). It was highlighted at gitlab-org/charts/gitlab#2440 (comment 486117933) that we have incorrect sysctl settings on our nodes, which could lead to issues.
Looking at the upgrade notes at https://v1-18.docs.kubernetes.io/docs/setup/release/notes/#urgent-upgrade-notes, I'll highlight the items that I think affect us:
kube-apiserver:
the following deprecated APIs can no longer be served:
- All resources under apps/v1beta1 and apps/v1beta2 - use apps/v1 instead
- daemonsets, deployments, replicasets resources under extensions/v1beta1 - use apps/v1 instead
- networkpolicies resources under extensions/v1beta1 - use networking.k8s.io/v1 instead
- podsecuritypolicies resources under extensions/v1beta1 - use policy/v1beta1 instead
(#85903, @liggitt) [SIG API Machinery, Apps, Cluster Lifecycle, Instrumentation and Testing]
We need to audit everything in the GitLab chart (and all other services we deploy) to make sure we aren't using any of these removed API versions.
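As a starting point for that audit, here is a minimal sketch that scans rendered manifests (for example the output of `helm template`) for the group/version and kind combinations listed above. The function names `replacement_for` and `find_deprecated` are made up for illustration, and the regex approach assumes plain multi-document YAML with top-level `apiVersion` and `kind` keys. It will not look inside `kind: List` wrappers, so treat an empty result as a hint rather than proof.

```python
import re

# Group/versions where every kind is no longer served in 1.18.
ALL_KINDS_REMOVED = {
    "apps/v1beta1": "apps/v1",
    "apps/v1beta2": "apps/v1",
}

# Kinds no longer served under extensions/v1beta1, with their replacement.
# Other kinds in this group (e.g. Ingress) are not in the 1.18 removal list.
EXTENSIONS_V1BETA1_REMOVED = {
    "DaemonSet": "apps/v1",
    "Deployment": "apps/v1",
    "ReplicaSet": "apps/v1",
    "NetworkPolicy": "networking.k8s.io/v1",
    "PodSecurityPolicy": "policy/v1beta1",
}


def replacement_for(api_version, kind):
    """Return the replacement apiVersion, or None if the pair is still served."""
    if api_version in ALL_KINDS_REMOVED:
        return ALL_KINDS_REMOVED[api_version]
    if api_version == "extensions/v1beta1":
        return EXTENSIONS_V1BETA1_REMOVED.get(kind)
    return None


def find_deprecated(rendered_yaml):
    """Return (kind, old apiVersion, new apiVersion) for each affected document."""
    findings = []
    # Split on YAML document separators, then read only top-level keys.
    for doc in re.split(r"^---[ \t]*$", rendered_yaml, flags=re.MULTILINE):
        api = re.search(r"^apiVersion:\s*(\S+)", doc, flags=re.MULTILINE)
        kind = re.search(r"^kind:\s*(\S+)", doc, flags=re.MULTILINE)
        if not (api and kind):
            continue
        api_version = api.group(1).strip("'\"")
        kind_name = kind.group(1).strip("'\"")
        new = replacement_for(api_version, kind_name)
        if new:
            findings.append((kind_name, api_version, new))
    return findings
```

Running `find_deprecated` over the rendered output of each of our repositories would give a list of resources to migrate before the masters are upgraded.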
resource metrics endpoint /metrics/resource/v1alpha1 as well as all metrics under this endpoint have been deprecated. Please convert to the following metrics emitted by endpoint /metrics/resource:
- scrape_error --> scrape_error
- node_cpu_usage_seconds_total --> node_cpu_usage_seconds
- node_memory_working_set_bytes --> node_memory_working_set_bytes
- container_cpu_usage_seconds_total --> container_cpu_usage_seconds
- container_memory_working_set_bytes --> container_memory_working_set_bytes
(#86282, @RainbowMango) [SIG Node]
We need to confirm that we don't rely on the deprecated endpoint or on any of the renamed metrics.
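One way to check this is to search our Prometheus rules, dashboards, and scrape configs for the old names. The sketch below is only an illustration: `scan_text` is a hypothetical helper, and it covers just the two metrics whose names actually change in the list above plus the endpoint path. Note that cAdvisor exposes a metric that is also named container_cpu_usage_seconds_total, so any hit on that name needs a manual look at which endpoint it is scraped from.

```python
import re

# Only the metrics whose names change between the two endpoints.
RENAMED = {
    "node_cpu_usage_seconds_total": "node_cpu_usage_seconds",
    "container_cpu_usage_seconds_total": "container_cpu_usage_seconds",
}
DEPRECATED_ENDPOINT = "/metrics/resource/v1alpha1"
NEW_ENDPOINT = "/metrics/resource"


def scan_text(text):
    """Return (line number, old name, suggested replacement) for each hit."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if DEPRECATED_ENDPOINT in line:
            hits.append((line_no, DEPRECATED_ENDPOINT, NEW_ENDPOINT))
        for old, new in RENAMED.items():
            # Word boundaries avoid matching longer metric names.
            if re.search(r"\b" + re.escape(old) + r"\b", line):
                hits.append((line_no, old, new))
    return hits
```

Feeding each rules or config file through `scan_text` gives a short list of places to review by hand.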
Ingress:
spec.ingressClassName replaces the deprecated kubernetes.io/ingress.class annotation, and allows associating an Ingress object with a particular controller.
path definitions added a pathType field to allow indicating how the specified path should be matched against incoming requests. Valid values are Exact, Prefix, and ImplementationSpecific (#88587, @cmluciano) [SIG Apps, Cluster Lifecycle and Network]
We should check all Ingress objects to confirm they keep working after the upgrade. I think the existing annotation remains backwards compatible, but this is worth confirming.
Metrics changes are also documented at https://v1-18.docs.kubernetes.io/docs/setup/release/notes/#metrics, which we should review.
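To make the Ingress check concrete, here is a small sketch that takes the JSON from `kubectl get ingress --all-namespaces -o json` and reports which objects set the annotation and which set spec.ingressClassName. The helper names `audit_ingresses` and `relies_on_annotation` are invented for this example, and it only inspects those two fields.

```python
ANNOTATION = "kubernetes.io/ingress.class"


def audit_ingresses(ingress_list):
    """Summarise how each Ingress in a kubectl JSON list selects its controller."""
    report = []
    for item in ingress_list.get("items", []):
        meta = item.get("metadata") or {}
        annotations = meta.get("annotations") or {}
        spec = item.get("spec") or {}
        report.append({
            "name": "{}/{}".format(meta.get("namespace", "default"), meta.get("name")),
            "annotation": annotations.get(ANNOTATION),
            "ingressClassName": spec.get("ingressClassName"),
        })
    return report


def relies_on_annotation(entry):
    """True when the deprecated annotation is the only class selector present."""
    return entry["annotation"] is not None and entry["ingressClassName"] is None
```

Loading the kubectl output with `json.load` and filtering the report with `relies_on_annotation` would list the objects that still depend on the annotation.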
Checklist
Pre-upgrade checks for 1.18
- Upgrade kubectl client for CI - https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12367#note_488814977
- Confirm all resources in k8s-workloads/gitlab-com use non-deprecated APIs
- Confirm all resources in k8s-workloads/gitlab-helmfiles use non-deprecated APIs
- Confirm all resources in k8s-workloads/tanka-deployments use non-deprecated APIs
- Confirm that we do not use resource metrics endpoint /metrics/resource/v1alpha1; if we do, migrate to the new endpoint
- Confirm that all ingress resources do not rely on the kubernetes.io/ingress.class annotation
- Confirm that the metric changes documented at https://v1-18.docs.kubernetes.io/docs/setup/release/notes/#metrics do not affect us
Upgrade
- ops master(s) upgraded
- ops nodes upgraded
- pre master(s) upgraded
- pre nodes upgraded
- gstg master(s) upgraded
- gstg nodes upgraded
- gprd master(s) upgraded
- gprd nodes upgraded
- org-ci master(s) upgraded
- org-ci nodes upgraded
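When ticking off each item, it may help to verify that the reported master and node versions meet the minimum release named at the top of this issue. The sketch below is an assumption-laden helper, not an official tool: it assumes version strings follow the `MAJOR.MINOR.PATCH-gke.N` shape and that comparing the four numbers in order is a fair ordering. `parse_gke_version` and `meets_minimum` are hypothetical names.

```python
import re

MINIMUM = "v1.18.6-gke.6300"
_PATTERN = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version):
    """Turn '1.18.6-gke.6300' (optional leading 'v') into a tuple of ints."""
    match = _PATTERN.match(version.strip())
    if match is None:
        raise ValueError("not a GKE version string: {!r}".format(version))
    return tuple(int(part) for part in match.groups())


def meets_minimum(version, minimum=MINIMUM):
    """True when version is at or above the minimum, compared field by field."""
    return parse_gke_version(version) >= parse_gke_version(minimum)
```

For example, `meets_minimum("1.17.12-gke.2502")` is false, so a cluster reporting that version would still need upgrading.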