version.gitlab.com certificate must be renewed before it expires on Sat Mar 5 at 12:00 UTC
Related to reliability-sav#4.
The certificate on version.gitlab.com expires on Sat, 05 Mar 2022 12:00 UTC. If it expires, we will no longer receive service ping data, which will be an S1 incident.
There is currently no clear plan for who has the knowledge and the access to renew the certificate.
Ref: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/4#note_857860893
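To keep an eye on the deadline, the remaining time can be computed from the certificate itself. The following is a minimal sketch, not part of the original plan: the function names `fetch_enddate` and `days_until` are made up for illustration, and it assumes `openssl` and GNU `date` are available.

```shell
# Print the notAfter date of the certificate served by a host (needs network access).
# fetch_enddate is a hypothetical helper name, not an existing tool.
fetch_enddate() {
  host=$1
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate \
    | cut -d= -f2
}

# Print the whole days between a reference time (epoch seconds, default: now)
# and an openssl-style date such as "Mar  5 12:00:00 2022 GMT".
# Relies on GNU date's -d parsing.
days_until() {
  enddate=$1
  now=${2:-$(date +%s)}
  end=$(date -u -d "$enddate" +%s) || return 1
  echo $(( (end - now) / 86400 ))
}

# Usage (not run automatically):
#   days_until "$(fetch_enddate version.gitlab.com)"
```

A negative result means the certificate has already expired.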
Summary
What happened?
The cluster got auto-upgraded to Kubernetes version 1.20 which broke the cert-manager installation.
What needs to be done?
Cert-manager needs to be upgraded through a breaking change (from v0.9.x to v1.7.x). This means it needs to be:
- backed up
- fully removed
- re-installed
- restored
Because the installation was last touched with Helm v2, which is very old and does not (officially) support Kubernetes 1.20, we will migrate the release metadata to Helm v3 first.
Plan
Step 1: Migrate to Helm v3
Helm v2 is very old and does not support k8s 1.20. For the best outcome, we should migrate to Helm v3 first.
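The manual "verify the output" check at the start of the script can also be made explicit. This is a small sketch with a hypothetical helper name (`is_helm3`). It assumes `helm version --short` prints a string starting with `v3.` for Helm v3, and a `Client: v2...` line for Helm v2.

```shell
# Succeed only if the given `helm version --short` string is a v3 release.
# is_helm3 is an illustrative helper, not part of Helm.
is_helm3() {
  case "$1" in
    v3.*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Usage (not run automatically):
#   is_helm3 "$(helm version --short)" || { echo "helm is not v3" >&2; exit 1; }
```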
# Ensure that `helm` points to V3
# VERIFY THE OUTPUT
helm version
# Install Helm v2 in PATH. This is on cloud shell, so we install under $HOME so it persists across sessions
curl -fsSLO https://get.helm.sh/helm-v2.17.0-linux-amd64.tar.gz
tar xf helm-v2.17.0-linux-amd64.tar.gz
mkdir -p ~/bin
mv linux-amd64/helm ~/bin/helm2
mv linux-amd64/tiller ~/bin/tiller
# Install helm-2to3 plugin. This also gets installed under $HOME
helm plugin install https://github.com/helm/helm-2to3.git
# Init local tiller
export KUBE_NAMESPACE=gitlab-managed-apps
export TILLER_NAMESPACE=$KUBE_NAMESPACE
export HELM_HOST="localhost:44134"
tiller -listen "$HELM_HOST" &
helm2 init --client-only
# Grab releases to be migrated
releases=$(helm2 ls --output json | jq -r '.Releases[].Name')
# Adopt all resources with annotations and labels in case they are not part of the persisted release data
for release in $releases; do
  chart=$(helm2 ls "^$release\$" --output json | jq -r '.Releases[0].Chart')
  echo "Adopting Helm v2 manifests from $release"
  # some resource kinds must be listed explicitly https://github.com/kubernetes/kubernetes/issues/42885
  for name in $(kubectl -n "$KUBE_NAMESPACE" get all,ingress,daemonset -o name -l chart="$chart"); do
    kubectl annotate -n "$KUBE_NAMESPACE" --overwrite "$name" meta.helm.sh/release-name="$release"
    kubectl annotate -n "$KUBE_NAMESPACE" --overwrite "$name" meta.helm.sh/release-namespace="$KUBE_NAMESPACE"
    kubectl label -n "$KUBE_NAMESPACE" --overwrite "$name" app.kubernetes.io/managed-by=Helm
  done
done
# Migrate each release
for release in $releases; do
  echo "Migrating release: $release"
  helm 2to3 convert --ignore-already-migrated --release-storage configmaps --tiller-out-cluster --tiller-ns "$TILLER_NAMESPACE" "$release"
done
# Kill Tiller so we don't accidentally use Helm 2 during the next steps
killall tiller
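Before moving on, it may be worth confirming that every release gathered in `$releases` is now visible to Helm v3. The sketch below is an assumption about how that could be checked, not part of the original plan; `missing_releases` is a hypothetical helper that compares two whitespace-separated lists of names.

```shell
# Print every name from the first list that is absent from the second list.
# Both arguments are whitespace-separated release names.
missing_releases() {
  expected=$1
  actual=$2
  for name in $expected; do
    # $actual is deliberately unquoted so that it splits into one name per line.
    printf '%s\n' $actual | grep -Fxq -- "$name" || printf '%s\n' "$name"
  done
}

# Usage (not run automatically), assuming $releases and $KUBE_NAMESPACE are still set:
#   missing_releases "$releases" "$(helm ls -n "$KUBE_NAMESPACE" --short)"
```

Empty output means all Helm v2 releases were found in the Helm v3 listing.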
Step 2: Fix cert-manager
- Back up as per https://cert-manager.io/docs/installation/upgrading/upgrading-0.10-0.11/. If all goes well, we are not going to use this backup, but we keep it around for reference just in case.
- Prepare an updated cluster-issuer based on the old issuer:
# issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: dsylva@gitlab.com
    solvers:
    - http01:
        ingress:
          class: nginx
    privateKeySecretRef:
      name: letsencrypt-prod
    server: https://acme-v02.api.letsencrypt.org/directory
- Uninstall cert-manager
# review the resources to delete
kubectl get Issuers,ClusterIssuers,Certificates,CertificateRequests,Orders,Challenges --all-namespaces
# if all good, delete them
kubectl delete Issuers,ClusterIssuers,Certificates,CertificateRequests,Orders,Challenges --all --all-namespaces
# Uninstall certmanager. This is using Helm v3, so there is no --purge
helm -n gitlab-managed-apps uninstall certmanager
# Remove the legacy CRDs
kubectl delete -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.9/deploy/manifests/00-crds.yaml
- Re-install certmanager, and immediately apply the issuer as a follow-up.
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --install \
  certmanager jetstack/cert-manager \
  --namespace gitlab-managed-apps \
  --version v1.7.1 \
  --set installCRDs=true \
  --set ingressShim.defaultIssuerKind=ClusterIssuer \
  --set ingressShim.defaultIssuerName=letsencrypt-prod
kubectl apply -f issuer.yaml
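Once the issuer is applied, the certificates should be re-issued and become ready. As a rough sketch (not part of the original plan), the helper below filters `kubectl get certificates` output for anything that is not ready. `not_ready_certs` is a hypothetical name, and the default column order NAMESPACE, NAME, READY, SECRET, AGE is an assumption about kubectl's output for cert-manager v1 resources.

```shell
# Read `kubectl get certificates --all-namespaces --no-headers` output on stdin
# and print namespace/name for every certificate whose READY column is not "True".
# Assumes the column order NAMESPACE NAME READY SECRET AGE.
not_ready_certs() {
  awk '$3 != "True" { print $1 "/" $2 }'
}

# Usage (not run automatically):
#   kubectl get certificates --all-namespaces --no-headers | not_ready_certs
```

Empty output suggests every certificate reports ready. It can take a few minutes for ACME challenges to complete after the re-install.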
Step 3: Cleanup
If everything went well, then we can relatively safely remove the Helm v2 release data.
# Init local tiller
export KUBE_NAMESPACE=gitlab-managed-apps
export TILLER_NAMESPACE=$KUBE_NAMESPACE
export HELM_HOST="localhost:44134"
tiller -listen "$HELM_HOST" &
helm2 init --client-only
# Delete Helm 2 release data
helm 2to3 cleanup --skip-confirmation --release-storage configmaps --tiller-out-cluster --tiller-ns "$TILLER_NAMESPACE"
# Kill local Tiller
killall tiller
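To double-check the cleanup, one option is to look for leftover Helm v2 release ConfigMaps. The sketch below assumes Helm v2's ConfigMap storage conventions (label `OWNER=TILLER`, names of the form `<release>.v<revision>`); `leftover_v2_releases` is a hypothetical helper name.

```shell
# Read `kubectl get configmaps -l OWNER=TILLER -o name` output on stdin and print
# the distinct release names, assuming the "<release>.v<revision>" naming scheme.
leftover_v2_releases() {
  sed -e 's|^configmap/||' -e 's|\.v[0-9][0-9]*$||' | sort -u
}

# Usage (not run automatically):
#   kubectl -n gitlab-managed-apps get configmaps -l OWNER=TILLER -o name | leftover_v2_releases
```

Empty output means no Helm v2 release data was found in the namespace.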