version.gitlab.com certificate must be renewed before it expires on Sat Mar 5 at 12:00 UTC
Related to reliability-sav#4.
The certificate on version.gitlab.com expires on Sat, 05 Mar 2022 12:00 UTC. If it expires, we will no longer receive service ping data, which would be an S1 incident.
There is currently no clear plan for who has both the knowledge and the access to renew the certificate.
Ref: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/4#note_857860893
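The expiry date quoted above can be re-checked from any machine with `openssl`. The following is a minimal sketch and not part of the original plan: the helper names `cert_enddate` and `days_left` are made up for illustration.

```shell
# Hypothetical helpers, not part of the original plan.

# Print the notAfter date of the certificate a host serves on port 443.
# Requires network access and the openssl CLI.
cert_enddate() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate \
    | cut -d= -f2
}

# Whole days between two Unix timestamps: days_left EXPIRY_EPOCH NOW_EPOCH
days_left() {
  echo $(( ($1 - $2) / 86400 ))
}
```

For example, `days_left "$(date -d "$(cert_enddate version.gitlab.com)" +%s)" "$(date +%s)"` should print the number of whole days remaining, assuming GNU `date` is available to parse the `notAfter` string.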
## Summary
### What happened?
The cluster got auto-upgraded to Kubernetes version 1.20 which broke the cert-manager installation.
### What needs to be done?
Cert-manager needs to be upgraded through a breaking change (from `v0.9.x` to `v1.7.x`). This means it needs to be:
1. backed up
1. fully removed
1. re-installed
1. restored
But since the installation was last touched with Helm v2, and
Helm v2 is very old and does not (officially) support K8s 1.20,
we will migrate the release metadata to Helm v3 first.
## Plan
### Step 1: Migrate to Helm v3
Helm v2 is very old and [does not support k8s 1.20](https://helm.sh/docs/topics/version_skew/#supported-version-skew).
For the best outcome, we should migrate to Helm v3 first.
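Because the script below relies on `helm` being the v3 client, the manual `helm version` check can be backed by a small guard. This is a sketch under the assumption that `helm version --short` prints a string such as `v3.8.0+gd141386` (Helm v3) or `Client: v2.17.0+ga690bad` (Helm v2); `helm_major` and `require_helm3` are hypothetical helper names.

```shell
# Hypothetical helper: extract the major version from `helm version --short` output.
helm_major() {
  printf '%s\n' "$1" | sed -n 's/^[^v]*v\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Return non-zero unless the given version string belongs to Helm v3.
require_helm3() {
  if [ "$(helm_major "$1")" != "3" ]; then
    echo "expected Helm v3, got: $1" >&2
    return 1
  fi
}
```

A possible use is `require_helm3 "$(helm version --short)" || exit 1` at the top of the migration script.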
```shell
# Ensure that `helm` points to V3
# VERIFY THE OUTPUT
helm version
# Install Helm v2 in PATH. This is on cloud shell, so we install under $HOME so it persists across sessions
curl -fsSLO https://get.helm.sh/helm-v2.17.0-linux-amd64.tar.gz
tar xf helm-v2.17.0-linux-amd64.tar.gz
mv linux-amd64/helm ~/bin/helm2
mv linux-amd64/tiller ~/bin/tiller
# Install helm-2to3 plugin. This also gets installed under $HOME
helm plugin install https://github.com/helm/helm-2to3.git
# Init local tiller
export KUBE_NAMESPACE=gitlab-managed-apps
export TILLER_NAMESPACE=$KUBE_NAMESPACE
export HELM_HOST="localhost:44134"
tiller -listen "$HELM_HOST" &
helm2 init --client-only
# Grab releases to be migrated
releases=$(helm2 ls --output json | jq -r '.Releases[].Name')
# Adopt all resources with annotations and labels in case they are not part of the persisted release data
for release in $releases; do
  chart=$(helm2 ls "^$release\$" --output json | jq -r '.Releases[0].Chart')
  echo "Adopting Helm v2 manifests from $release"
  # some resource kinds must be listed explicitly https://github.com/kubernetes/kubernetes/issues/42885
  for name in $(kubectl -n "$KUBE_NAMESPACE" get all,ingress,daemonset -o name -l chart="$chart"); do
    kubectl annotate -n "$KUBE_NAMESPACE" --overwrite "$name" meta.helm.sh/release-name="$release"
    kubectl annotate -n "$KUBE_NAMESPACE" --overwrite "$name" meta.helm.sh/release-namespace="$KUBE_NAMESPACE"
    kubectl label -n "$KUBE_NAMESPACE" --overwrite "$name" app.kubernetes.io/managed-by=Helm
  done
done
# Migrate each release
for release in $releases; do
  echo "Migrating release: $release"
  helm 2to3 convert --ignore-already-migrated --release-storage configmaps --tiller-out-cluster --tiller-ns "$TILLER_NAMESPACE" "$release"
done
# Kill Tiller so we don't accidentally use Helm 2 during the next steps
killall tiller
```
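Once the migration above finishes, the migrated releases should be visible to Helm v3. A minimal sketch of that check, assuming the release is named `certmanager` as in Step 2; `has_release` is a hypothetical helper that reads the one-name-per-line output of `helm ls --short`.

```shell
# Hypothetical helper: succeed if release "$1" appears in a list of
# release names read from stdin (one per line, as `helm ls --short` prints).
has_release() {
  grep -qxF -- "$1"
}
```

For example, `helm -n gitlab-managed-apps ls --short | has_release certmanager` should exit with status 0 if the release was migrated.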
### Step 2: Fix cert-manager
1. Back-up as per https://cert-manager.io/docs/installation/upgrading/upgrading-0.10-0.11/.
If all goes well, we are **not going to use this backup**, but we keep it around for reference just in case.
1. Prepare an updated cluster-issuer based on the old issuer:
```yaml
# issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: dsylva@gitlab.com
    solvers:
      - http01:
          ingress:
            class: nginx
    privateKeySecretRef:
      name: letsencrypt-prod
    server: https://acme-v02.api.letsencrypt.org/directory
```
1. Uninstall cert-manager
```bash
# review the resources to delete
kubectl get Issuers,ClusterIssuers,Certificates,CertificateRequests,Orders,Challenges --all-namespaces
# if all good, delete them
kubectl delete Issuers,ClusterIssuers,Certificates,CertificateRequests,Orders,Challenges --all --all-namespaces
# Uninstall certmanager. This is using Helm v3, so there is no --purge
helm -n gitlab-managed-apps uninstall certmanager
# Remove the legacy CRDs
kubectl delete -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.9/deploy/manifests/00-crds.yaml
```
1. Re-install certmanager, then immediately apply the issuer as a follow-up.
```bash
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --install \
  certmanager jetstack/cert-manager \
  --namespace gitlab-managed-apps \
  --version v1.7.1 \
  --set installCRDs=true \
  --set ingressShim.defaultIssuerKind=ClusterIssuer \
  --set ingressShim.defaultIssuerName=letsencrypt-prod
kubectl apply -f issuer.yaml
```
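After the issuer is applied, cert-manager should re-issue the certificates. The following sketch waits for that to happen; it assumes the `Ready` condition that cert-manager v1 sets on `Certificate` resources, and `wait_for_certificates` is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: block until every Certificate in the cluster reports
# Ready, or fail after the given timeout (default: 5 minutes).
wait_for_certificates() {
  kubectl wait --for=condition=Ready certificates.cert-manager.io \
    --all --all-namespaces --timeout="${1:-300s}"
}
```

Calling `wait_for_certificates 10m` gives issuance more time. If no `Certificate` resources have been created yet, `kubectl wait` is expected to report that it found nothing to wait for, so it may need to be re-run shortly after the install.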
### Step 3: Cleanup
If everything went well, then we can relatively safely remove the Helm v2 release data.
```bash
# Init local tiller
export KUBE_NAMESPACE=gitlab-managed-apps
export TILLER_NAMESPACE=$KUBE_NAMESPACE
export HELM_HOST="localhost:44134"
tiller -listen "$HELM_HOST" &
helm2 init --client-only
# Delete Helm 2 release data
helm 2to3 cleanup --skip-confirmation --release-storage configmaps --tiller-out-cluster --tiller-ns "$TILLER_NAMESPACE"
# Kill local Tiller
killall tiller
```
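To confirm the cleanup, one can look for leftover Helm v2 release records. This sketch assumes the ConfigMap storage backend used above, where Helm v2 labels its release data `OWNER=TILLER`; the helper name `helm2_leftovers` is hypothetical.

```shell
# Hypothetical helper: list any ConfigMaps still labelled as Helm v2 release
# data in the given namespace (default: gitlab-managed-apps).
helm2_leftovers() {
  kubectl -n "${1:-gitlab-managed-apps}" get configmaps -l OWNER=TILLER -o name
}
```

Empty output from `helm2_leftovers` would suggest that the cleanup removed all Helm v2 release data from the namespace.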