Certificate wasn't refreshed in GKE / EKS due to ingress controller changes
Summary
Cert-manager was correctly attempting to renew the cert, but its standard HTTP check was always redirected to GitLab Webservice instead of the ACME web servers, so every attempt failed.
The workaround for the ingress class annotation (removing kubernetes.io/ingress.class and adding spec.ingressClassName) fixed the issue, and the cert has now been issued.
Additional context
From Slack discussion:
There have been recent changes to how the GCE ingress controller behaves, and only recently has it actually accepted ingressClassName in place of the annotation. https://cloud.google.com/kubernetes-engine/docs/concepts/ingress
Related to Update example values for Google and Amazon Ing... (!3263 - merged)
A newer version of cert-manager was released with support for the new field - https://github.com/cert-manager/cert-manager/releases/tag/v1.12.0
Steps to reproduce
- Install Charts with SSL
- Trigger cert renewal
More details in https://gitlab.com/gitlab-org/quality/gitlab-environment-toolkit-configs/staging-ref/-/issues/102#note_1467057771
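The renewal in the second step can be triggered by hand instead of waiting for the renewal window. This is a minimal sketch, assuming cmctl and kubectl are installed and pointed at the affected cluster. The certificate name gitlab-gitlab-tls and the default namespace are taken from this report; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: force a renewal, then list the resulting ACME challenges.
trigger_cert_renewal() {
  local cert="${1:-gitlab-gitlab-tls}"
  local namespace="${2:-default}"

  # Ask cert-manager to re-issue the certificate now.
  cmctl renew "$cert" --namespace "$namespace" || return 1

  # Challenges that never leave the pending state point at a failing HTTP-01 check.
  kubectl get challenges --namespace "$namespace"
}
```

Calling trigger_cert_renewal with no arguments uses the names from this report. Pass a certificate name and namespace to target another release.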
Workarounds
In each cm-acme-http-solver Ingress config, delete the kubernetes.io/ingress.class annotation and then set the same value, gitlab-nginx, under spec.ingressClassName:
- Delete kubernetes.io/ingress.class under metadata.annotations:
    metadata:
      annotations:
        kubernetes.io/ingress.class: gitlab-nginx
- Set spec.ingressClassName with the same value:
    spec:
      ingressClassName: gitlab-nginx
- Save the config and wait a minute or so. The Ingress should then disappear, meaning the cert renewal was successful.
- If no Ingress is present: cert-manager delays certificate renewal retries after many failed attempts. If the certificate has expired but there is no associated Ingress, trigger a manual attempt via cmctl (cmctl renew gitlab-gitlab-tls) and then apply the above workaround. For a list of failed cert renewals use kubectl get certificate; any that aren't ready have likely failed and need to be triggered again manually.
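The manual edit above could also be applied from the command line. This is a minimal sketch, not a tested fix. It assumes kubectl is configured for the affected cluster, that the class is gitlab-nginx as in this report, and that a single JSON merge patch (where a null value removes a key) is acceptable in place of the manual edit. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Build a JSON merge patch that removes the deprecated annotation and
# sets spec.ingressClassName to the same class.
build_ingress_class_patch() {
  local class="$1"
  printf '{"metadata":{"annotations":{"kubernetes.io/ingress.class":null}},"spec":{"ingressClassName":"%s"}}' "$class"
}

# Apply the patch to one solver Ingress (name, namespace, class).
patch_solver_ingress() {
  local ingress="$1"
  local namespace="${2:-default}"
  local class="${3:-gitlab-nginx}"
  kubectl patch ingress "$ingress" --namespace "$namespace" --type merge \
    --patch "$(build_ingress_class_patch "$class")"
}

# List certificates whose READY column is not "True".
# With --all-namespaces the columns are NAMESPACE NAME READY SECRET AGE.
list_not_ready_certificates() {
  kubectl get certificate --all-namespaces --no-headers | awk '$3 != "True"'
}
```

The solver Ingress name changes with every challenge, so look it up first (for example with kubectl get ingress) and pass it as the first argument to patch_solver_ingress.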
OR
Follow the workaround as described here - #4866 (comment 1470202853)
Configuration used
This issue happened on Staging Ref - https://gitlab.com/gitlab-org/quality/gitlab-environment-toolkit-configs/staging-ref/-/issues/102
USER-SUPPLIED VALUES:
certmanager:
  install: true
certmanager-issuer:
  email: 1.1.1.1@gitlab.com
gitlab:
  gitlab-exporter:
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-exporter
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
  gitlab-shell:
    common:
      labels:
        deployment: gitlab-shell
        type: git
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-shell
      tag: 16-2-202307121100-11242f88246
    metrics:
      enabled: true
    sshDaemon: gitlab-sshd
  kas:
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
  migrations:
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-toolbox-ee
    psql:
      host: 1.1.1.1
      password:
        key: password
        secret: gitlab-postgres-password
      port: "5432"
  sidekiq:
    common:
      labels:
        type: sidekiq
    concurrency: "20"
    extraEnv:
      CUSTOMER_PORTAL_URL: https://customers.staging-ref.gitlab.com
    health_checks:
      port: 8092
    hpa:
      cpu:
        targetAverageValue: 700m
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-sidekiq-ee
    maxReplicas: "8"
    metrics:
      enabled: true
      podMonitor:
        enabled: true
    minReplicas: "6"
    nodeSelector:
      workload: sidekiq
    resources:
      limits:
        memory: 4G
      requests:
        cpu: "0.9"
        memory: 2G
    tolerations:
    - effect: NoSchedule
      key: workload
      operator: Equal
      value: sidekiq
  toolbox:
    backups:
      objectStorage:
        backend: gcs
        config:
          gcpProject: gitlab-staging-ref
          key: key
          secret: gitlab-backups-object-storage-key
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-toolbox-ee
  webservice:
    common:
      labels:
        type: web
    extraEnv:
      CUSTOMER_PORTAL_URL: https://customers.staging-ref.gitlab.com
      PUMA_EXTERNAL_METRICS_SERVER: "true"
    hpa:
      cpu:
        targetAverageValue: 1600m
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-webservice-ee
    ingress:
      proxyBodySize: 0
    maxReplicas: "3"
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
    minReplicas: "2"
    nodeSelector:
      workload: webservice
    pod:
      labels:
        deployment: web
    resources:
      limits:
        memory: 5.25G
      requests:
        cpu: "4"
        memory: 5G
    serviceLabels:
      railsPromJob: gitlab-rails
      workhorsePromJob: gitlab-workhorse
    tolerations:
    - effect: NoSchedule
      key: workload
      operator: Equal
      value: webservice
    workerProcesses: "4"
    workhorse:
      image: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-workhorse-ee
      metrics:
        enabled: true
        serviceMonitor:
          enabled: true
      monitoring:
        exporter:
          enabled: true
gitlab-runner:
  install: false
global:
  appConfig:
    artifacts:
      bucket: staging-ref-3k-hybrid-us-artifacts
    backups:
      bucket: staging-ref-3k-hybrid-us-backups
    dependencyProxy:
      bucket: staging-ref-3k-hybrid-us-dependency-proxy
    externalDiffs:
      bucket: staging-ref-3k-hybrid-us-mr-diffs
      when: outdated
    lfs:
      bucket: staging-ref-3k-hybrid-us-lfs
    object_store:
      connection:
        key: key
        secret: gitlab-object-storage-key
      enabled: true
    omniauth:
      allowSingleSignOn: true
      blockAutoCreatedUsers: false
      enabled: true
      providers:
      - key: provider
        secret: gitlab-group-saml-secret
      - key: provider
        secret: gitlab-google-oauth2-secret
    packages:
      bucket: staging-ref-3k-hybrid-us-packages
    terraformState:
      bucket: staging-ref-3k-hybrid-us-terraform-state
    uploads:
      bucket: staging-ref-3k-hybrid-us-uploads
  certificates:
    customCAs: null
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/certificates
  common:
    labels:
      shard: default
      stage: main
      tier: sv
  geo:
    enabled: true
    nodeName: staging-ref-us
    psql:
      host: ""
      password:
        key: password
        secret: gitlab-geo-tracking-postgres-password
      port: "5431"
    registry:
      replication:
        enabled: true
    role: primary
  gitaly:
    authToken:
      key: password
      secret: gitlab-praefect-external-token
    enabled: false
    external:
    - hostname: 1.1.1.1
      name: default
      port: 2305
  gitlabBase:
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-base
  gitlabVersion: 16-2-202307121100-11242f88246
  grafana:
    enabled: false
  hosts:
    domain: staging-ref.gitlab.com
    externalIP: 1.1.1.1
    gitlab:
      name: staging-ref.gitlab.com
    https: true
    registry:
      name: registry.staging-ref.gitlab.com
  image:
    pullSecrets:
    - name: gitlab-imagepull-secret
  ingress:
    configureCertmanager: true
  initialRootPassword:
    key: password
    secret: gitlab-initial-root-password
  kubectl:
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/kubectl
  minio:
    enabled: false
  nodeSelector:
    workload: support
  psql:
    host: 1.1.1.1
    load_balancing:
      hosts:
      - 1.1.1.1
      - 1.1.1.1
      - 1.1.1.1
    password:
      key: password
      secret: gitlab-postgres-password
    port: "6432"
  railsSecrets:
    secret: gitlab-rails-secrets
  redis:
    auth:
      key: password
      secret: gitlab-redis-password
    host: gitlab-redis
    sentinels:
    - host: 1.1.1.1
      port: "26379"
    - host: 1.1.1.1
      port: "26379"
    - host: 1.1.1.1
      port: "26379"
  registry:
    bucket: staging-ref-3k-hybrid-us-registry
    enabled: true
    notificationSecret:
      key: secret_token
      secret: gitlab-registry-notification
  shell:
    authToken:
      key: password
      secret: gitlab-shell-token
    port: 22
nginx-ingress:
  controller:
    config:
      use-forwarded-headers: true
    labels:
      deployment: gitlab-nginx
      shard: default
      stage: main
      type: nginx
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
    podLabels:
      deployment: gitlab-nginx
      shard: default
      stage: main
      type: nginx
    scope:
      enabled: false
    service:
      labels:
        deployment: gitlab-nginx
        shard: default
        stage: main
        type: nginx
postgresql:
  install: false
prometheus:
  install: false
redis:
  install: false
registry:
  image:
    repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-container-registry
  storage:
    extraKey: key.json
    key: config
    secret: gitlab-container-registry-object-storage-key
shared-secrets:
  selfsign:
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/cfssl-self-sign
Current behavior
E0711 14:55:31.499023 1 sync.go:190] cert-manager/challenges
"msg"="propagation check failed"
"error"="did not get expected response when querying endpoint, expected "RDlEcuOaAJxDf2DXLKtfvrFTm8EMfwl1wjoE5dhSFI0.iKSAZc7cq9KwNgxHJY_e7heArn30yO1eyfMr8QPdWkY"
but got: \n<html cl... (truncated)"
"dnsName"="staging-ref.gitlab.com"
"resource_kind"="Challenge"
"resource_name"="gitlab-gitlab-tls-pbvbk-1921244367-3983242079"
"resource_namespace"="default"
"resource_version"="v1"
"type"="HTTP-01"
Expected behavior
Certificates are updated without errors when using default Charts values.
Versions
- Chart: latest stable (7.1.2)
- Platform:
  - Cloud: GKE
  - Self-hosted: (OpenShift | Minikube | Rancher RKE | ?)
- Kubernetes: (kubectl version)
  - Client:
  - Server: 1.24.14-gke.1200
Relevant logs
https://gitlab.com/gitlab-org/quality/gitlab-environment-toolkit-configs/staging-ref/-/issues/102
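The propagation check failure shown under Current behavior can be approximated by hand with curl. This is a rough sketch only, not how cert-manager itself performs the check. It assumes the standard HTTP-01 layout, where the challenge is served at /.well-known/acme-challenge/<token> and the token is the part of the expected response before the dot. Following redirects with --location is an assumption, and the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch the HTTP-01 challenge URL and compare the body
# with the key authorization that cert-manager expects.
check_acme_challenge() {
  local domain="$1"
  local expected="$2"
  # The token is everything before the first dot of the expected response.
  local token="${expected%%.*}"
  local body

  body="$(curl --silent --show-error --location --max-time 10 \
    "http://${domain}/.well-known/acme-challenge/${token}")" || return 2

  if [ "$body" = "$expected" ]; then
    echo "ok: the solver answered with the expected response"
  else
    # An HTML page here suggests the request reached GitLab, not the solver.
    echo "mismatch: got '$(printf '%s' "$body" | head -c 60)'"
    return 1
  fi
}
```

Run it with the dnsName and the expected value from the log while the challenge is still pending. A mismatch that starts with HTML matches the failure described in this issue.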