
Certificate wasn't refreshed in GKE / EKS due to ingress controller changes

Summary

Cert-manager was correctly trying to renew the cert, but its standard HTTP-01 self-check was always redirected to GitLab Webservice instead of the ACME web servers, so every attempt failed.

The workaround (removing the kubernetes.io/ingress.class annotation and setting spec.ingressClassName instead) fixed the issue, and the cert has now been issued.

Additional context

From Slack discussion:

There have been recent changes to how the GCE ingress controller behaves, and only recently has it actually accepted ingressClassName in place of the annotation. https://cloud.google.com/kubernetes-engine/docs/concepts/ingress

Related to Update example values for Google and Amazon Ing... (!3263 - merged)

A newer version of cert-manager was released with support for the new field - https://github.com/cert-manager/cert-manager/releases/tag/v1.12.0

Steps to reproduce

  1. Install Charts with SSL
  2. Trigger cert renewal
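
A renewal can be triggered and the result checked from the command line. The following is a minimal sketch, not a verified procedure: the `not_ready_certs` helper and the `CERT_NAME` variable are hypothetical names introduced here, and the filter assumes the default `kubectl get certificate` table layout, where the second column is READY. Nothing runs against the cluster unless `CERT_NAME` is set.

```shell
#!/usr/bin/env bash
# Sketch: trigger a renewal, then list certificates that are not ready.

# Hypothetical helper: reads `kubectl get certificate` table output on stdin
# and prints the names whose READY column is anything other than "True".
not_ready_certs() {
  awk 'NR > 1 && $2 != "True" { print $1 }'
}

# Only touch the cluster when a certificate name is supplied,
# for example CERT_NAME=gitlab-gitlab-tls.
if [ -n "${CERT_NAME:-}" ]; then
  cmctl renew "$CERT_NAME"
  kubectl get certificate | not_ready_certs
fi
```

A certificate may still report as ready immediately after `cmctl renew`, so the listing is worth repeating after a short wait.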

More details in https://gitlab.com/gitlab-org/quality/gitlab-environment-toolkit-configs/staging-ref/-/issues/102#note_1467057771

Workarounds

In each cm-acme-http-solver Ingress config, delete the kubernetes.io/ingress.class annotation and then set the same value (gitlab-nginx) under spec.ingressClassName:

  1. Delete kubernetes.io/ingress.class under metadata.annotations:

    metadata:
      annotations:
        kubernetes.io/ingress.class: gitlab-nginx
  2. Set spec.ingressClassName with the same value:

    spec:
      ingressClassName: gitlab-nginx
  3. Save the config and wait a minute or so. The Ingress should then disappear, meaning the cert renewal was successful.

  4. If no Ingress is present: cert-manager delays its renewal retries after many failed attempts. If the certificate has expired but there is no associated Ingress, trigger a manual attempt with cmctl (cmctl renew gitlab-gitlab-tls) and then apply the above workaround. For a list of failed cert renewals, use kubectl get certificate; any certificates that aren't ready have likely failed and need to be triggered again manually.
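
Steps 1 and 2 can also be applied as a single JSON Patch instead of a manual edit. This is a sketch under stated assumptions, not a tested procedure: the `build_solver_patch` helper and the `SOLVER_INGRESS` variable are hypothetical, and the actual Ingress name has to be taken from `kubectl get ingress`. The `~1` in the path is the JSON Pointer escape for the `/` in the annotation key. The patch is only sent when `SOLVER_INGRESS` is set.

```shell
#!/usr/bin/env bash
# Sketch: move the ingress class of one solver Ingress from the
# kubernetes.io/ingress.class annotation to spec.ingressClassName.

# Hypothetical helper: prints a JSON Patch that removes the annotation
# and sets spec.ingressClassName to the class name passed as $1.
build_solver_patch() {
  local class="$1"
  printf '[{"op":"remove","path":"/metadata/annotations/kubernetes.io~1ingress.class"},{"op":"add","path":"/spec/ingressClassName","value":"%s"}]' "$class"
}

# Only patch when a solver Ingress name is supplied. Add -n <namespace>
# if the Ingress is not in the current namespace.
if [ -n "${SOLVER_INGRESS:-}" ]; then
  kubectl patch ingress "$SOLVER_INGRESS" --type=json \
    -p "$(build_solver_patch gitlab-nginx)"
fi
```

The `remove` operation fails if the annotation is already absent, which makes the patch safe to attempt on an Ingress that has been fixed by hand.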

OR

Follow the workaround as described here - #4866 (comment 1470202853)

Configuration used

This issue happened on Staging Ref - https://gitlab.com/gitlab-org/quality/gitlab-environment-toolkit-configs/staging-ref/-/issues/102

USER-SUPPLIED VALUES:
certmanager:
  install: true
certmanager-issuer:
  email: 1.1.1.1@gitlab.com
gitlab:
  gitlab-exporter:
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-exporter
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
  gitlab-shell:
    common:
      labels:
        deployment: gitlab-shell
        type: git
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-shell
      tag: 16-2-202307121100-11242f88246
    metrics:
      enabled: true
    sshDaemon: gitlab-sshd
  kas:
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
  migrations:
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-toolbox-ee
    psql:
      host: 1.1.1.1
      password:
        key: password
        secret: gitlab-postgres-password
      port: "5432"
  sidekiq:
    common:
      labels:
        type: sidekiq
    concurrency: "20"
    extraEnv:
      CUSTOMER_PORTAL_URL: https://customers.staging-ref.gitlab.com
    health_checks:
      port: 8092
    hpa:
      cpu:
        targetAverageValue: 700m
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-sidekiq-ee
    maxReplicas: "8"
    metrics:
      enabled: true
      podMonitor:
        enabled: true
    minReplicas: "6"
    nodeSelector:
      workload: sidekiq
    resources:
      limits:
        memory: 4G
      requests:
        cpu: "0.9"
        memory: 2G
    tolerations:
    - effect: NoSchedule
      key: workload
      operator: Equal
      value: sidekiq
  toolbox:
    backups:
      objectStorage:
        backend: gcs
        config:
          gcpProject: gitlab-staging-ref
          key: key
          secret: gitlab-backups-object-storage-key
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-toolbox-ee
  webservice:
    common:
      labels:
        type: web
    extraEnv:
      CUSTOMER_PORTAL_URL: https://customers.staging-ref.gitlab.com
      PUMA_EXTERNAL_METRICS_SERVER: "true"
    hpa:
      cpu:
        targetAverageValue: 1600m
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-webservice-ee
    ingress:
      proxyBodySize: 0
    maxReplicas: "3"
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
    minReplicas: "2"
    nodeSelector:
      workload: webservice
    pod:
      labels:
        deployment: web
    resources:
      limits:
        memory: 5.25G
      requests:
        cpu: "4"
        memory: 5G
    serviceLabels:
      railsPromJob: gitlab-rails
      workhorsePromJob: gitlab-workhorse
    tolerations:
    - effect: NoSchedule
      key: workload
      operator: Equal
      value: webservice
    workerProcesses: "4"
    workhorse:
      image: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-workhorse-ee
      metrics:
        enabled: true
        serviceMonitor:
          enabled: true
      monitoring:
        exporter:
          enabled: true
gitlab-runner:
  install: false
global:
  appConfig:
    artifacts:
      bucket: staging-ref-3k-hybrid-us-artifacts
    backups:
      bucket: staging-ref-3k-hybrid-us-backups
    dependencyProxy:
      bucket: staging-ref-3k-hybrid-us-dependency-proxy
    externalDiffs:
      bucket: staging-ref-3k-hybrid-us-mr-diffs
      when: outdated
    lfs:
      bucket: staging-ref-3k-hybrid-us-lfs
    object_store:
      connection:
        key: key
        secret: gitlab-object-storage-key
      enabled: true
    omniauth:
      allowSingleSignOn: true
      blockAutoCreatedUsers: false
      enabled: true
      providers:
      - key: provider
        secret: gitlab-group-saml-secret
      - key: provider
        secret: gitlab-google-oauth2-secret
    packages:
      bucket: staging-ref-3k-hybrid-us-packages
    terraformState:
      bucket: staging-ref-3k-hybrid-us-terraform-state
    uploads:
      bucket: staging-ref-3k-hybrid-us-uploads
  certificates:
    customCAs: null
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/certificates
  common:
    labels:
      shard: default
      stage: main
      tier: sv
  geo:
    enabled: true
    nodeName: staging-ref-us
    psql:
      host: ""
      password:
        key: password
        secret: gitlab-geo-tracking-postgres-password
      port: "5431"
    registry:
      replication:
        enabled: true
    role: primary
  gitaly:
    authToken:
      key: password
      secret: gitlab-praefect-external-token
    enabled: false
    external:
    - hostname: 1.1.1.1
      name: default
      port: 2305
  gitlabBase:
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-base
  gitlabVersion: 16-2-202307121100-11242f88246
  grafana:
    enabled: false
  hosts:
    domain: staging-ref.gitlab.com
    externalIP: 1.1.1.1
    gitlab:
      name: staging-ref.gitlab.com
    https: true
    registry:
      name: registry.staging-ref.gitlab.com
  image:
    pullSecrets:
    - name: gitlab-imagepull-secret
  ingress:
    configureCertmanager: true
  initialRootPassword:
    key: password
    secret: gitlab-initial-root-password
  kubectl:
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/kubectl
  minio:
    enabled: false
  nodeSelector:
    workload: support
  psql:
    host: 1.1.1.1
    load_balancing:
      hosts:
      - 1.1.1.1
      - 1.1.1.1
      - 1.1.1.1
    password:
      key: password
      secret: gitlab-postgres-password
    port: "6432"
  railsSecrets:
    secret: gitlab-rails-secrets
  redis:
    auth:
      key: password
      secret: gitlab-redis-password
    host: gitlab-redis
    sentinels:
    - host: 1.1.1.1
      port: "26379"
    - host: 1.1.1.1
      port: "26379"
    - host: 1.1.1.1
      port: "26379"
  registry:
    bucket: staging-ref-3k-hybrid-us-registry
    enabled: true
    notificationSecret:
      key: secret_token
      secret: gitlab-registry-notification
  shell:
    authToken:
      key: password
      secret: gitlab-shell-token
    port: 22
nginx-ingress:
  controller:
    config:
      use-forwarded-headers: true
    labels:
      deployment: gitlab-nginx
      shard: default
      stage: main
      type: nginx
    metrics:
      enabled: true
      serviceMonitor:
        enabled: true
    podLabels:
      deployment: gitlab-nginx
      shard: default
      stage: main
      type: nginx
    scope:
      enabled: false
    service:
      labels:
        deployment: gitlab-nginx
        shard: default
        stage: main
        type: nginx
postgresql:
  install: false
prometheus:
  install: false
redis:
  install: false
registry:
  image:
    repository: dev.gitlab.org:5005/gitlab/charts/components/images/gitlab-container-registry
  storage:
    extraKey: key.json
    key: config
    secret: gitlab-container-registry-object-storage-key
shared-secrets:
  selfsign:
    image:
      repository: dev.gitlab.org:5005/gitlab/charts/components/images/cfssl-self-sign

Current behavior

E0711 14:55:31.499023       1 sync.go:190] cert-manager/challenges 
"msg"="propagation check failed" 
"error"="did not get expected response when querying endpoint, expected "RDlEcuOaAJxDf2DXLKtfvrFTm8EMfwl1wjoE5dhSFI0.iKSAZc7cq9KwNgxHJY_e7heArn30yO1eyfMr8QPdWkY" 
but got: \n<html cl... (truncated)" 
"dnsName"="staging-ref.gitlab.com" 
"resource_kind"="Challenge" 
"resource_name"="gitlab-gitlab-tls-pbvbk-1921244367-3983242079"
 "resource_namespace"="default" 
"resource_version"="v1"
 "type"="HTTP-01"
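
The failing check in the log above can be approximated by hand. An HTTP-01 challenge is served at `/.well-known/acme-challenge/<token>`, and the expected body is the key authorization, whose part before the first dot is the token. The sketch below assumes that layout. The helper names and the `CHALLENGE_HOST` and `KEY_AUTH` variables are hypothetical, and `curl -sL` is only an approximation of cert-manager's own check. No request is made unless both variables are set.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the HTTP-01 propagation check manually.

# Hypothetical helper: prints the challenge URL for a host ($1) and a key
# authorization ($2). The token is everything before the first dot.
challenge_url() {
  local host="$1" key_auth="$2"
  printf 'http://%s/.well-known/acme-challenge/%s\n' "$host" "${key_auth%%.*}"
}

# Hypothetical helper: succeeds only when the response body ($2) is exactly
# the expected key authorization ($1).
response_matches() {
  [ "$1" = "$2" ]
}

# Only query the endpoint when both values are supplied, for example the
# dnsName and the "expected" string from the cert-manager log.
if [ -n "${CHALLENGE_HOST:-}" ] && [ -n "${KEY_AUTH:-}" ]; then
  body="$(curl -sL "$(challenge_url "$CHALLENGE_HOST" "$KEY_AUTH")")"
  if response_matches "$KEY_AUTH" "$body"; then
    echo "challenge served correctly"
  else
    echo "unexpected response, starts with: ${body:0:60}"
  fi
fi
```

An HTML body, as in the log, suggests the request reached GitLab Webservice rather than the solver.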

Expected behavior

Certificates are updated without errors when using default Charts values.

Versions

  • Chart: latest stable (7.1.2)
  • Platform:
    • Cloud: GKE
    • Self-hosted: (OpenShift | Minikube | Rancher RKE | ?)
  • Kubernetes: (kubectl version)
    • Client:
    • Server: 1.24.14-gke.1200

Relevant logs

https://gitlab.com/gitlab-org/quality/gitlab-environment-toolkit-configs/staging-ref/-/issues/102

Edited by Grant Young