Certmanager failing to issue multiple certificates randomly

Summary

I've observed certmanager failing to issue certificates for random components when attempting 3 consecutive fresh installs of the chart. First it failed for registry and openbao. Second it only succeeded for kas. Third it only failed for Omnibus.

In all the 3 cases I used the Gateway API. There were some differences:

  1. First I installed everything without openbao, and then updated the chart to enable openbao. In this version OpenBao still had a "bug" which it was deploying an ingress even when I had the gateway api enabled.
  2. Second, I installed openbao from scratch together with all the other chart components.
  3. Third, I disabled the OpenBao chart ingress explicitly.

I suspect that either:

  • Certmanager struggles to create certificates for too many components at the same time.
    • This might have been a thing already before the Gateway API, but might have become more evident with the Gateway API, as I've seen it 3 times in a row.
  • Certmanager might get confused when there are two hosts with the same name (ingress and httproute).

It's possible that the two above are problems.

Versions

OpenBao:

                    app.kubernetes.io/version=v2.4.1-gitlab2
                    helm.sh/chart=openbao-0.12.0

Certmanager:

                        app.kubernetes.io/version=v1.17.4
                        helm.sh/chart=certmanager-v1.17.4

GitLab:

                    chart=webservice-9.7.1

Envoy Gateway:

                        helm.sh/chart=envoy-gateway-1.6.2

Workaround

For the third case, with openbao ingress forcedly disabled, deleting the secrets and certificates to force a reissue was sufficient to fix MinIO certificate.

Follow-up origin

The following discussion from !4728 (merged) should be addressed:

  • @Alexand started a discussion: (+8 comments)

    Test OpenBao access

    Check OpenBao is accessible is failing for me. 🔍 :

    curl -v https://openbao.${DOMAIN}/v1/sys/health
    * Host openbao.gitlab.jcunha.dev:443 was resolved.
    * IPv6: (none)
    * IPv4: REDACTED
    *   Trying REDACTED:443...
    * Connected to openbao.${DOMAIN} (REDACTED) port 443
    * ALPN: curl offers h2,http/1.1
    * (304) (OUT), TLS handshake, Client hello (1):
    *  CAfile: /etc/ssl/cert.pem
    *  CApath: none
    * Recv failure: Connection reset by peer
    * LibreSSL/3.3.6: error:02FFF036:system library:func(4095):Connection reset by peer
    * Closing connection
    curl: (35) Recv failure: Connection reset by peer
Edited by João Alexandre Cunha