
Add TLS settings for sidekiq exporter/metrics endpoint

Jason Young requested to merge 3369-expose-tls-for-rails-metric-exporters into master

What does this MR do?

Related issues

Closes #3369 (closed)

Checklist

See Definition of done.

For anything in this list which will not be completed, please provide a reason in the MR discussion.

Required

  • Merge Request Title and Description are up to date, accurate, and descriptive
  • MR targeting the appropriate branch
  • MR has a green pipeline on GitLab.com
  • When ready for review, MR is labeled "~workflow::ready for review" per the Distribution MR workflow

Expected (please provide an explanation if not completing)

  • Test plan indicating conditions for success has been posted and passes
  • Documentation created/updated
  • Tests added
  • Integration tests added to GitLab QA
  • Equivalent MR/issue for omnibus-gitlab opened
  • Validate potential values for new configuration settings. Formats such as integer 10, duration 10s, URI scheme://user:passwd@host:port may require quotation or other special handling when rendered in a template and written to a configuration file.

Test plan

Certificate Common Name (CN)/Subject Alternative Name (SAN) Caveats

With the Prometheus pod scrape configuration and service endpoint configuration, Prometheus auto-discovers the pod endpoints to scrape (Prometheus Kubernetes service discovery documentation), but it connects directly to the IP address allocated to each Pod (e.g. https://<pod ip>:<metrics.port>/<metrics.path>). Because Pod IPs are assigned dynamically, it is essentially impossible to issue a certificate that carries the Pod IP in its SAN extension, so Prometheus reports a target error similar to the following:

Get "https://10.42.0.34:3807/metrics": x509: cannot validate certificate for 10.42.0.34 because it doesn't contain any IP SANs

Prometheus does support a tls_config.server_name setting (documentation) that it uses in place of the target address when verifying the names in the server certificate. Unfortunately, per this open issue, server_name cannot be set dynamically through relabelling during service discovery.

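The certificates generated for these endpoints must therefore include a SAN entry that matches the tls_config.server_name string configured for Prometheus (current Prometheus releases are built with Go versions that no longer fall back to the CN for hostname verification).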
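The failure above can be reproduced offline. The following is a minimal sketch, assuming OpenSSL 1.1.1 or newer: it creates a throwaway CA and a leaf certificate whose only SANs are DNS names (mirroring the cfssl example in the test plan), then checks the leaf with openssl verify twice, once against the pod IP (roughly what Prometheus does without server_name) and once against metrics.gitlab (what tls_config.server_name makes it check instead). openssl verify only approximates the Go TLS client Prometheus uses; the CA name, file names, and the verify_like_prometheus helper are illustrative, not part of the chart.

```shell
#!/usr/bin/env bash
# Sketch only: reproduce the "doesn't contain any IP SANs" failure offline.
# Assumes OpenSSL 1.1.1+ on PATH; every file lives in a throwaway temp dir.
set -euo pipefail

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Throwaway CA, standing in for the CA stored in the metrics.gitlab.tls-ca secret.
printf 'basicConstraints=critical,CA:TRUE\n' > ca.ext
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=Example Metrics CA" \
  -keyout ca-key.pem -out ca.csr 2>/dev/null
openssl x509 -req -in ca.csr -signkey ca-key.pem -days 1 \
  -extfile ca.ext -out ca.pem 2>/dev/null

# Leaf certificate whose only SANs are DNS names, like the cfssl example.
printf 'subjectAltName=DNS:metrics.gitlab,DNS:sidekiq.metrics.gitlab\n' > san.ext
openssl req -new -newkey rsa:2048 -nodes -subj "/CN=sidekiq.metrics.gitlab" \
  -keyout leaf-key.pem -out leaf.csr 2>/dev/null
openssl x509 -req -in leaf.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial \
  -days 1 -extfile san.ext -out leaf.pem 2>/dev/null

# Hypothetical helper: verify the leaf chain against the CA plus one identity
# check, printing "ok" or "fail" instead of openssl's verbose diagnostics.
verify_like_prometheus() {
  if openssl verify -CAfile ca.pem "$@" leaf.pem >/dev/null 2>&1; then
    echo ok
  else
    echo fail
  fi
}

# Without server_name, the scrape target is the pod IP, so the IP must be a SAN.
echo "pod IP 10.42.0.34:          $(verify_like_prometheus -verify_ip 10.42.0.34)"
# With tls_config.server_name, the DNS SAN metrics.gitlab is checked instead.
echo "server_name metrics.gitlab: $(verify_like_prometheus -verify_hostname metrics.gitlab)"
```

The pod IP check should report fail (the leaf has no IP SANs) and the metrics.gitlab check should report ok, which is the same split Prometheus shows with and without server_name.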

Configuring the included Prometheus chart dependency

See !2671 (merged) for a set of value overrides in examples/prometheus/values-tls.yaml. This example presumes:

  1. All TLS-protected pod metrics endpoints use certificates issued by the same Certificate Authority (CA), and that CA is not included in the base ca-certificates.crt of the Prometheus container image.
  2. The CA certificate is stored in a secret named "metrics.gitlab.tls-ca", under a key of the same name (e.g. kubectl create secret generic --namespace=gitlab metrics.gitlab.tls-ca --from-file=metrics.gitlab.tls-ca=./ca.pem).
  3. All TLS certificates created for the TLS-protected pod endpoints include a SAN of metrics.gitlab.

Because Helm replaces lists wholesale rather than merging them across multiple values files, all of the chart's default serverFiles.prometheus.yml.scrape_configs entries have to be duplicated in this example override file. The only difference from the default set of scrape configs is this configuration added to the kubernetes-pods job:

46a35,37
>           tls_config:
>             ca_file: /etc/ssl/certs/metrics.gitlab.tls-ca
>             server_name: metrics.gitlab

This points ca_file at the CA secret mounted through extraSecretMounts, and sets server_name to an arbitrary string that must match a SAN entry in each certificate.

Testing

  1. Create a certificate for the sidekiq metrics endpoint and one for the webservice metrics endpoint, each with a SAN of metrics.gitlab
    • I used cfssl for my testing, as I had an existing custom CA
    • Example config for my sidekiq cert:
      $ cat csr_sidekiq_metrics.gitlab.json
      {
        "hosts": [
          "metrics.gitlab",
          "sidekiq.metrics.gitlab"
        ],
        "key": {
          "algo": "rsa",
          "size": 2048
        },
        "names": [
          {
            "C": "US",
            "L": "Hillsborough",
            "O": "ROOT ORGANIZATION",
            "OU": "THE ISSUING AUTHORITY",
            "ST": "North Carolina"
          }
        ]
      }
  2. Add the certs to Kubernetes (the default names are RELEASE-sidekiq-metrics-tls and RELEASE-webservice-metrics-tls for sidekiq and webservice respectively):
    • kubectl create secret tls --namespace=gitlab gitlab-sidekiq-metrics-tls --cert=./sidekiq.metrics.gitlab.pem --key=./sidekiq.metrics.gitlab-key.pem
    • kubectl create secret tls --namespace=gitlab gitlab-webservice-metrics-tls --cert=./webservice.metrics.gitlab.pem --key=./webservice.metrics.gitlab-key.pem
  3. Add the CA to Kubernetes as metrics.gitlab.tls-ca
    • kubectl create secret generic --namespace=gitlab metrics.gitlab.tls-ca --from-file=metrics.gitlab.tls-ca=./ca.pem
  4. Deploy the chart with the Prometheus overrides:
    helm upgrade --install gitlab . --timeout 600s --namespace=gitlab \
    --set global.image.pullPolicy=Always \
    --set certmanager-issuer.email=jyoung@gitlab.com \
    --set gitlab.sidekiq.metrics.enabled=true \
    --set gitlab.sidekiq.metrics.tls.enabled=true \
    --set gitlab.webservice.metrics.tls.enabled=true \
    --set gitlab.webservice.tls.enabled=true \
    -f examples/prometheus/values-tls.yaml --debug
  5. You can confirm that the endpoints are serving TLS by running curl from the toolbox container (the CA is not mounted in the toolbox, hence the use of --insecure here):
    • git@gitlab-toolbox-67f87cd84f-9r6r8:/$ curl --insecure --verbose --head https://10.42.0.34:3807/metrics
       *   Trying 10.42.0.34:3807...
       * Connected to 10.42.0.34 (10.42.0.34) port 3807 (#0)
       * ALPN, offering h2
       * ALPN, offering http/1.1
       * successfully set certificate verify locations:
       *  CAfile: /etc/ssl/certs/ca-certificates.crt
       *  CApath: /etc/ssl/certs
       * TLSv1.3 (OUT), TLS handshake, Client hello (1):
       * TLSv1.3 (IN), TLS handshake, Server hello (2):
       * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
       * TLSv1.3 (IN), TLS handshake, Certificate (11):
       * TLSv1.3 (IN), TLS handshake, CERT verify (15):
       * TLSv1.3 (IN), TLS handshake, Finished (20):
       * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
       * TLSv1.3 (OUT), TLS handshake, Finished (20):
       * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
       * ALPN, server did not agree to a protocol
       * Server certificate:
       *  subject: C=US; ST=North Carolina; L=Hillsborough; O=ROOT ORGANIZATION; OU=THE ISSUING AUTHORITY
       *  start date: Jul  6 20:05:00 2022 GMT
       *  expire date: Jul  6 20:05:00 2023 GMT
       *  issuer: C=US; ST=North Carolina; L=Hillsborough; O=ROOT ORGANIZATION; OU=THE ISSUING AUTHORITY; CN=Jayo Emporium And Fine Certificate Authority
       *  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
       > HEAD /metrics HTTP/1.1
       > Host: 10.42.0.34:3807
       > User-Agent: curl/7.74.0
       > Accept: */*
  6. You can also port-forward the Prometheus service and check its targets to see that Prometheus is scraping the https endpoints of the webservice and sidekiq pods. [Screenshot: prometheus-scrape-targets]
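Before relying on the Prometheus targets view, it can help to confirm that the certificates stored in the metrics secrets actually carry the metrics.gitlab SAN. The helpers below are a hypothetical sketch, not part of the chart: they assume OpenSSL 1.1.1+ (for x509 -ext), and fetch_secret_cert additionally assumes kubectl access and the kubernetes.io/tls key name tls.crt used by the secrets created in step 2. has_dns_san does a literal match and does not expand wildcard SANs.

```shell
#!/usr/bin/env bash
# Sketch only: confirm that a metrics certificate carries the SAN that
# Prometheus's tls_config.server_name expects. Assumes OpenSSL 1.1.1+ (for
# `x509 -ext`) and, for fetch_secret_cert, kubectl access to the namespace.

# Hypothetical helper: print a PEM certificate's DNS SAN entries, one per line.
list_dns_sans() {
  openssl x509 -in "$1" -noout -ext subjectAltName \
    | tr ',' '\n' \
    | sed -n 's/^ *DNS://p'
}

# Hypothetical helper: succeed only if the certificate lists this exact DNS SAN
# (literal match; wildcard entries are not expanded).
has_dns_san() {
  list_dns_sans "$1" | grep -qxF -- "$2"
}

# Hypothetical helper: print tls.crt from a kubernetes.io/tls secret.
fetch_secret_cert() {
  local secret="$1" namespace="${2:-gitlab}"
  kubectl get secret --namespace="$namespace" "$secret" \
    -o jsonpath='{.data.tls\.crt}' | base64 --decode
}
```

For example, fetch_secret_cert gitlab-sidekiq-metrics-tls > sidekiq-metrics.pem && has_dns_san sidekiq-metrics.pem metrics.gitlab && echo "SAN present" checks the sidekiq secret from step 2. To replace --insecure in step 5 with a check closer to what Prometheus does, make the CA file available in the container (for example with kubectl cp) and run curl --cacert ./ca.pem --connect-to metrics.gitlab:3807:10.42.0.34:3807 --head https://metrics.gitlab:3807/metrics, which connects to the pod IP while verifying the certificate against metrics.gitlab.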
Edited by Jason Plum
