Add TLS settings for sidekiq exporter/metrics endpoint
What does this MR do?
Related issues
Closes #3369 (closed)
Checklist
See Definition of done.
For anything in this list which will not be completed, please provide a reason in the MR discussion.
Required
- Merge Request Title and Description are up to date, accurate, and descriptive
- MR targeting the appropriate branch
- MR has a green pipeline on GitLab.com
- When ready for review, MR is labeled "~workflow::ready for review" per the Distribution MR workflow
Expected (please provide an explanation if not completing)
- Test plan indicating conditions for success has been posted and passes
- Documentation created/updated
- Tests added
- Integration tests added to GitLab QA
- Equivalent MR/issue for omnibus-gitlab opened
- Validate potential values for new configuration settings. Formats such as integer 10, duration 10s, URI scheme://user:passwd@host:port may require quotation or other special handling when rendered in a template and written to a configuration file.
Test plan
Certificate Common Name (CN)/Subject Alternative Name (SAN) Caveats
With the Prometheus pod scrape configuration and service endpoint configuration, Prometheus auto-discovers the pod endpoints to scrape (Prometheus Kubernetes service discovery documentation), but it connects directly to the IP address allocated to each Pod (e.g. https://<pod ip>:<metrics.port>/<metrics.path>). Because it is essentially impossible to generate a certificate that includes the Pod IP in its SAN extension, Prometheus shows a target error similar to the following:
Get "https://10.42.0.34:3807/metrics": x509: cannot validate certificate for 10.42.0.34 because it doesn't contain any IP SANs
Prometheus does support a tls_config.server_name setting (documentation) that it uses to match against the CN/SANs in the certificate. Unfortunately, per this open issue, server_name cannot be set dynamically using the relabelling facility for service discovery.
As a result, the certs generated for these endpoints must have a CN or SAN entry that matches the tls_config.server_name string configured for Prometheus.
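The mismatch described above can be reproduced locally before anything is loaded into the cluster. The following is a minimal sketch, assuming OpenSSL 1.1.1 or newer is available; it uses a throwaway self-signed certificate as a stand-in (the certificates in this MR were issued with cfssl from an existing CA), and the temporary file names are illustrative only.

```shell
set -eu

# Throwaway self-signed certificate, for illustration only. It carries the
# same SAN that the test plan uses for the real certificates.
workdir="$(mktemp -d)"
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -subj "/O=Example" \
  -addext "subjectAltName=DNS:metrics.gitlab" \
  -keyout "$workdir/key.pem" -out "$workdir/cert.pem" 2>/dev/null

# The name Prometheus is told to expect via tls_config.server_name.
host_check="$(openssl x509 -in "$workdir/cert.pem" -noout -checkhost metrics.gitlab)"
# The pod IP that Prometheus actually connects to is not in the certificate.
ip_check="$(openssl x509 -in "$workdir/cert.pem" -noout -checkip 10.42.0.34)"

echo "$host_check"
echo "$ip_check"

rm -rf "$workdir"
```

The same two checks can be pointed at a real certificate (for example ./sidekiq.metrics.gitlab.pem) to confirm that it will satisfy the server_name configured for Prometheus.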
Configuring the included Prometheus chart dependency
See !2671 (merged) for an examples/prometheus/values-tls.yaml set of value overrides. This example presumes:
- The certificates for all TLS-protected pod endpoints for the scraped metrics are issued by the same Certificate Authority (CA), and that CA is not included in the base ca-certificates.crt for the Prometheus container image.
- That CA cert is created as a secret named "metrics.gitlab.tls-ca" with a key:value of metrics.gitlab.tls-ca and the CA cert (e.g. kubectl create secret generic --namespace=gitlab metrics.gitlab.tls-ca --from-file=metrics.gitlab.tls-ca=./ca.pem).
- All TLS certificates created for the TLS-protected pod endpoints include a SAN of metrics.gitlab.
Due to limitations in how helm merges multiple values files (lists are replaced rather than merged), the chart's entire default set of serverFiles.prometheus.yml.scrape_configs has to be duplicated in this example override. The only difference from the default set of scrape configs is this configuration added to the kubernetes-pods job:
46a35,37
>   tls_config:
>     ca_file: /etc/ssl/certs/metrics.gitlab.tls-ca
>     server_name: metrics.gitlab
This sets ca_file to the secret mounted via extraSecretMounts, and sets server_name to an arbitrary string that must match a SAN extension entry in the endpoint certificates.
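For orientation, the secret mount that ca_file relies on might look roughly like the overlay written below. This is a sketch only: the authoritative version is examples/prometheus/values-tls.yaml from !2671 and may differ. The field names follow the extraSecretMounts example in the upstream Prometheus chart, while the prometheus.server nesting and the volume name are assumptions made for illustration.

```shell
# Write a hypothetical values overlay to a temporary file.
overlay="$(mktemp)"
cat > "$overlay" <<'EOF'
prometheus:
  server:
    extraSecretMounts:
      - name: metrics-gitlab-tls-ca        # volume name, chosen for illustration
        secretName: metrics.gitlab.tls-ca  # the CA secret created with kubectl
        mountPath: /etc/ssl/certs/metrics.gitlab.tls-ca
        subPath: metrics.gitlab.tls-ca     # key inside the secret
        readOnly: true
EOF
echo "wrote $overlay"
```

Mounting the single key with subPath makes the CA appear as a file at the exact path that ca_file points to.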
Testing
- Create a certificate for both the sidekiq and webservice metrics endpoints, each with a SAN of metrics.gitlab
  - I used cfssl for my testing, as I had an existing custom CA
  - Example config for my sidekiq cert:
    $ cat csr_sidekiq_metrics.gitlab.json
    {
      "hosts": [
        "metrics.gitlab",
        "sidekiq.metrics.gitlab"
      ],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "US",
          "L": "Hillsborough",
          "O": "ROOT ORGANIZATION",
          "OU": "THE ISSUING AUTHORITY",
          "ST": "North Carolina"
        }
      ]
    }
- Add the certs to Kubernetes (the default names are RELEASE-sidekiq-metrics-tls and RELEASE-webservice-metrics-tls for sidekiq and webservice respectively):
  kubectl create secret tls --namespace=gitlab gitlab-sidekiq-metrics-tls --cert=./sidekiq.metrics.gitlab.pem --key=./sidekiq.metrics.gitlab-key.pem
  kubectl create secret tls --namespace=gitlab gitlab-webservice-metrics-tls --cert=./webservice.metrics.gitlab.pem --key=./webservice.metrics.gitlab-key.pem
- Add the CA to Kubernetes as metrics.gitlab.tls-ca:
  kubectl create secret generic --namespace=gitlab metrics.gitlab.tls-ca --from-file=metrics.gitlab.tls-ca=./ca.pem
- Deploy the chart with the Prometheus overrides:
  helm upgrade --install gitlab . --timeout 600s --namespace=gitlab \
    --set global.image.pullPolicy=Always \
    --set certmanager-issuer.email=jyoung@gitlab.com \
    --set gitlab.sidekiq.metrics.enabled=true \
    --set gitlab.sidekiq.metrics.tls.enabled=true \
    --set gitlab.webservice.metrics.tls.enabled=true \
    --set gitlab.webservice.tls.enabled=true \
    -f examples/prometheus/values-tls.yaml --debug
- You can validate that the endpoints are using TLS with curl from the toolbox container (the CA is not mounted in the toolbox, hence the use of --insecure here):
  git@gitlab-toolbox-67f87cd84f-9r6r8:/$ curl --insecure --verbose --head https://10.42.0.34:3807/metrics
  * Trying 10.42.0.34:3807...
  * Connected to 10.42.0.34 (10.42.0.34) port 3807 (#0)
  * ALPN, offering h2
  * ALPN, offering http/1.1
  * successfully set certificate verify locations:
  * CAfile: /etc/ssl/certs/ca-certificates.crt
  * CApath: /etc/ssl/certs
  * TLSv1.3 (OUT), TLS handshake, Client hello (1):
  * TLSv1.3 (IN), TLS handshake, Server hello (2):
  * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
  * TLSv1.3 (IN), TLS handshake, Certificate (11):
  * TLSv1.3 (IN), TLS handshake, CERT verify (15):
  * TLSv1.3 (IN), TLS handshake, Finished (20):
  * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
  * TLSv1.3 (OUT), TLS handshake, Finished (20):
  * SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
  * ALPN, server did not agree to a protocol
  * Server certificate:
  * subject: C=US; ST=North Carolina; L=Hillsborough; O=ROOT ORGANIZATION; OU=THE ISSUING AUTHORITY
  * start date: Jul 6 20:05:00 2022 GMT
  * expire date: Jul 6 20:05:00 2023 GMT
  * issuer: C=US; ST=North Carolina; L=Hillsborough; O=ROOT ORGANIZATION; OU=THE ISSUING AUTHORITY; CN=Jayo Emporium And Fine Certificate Authority
  * SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
  > HEAD /metrics HTTP/1.1
  > Host: 10.42.0.34:3807
  > User-Agent: curl/7.74.0
  > Accept: */*
- You can also port-forward the Prometheus service and see that the Prometheus targets are scraping https endpoints for the webservice and sidekiq pods.
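The certificate-issuing and target-checking steps above can be sketched as two small shell helpers. Both function names are hypothetical. The first assumes the CA files are named ca.pem and ca-key.pem (the key file name is an assumption) and that cfssl and cfssljson are installed. The second assumes the compact JSON that the Prometheus /api/v1/targets endpoint returns, and uses grep rather than a JSON parser to keep dependencies minimal.

```shell
# Hypothetical helper: issue one metrics certificate from an existing cfssl CA.
# Expects ca.pem, ca-key.pem and csr_<component>_metrics.gitlab.json in the
# current directory; writes <component>.metrics.gitlab.pem and
# <component>.metrics.gitlab-key.pem.
issue_metrics_cert() {
  component="$1"  # sidekiq or webservice
  cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
    "csr_${component}_metrics.gitlab.json" \
    | cfssljson -bare "${component}.metrics.gitlab"
}

# Hypothetical helper: read a Prometheus /api/v1/targets response on stdin and
# print only the entries whose scrape URL uses https.
https_scrape_urls() {
  grep -o '"scrapeUrl":"https://[^"]*"'
}

# Example against a live cluster (service name and ports are assumptions):
#   issue_metrics_cert sidekiq
#   kubectl port-forward --namespace=gitlab svc/gitlab-prometheus-server 9090:80 &
#   curl --silent http://localhost:9090/api/v1/targets | https_scrape_urls
```

If the TLS settings took effect, the sidekiq and webservice pod targets should appear in the filtered output with https scrape URLs.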