Fix Prometheus connection error in usage ping

What does this MR do?

About 40% of topology usage pings fail with PrometheusClient::ConnectionError.

After some investigation, we suspect two main causes:

  • The connection scheme of the API call does not match the server's TLS configuration. Either the server enforces TLS but we connect over HTTP, or the server does not enable TLS but we connect over HTTPS.
  • The Prometheus server enforces TLS, but the Sidekiq node is NOT configured to trust the Prometheus node's certificate.

A summary is available at #235739 (comment 405245426)

This MR addresses each of the two causes above in turn:

  • Automatically detect the scheme: run a ready check over HTTPS first, then fall back to HTTP.
  • Pass the option verify: false in the with_prometheus_client call. This skips SSL certificate verification. The security risk has been evaluated as acceptable: #235739 (comment 406218403)
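
The two fixes above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the helper names prometheus_ready? and detect_prometheus_scheme and the probe: parameter are hypothetical, and the sketch assumes the ready check targets Prometheus's /-/ready endpoint. OpenSSL::SSL::VERIFY_NONE stands in for the effect of verify: false.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical helper (not from the MR): true when
# GET <scheme>://<address>/-/ready answers with a 2xx status.
def prometheus_ready?(scheme, address, timeout: 2)
  uri = URI("#{scheme}://#{address}/-/ready")
  http = Net::HTTP.new(uri.host, uri.port)
  http.open_timeout = timeout
  http.read_timeout = timeout
  if scheme == 'https'
    http.use_ssl = true
    # Assumption: mirrors the intent of `verify: false`, i.e. the
    # certificate is not verified against the local trust store.
    http.verify_mode = OpenSSL::SSL::VERIFY_NONE
  end
  http.get(uri.path).is_a?(Net::HTTPSuccess)
rescue StandardError
  # Refused connections, TLS handshake failures and timeouts all
  # count as "not ready" for this scheme.
  false
end

# Hypothetical helper: tries HTTPS first, then HTTP, and returns the
# first scheme whose ready check passes, or nil if neither does.
# `probe` is injectable so the logic can be exercised without a server.
def detect_prometheus_scheme(address, probe: method(:prometheus_ready?))
  %w[https http].find { |scheme| probe.call(scheme, address) }
end
```

Calling detect_prometheus_scheme('localhost:9090') would return 'https', 'http', or nil depending on which ready check succeeds. Probing HTTPS first means a server that accepts both schemes is reached over the encrypted one.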

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Closes #235739 (closed)

Edited by Qingyu Zhao