
Sidekiq metrics server exports own metrics

Matthias Käppler requested to merge 344629-metrics-server-metrics into master

What does this MR do and why?

See #344629 (closed)

As part of &6409 (closed), we moved application metrics for Sidekiq (and, in the future, Puma) from an in-worker server to a dedicated metrics server process with its own life-cycle.

This also benefits observability: we can now export metrics about the metrics server itself, whereas previously they would simply be part of the overall metrics a Sidekiq worker reported.

With this MR, we're starting simple by exporting everything that is collected by RubySampler, i.e. metrics relevant to the resource utilization (especially memory use) of the metrics server itself.

Note to reviewers: With the recent introduction of undercoverage, touching some of the existing files revealed testing gaps, which I filled. Those changes are necessary to make the build pass, but are unrelated to the MR's intent.

Screenshots or screen recordings

The metrics-server metrics are exported with pid=*_exporter so that we can distinguish them from actual worker metrics. For Sidekiq, the value is sidekiq_exporter.

[Screenshot from 2021-12-14 10:38:41]

How to set up and validate locally

GDK

Testing this in a fully integrated fashion with the GDK requires an additional MR, which hasn't been merged yet: gitlab-development-kit!2315 (merged)

However, to emulate this you could start the metrics-server via the bin/metrics-server script alongside sidekiq:

  1. Start Sidekiq; this should start writing sidekiq worker metrics to tmp/prometheus_multiproc_dir/sidekiq
  2. Run METRICS_SERVER_TARGET=sidekiq bin/metrics-server; this will launch the server and serve metrics from the dir
  3. Run curl localhost:<metrics_port>/metrics; it should serve Prometheus metrics for both the sidekiq workers and the server itself (e.g. grep for sidekiq_exporter and sidekiq_0; there should be ruby_* metrics for both of them)

You might run into a port collision; I haven't tried the approach above myself.
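
The check in step 3 can also be scripted instead of using grep. The following is a minimal sketch and not part of this MR: `pids_with_ruby_metrics` is a hypothetical helper that scans Prometheus text output for ruby_* samples and collects the values of their pid label. The metric name and values in the sample input are made up for illustration, and the parsing assumes label values contain no escaped quotes.

```ruby
require 'set'

# Hypothetical helper (not part of this MR): returns the set of `pid` label
# values found on ruby_* samples in Prometheus text-format output.
def pids_with_ruby_metrics(metrics_text)
  pids = Set.new

  metrics_text.each_line do |line|
    # Skip comments (# HELP / # TYPE) and anything that is not a ruby_* sample.
    next unless line.start_with?('ruby_')

    # Assumes simple label values without escaped quotes.
    match = line.match(/[{,]\s*pid="([^"]*)"/)
    pids << match[1] if match
  end

  pids
end

# Example input; the metric name and values are illustrative only.
sample = <<~METRICS
  # HELP ruby_gc_stat_count GC runs
  # TYPE ruby_gc_stat_count gauge
  ruby_gc_stat_count{pid="sidekiq_0"} 42
  ruby_gc_stat_count{pid="sidekiq_exporter"} 7
  sidekiq_jobs_total{pid="sidekiq_0"} 3
METRICS

pids = pids_with_ruby_metrics(sample)
puts pids.to_a.sort.inspect
```

To run this against a live server, pass the body returned by curl localhost:<metrics_port>/metrics in place of `sample`, then confirm that both sidekiq_0 and sidekiq_exporter are in the result.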

GCK

With GCK, this should work:

  1. Set the sidekiq_health_checks port to something different from the metrics port; this will signal the server to boot; in gck.yml, add:
    gitlab.yml:
      development:
        monitoring:
          sidekiq_health_checks:
            port: 3907
  2. Run make up-sidekiq up-prometheus
  3. In your browser, go to localhost:9090, which is the Prometheus web UI
  4. Search for any ruby_* metric and verify that it exists for both sidekiq_0 and sidekiq_exporter
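
Step 4 can also be checked through the Prometheus HTTP API rather than the web UI. This is a minimal sketch under stated assumptions, not something this MR ships: it assumes Prometheus listens on localhost:9090 as in step 3, uses ruby_gc_stat_count purely as an example metric name, and introduces two hypothetical helpers, `query_uri` and `pids_in_query_result`. The canned response only mirrors the shape of an instant-query vector result; its values are made up.

```ruby
require 'json'
require 'uri'

PROMETHEUS_URL = 'http://localhost:9090'

# Builds an instant-query URL for Prometheus' /api/v1/query endpoint.
def query_uri(promql, base_url = PROMETHEUS_URL)
  uri = URI.join(base_url, '/api/v1/query')
  uri.query = URI.encode_www_form(query: promql)
  uri
end

# Extracts the distinct `pid` label values from an instant-query response body.
def pids_in_query_result(response_body)
  parsed = JSON.parse(response_body)
  raise "query failed: #{parsed['error']}" unless parsed['status'] == 'success'

  parsed.dig('data', 'result')
        .map { |series| series.dig('metric', 'pid') }
        .compact
        .uniq
        .sort
end

# Canned response in the shape of a vector result; values are illustrative.
canned = {
  status: 'success',
  data: {
    resultType: 'vector',
    result: [
      { metric: { __name__: 'ruby_gc_stat_count', pid: 'sidekiq_0' }, value: [1639475921, '42'] },
      { metric: { __name__: 'ruby_gc_stat_count', pid: 'sidekiq_exporter' }, value: [1639475921, '7'] }
    ]
  }
}.to_json

puts query_uri('ruby_gc_stat_count')
puts pids_in_query_result(canned).inspect
```

Against a running stack, the response body could be fetched with `Net::HTTP.get(query_uri('ruby_gc_stat_count'))` (after `require 'net/http'`) and passed to `pids_in_query_result`; the result should include both sidekiq_0 and sidekiq_exporter.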

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #344629 (closed)

Edited by Matthias Käppler
