Skip to content

Reinstate Labkit::Metrics and fixes locking pattern

What does this MR do and why?

Resolves #549420 (closed).

Relates to gitlab-com/gl-infra/observability/team#4160 (closed).

The Labkit::Metrics had been introduced in the MR !188302 (merged), behind an environment variable flag (LABKIT_METRICS_ENABLED). However, there was bug hidden in there causing a deadlock.

The specific commit fixing the issue is here. Explanation:

  1. safe_provide_metric acquires PROVIDER_MUTEX.synchronize
  2. It calls provide_metric(metric_type, metric_name, *args)
  3. registry.get(metric_name) returns nil (metric doesn't exist)
  4. The || operator triggers registry.method(metric_type).call(metric_name, *args)
  5. The registry call goes through Redis instrumentation, triggered by Gitlab::CurrentSettings.prometheus_metrics_enabled
  6. The instrumentation tries to record its own metrics, calling safe_provide_metric again, which tries to acquire the same mutex that's already held

This issue was only present in Gitlab::Metrics::Prometheus. The Gitlab::Metrics::Labkit implementation is not affected by that.

This bug has been introduced due to an attempt of simplifying the current implementation, as a step to replace it by Labkit::Metrics, however, this edge case of recursive call wasn't clear given the hidden call to the metrics library from a instrumentation object call within a method call from the model.

The implementation of safe_provide_metric is now the same as before the merge.

References

Previous MR with the revert: !194383 (merged)

Screenshots or screen recordings

Before After

How to set up and validate locally

I verified this change by running this with & without the revert:

RAILS_ENV=test bin/rails db:reset
RAILS_ENV=test bin/rails db:migrate

Same steps as reproduced here: !194383 (merged)

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Hercules Merscher

Merge request reports

Loading