Clean up stale Prometheus metrics

Summary

Stale Prometheus metrics are not removed when Unicorn starts up, only when it restarts.

Stale Prometheus metrics are also not removed if the environment variable prometheus_multiproc_dir is not set, which is the case in the development environment, for example.

Steps to reproduce

  • Remove stale metrics (rm -fr tmp/prometheus_multiproc_dir/*.db)
  • Start Unicorn (e.g. via GDK: gdk run)
  • Visit http://127.0.0.1/-/metrics. http_requests_total{method="get"} should read 2 (actually why not 1 ❓)
  • Reload http://127.0.0.1/-/metrics. http_requests_total increases
  • Stop Unicorn
  • Start Unicorn
  • Visit http://127.0.0.1/-/metrics. http_requests_total is not reset
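
To compare the counter before and after the restart without reading the page by eye, the check can be scripted. This is a minimal sketch: the helper names `fetch_metrics` and `sample_value` are hypothetical, and it assumes the endpoint returns the Prometheus text format with the sample written exactly as `http_requests_total{method="get"}`.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: download the metrics page as plain text.
# The default URL is the one used in the steps above.
def fetch_metrics(url = 'http://127.0.0.1/-/metrics')
  Net::HTTP.get(URI(url))
end

# Hypothetical helper: return the numeric value of the first line that
# starts with the given sample (name plus labels), or nil if it is absent.
# Assumes the value is a plain number, which holds for counters.
def sample_value(body, sample)
  body.each_line do |line|
    next unless line.start_with?(sample + ' ')
    # The value is the first whitespace-separated token after the sample.
    return Float(line[sample.length..-1].split.first)
  end
  nil
end

# Usage (requires a running instance, so it is left commented out):
#   before = sample_value(fetch_metrics, 'http_requests_total{method="get"}')
#   # ... stop and start Unicorn ...
#   after = sample_value(fetch_metrics, 'http_requests_total{method="get"}')
#   puts "counter was not reset" if after > before
```

If the real sample carries additional labels, the `sample` string would need to match the line prefix as it actually appears in the output.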

What is the current bug behavior?

Stale Prometheus metrics are not removed

  • after starting Unicorn
  • in development environment

What is the expected correct behavior?

Stale Prometheus metrics are removed

  • after Unicorn has been started
  • in development environment

Possible fixes

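  • Also delete stale metrics in the on_worker_start callback, so they are removed on start-up and not only on restart.

  • Use Prometheus::Client.configuration.multiprocess_files_dir instead of ENV['prometheus_multiproc_dir'] when deleting stale metrics, so the cleanup does not depend on the environment variable being set: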

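  # Skip the cleanup in the test environment.
  unless Rails.env.test?
    # Read the directory from the client configuration rather than from
    # ENV['prometheus_multiproc_dir'], which may be unset (e.g. in development).
    prometheus_multiproc_dir = Prometheus::Client.configuration.multiprocess_files_dir
    old_metrics = Dir[File.join(prometheus_multiproc_dir, '*.db')]
    FileUtils.rm_rf(old_metrics)
  end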
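
Since both proposed fixes need the same deletion logic, it could be pulled into one helper and called from each place. The following is only a sketch: `remove_stale_metrics` is a hypothetical name, not an existing method, and the guard for a missing directory is an added assumption about how an unset value should be handled.

```ruby
require 'fileutils'

# Hypothetical helper: delete all *.db metric files in the given directory
# and return the list of paths that were targeted.
def remove_stale_metrics(dir)
  # Assumption: a missing or empty directory setting means nothing to clean.
  return [] if dir.nil? || dir.to_s.empty?

  stale = Dir[File.join(dir, '*.db')]
  FileUtils.rm_rf(stale)
  stale
end
```

In the snippet above, the call would then be `remove_stale_metrics(Prometheus::Client.configuration.multiprocess_files_dir)`, and the same call could be made from the on_worker_start callback.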

/cc @bjk-gitlab
