Improve ability to debug Unicorn problems

Problem to solve

Unicorn problems are hard to debug today because we lack per-worker metrics. We want to improve the ability to debug them.

Target audience

Further details

Unicorn can be a bottleneck for the performance and reliability of a GitLab instance.

Common problems:

  • OOM conditions
  • CPU saturation/starvation
  • Too few workers causing request queuing

Proposal

Add the following metrics to track Unicorn performance:

  • process_start_time_seconds{worker="ID"} - The per-worker start time, gathered from /proc/$PID/stat.
  • process_cpu_seconds_total{worker="ID"} - The per-worker CPU time (user plus system), gathered from /proc/$PID/stat.
  • process_max_fds{worker="ID"} - The maximum number of file descriptors the worker process may open.
  • unicorn_workers - The number of running Unicorn workers.
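
As a rough illustration of how the two /proc/$PID/stat metrics could be derived, here is a minimal Python sketch. It is not the proposed implementation. The function names (`parse_proc_stat`, `read_boot_time`, `worker_metrics`) are hypothetical, and it assumes a Linux host where utime, stime, and starttime are fields 14, 15, and 22 of the stat line, expressed in clock ticks, with starttime measured from boot.

```python
import os


def parse_proc_stat(stat_line, clk_tck, boot_time):
    """Derive CPU and start-time metrics from one /proc/<pid>/stat line.

    clk_tck is the number of clock ticks per second and boot_time is the
    system boot time in seconds since the epoch (both supplied by the caller).
    """
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split after the LAST ')' instead of on whitespace.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N is found at rest[N - 3].
    utime = int(rest[14 - 3])      # user-mode CPU time, in clock ticks
    stime = int(rest[15 - 3])      # kernel-mode CPU time, in clock ticks
    starttime = int(rest[22 - 3])  # start time after boot, in clock ticks
    return {
        "process_cpu_seconds_total": (utime + stime) / clk_tck,
        "process_start_time_seconds": boot_time + starttime / clk_tck,
    }


def read_boot_time():
    """Read the boot time (seconds since the epoch) from /proc/stat."""
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("btime "):
                return int(line.split()[1])
    raise RuntimeError("btime not found in /proc/stat")


def worker_metrics(pid):
    """Collect both metrics for a single worker PID (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        stat_line = f.read()
    return parse_proc_stat(stat_line, os.sysconf("SC_CLK_TCK"), read_boot_time())
```

Calling `worker_metrics(pid)` once per worker PID and attaching a `worker` label to the result would give the per-worker series listed above. How the worker PIDs are discovered is left open here.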

What does success look like, and how can we measure that?

The new metrics are exported and available for monitoring.

Links / references

Edited by silv