Skip to content

Add heap dump diagnostic report implementation

Matthias Käppler requested to merge 370077-worker-heap-dumps into master

What does this MR do and why?

Refs #370077 (closed)

This is the last step in a sequence of MRs that provide us with the ability to collect object space ("heap") dumps from worker processes.

Specifically, this builds on the following MRs (not including refactors):

  • Add a new life-cycle hook on_worker_stop that is called when a Puma or Sidekiq worker is about to shut down (!103372 (merged))
  • Wiring: Leverage memory-watchdog to signal the worker that it should dump ObjectSpace before shutting down (!103957 (merged)). The actual implementation was just a dummy.
  • Compress reports when streaming to disk: !105115 (merged)
  • This MR: Fill in the method body of the HeapDump report to actually produce a heap dump and puts this behind an ops toggle (we want to enable/disable this selectively.)

Note that this has no changelog entry because:

  • It is a SaaS-only feature currently and guarded by an extra environment switch
  • Uses an ops toggle that defaults to off

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

  1. Follow the instructions in https://gitlab.com/gitlab-org/application-performance-team/team-tools/-/blob/master/DIAGNOSTIC_REPORTS.md
  2. Set Feature.enable(:report_heap_dumps)
  3. Wait or force a worker to violate a memory threshold; this should trigger a shutdown and trigger this report.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #370077 (closed)

Edited by Matthias Käppler

Merge request reports