Add heap dump diagnostic report implementation
What does this MR do and why?
Refs #370077 (closed)
This is the last step in a sequence of MRs that provide us with the ability to collect object space ("heap") dumps from worker processes.
Specifically, this builds on the following MRs (not including refactors):
- Add a new life-cycle hook, on_worker_stop, that is called when a Puma or Sidekiq worker is about to shut down (!103372 (merged)).
- Wiring: leverage memory-watchdog to signal the worker that it should dump ObjectSpace before shutting down (!103957 (merged)). The actual report implementation was only a dummy.
- Compress reports when streaming to disk (!105115 (merged)).
- This MR: fill in the method body of the HeapDump report so that it actually produces a heap dump, and put this behind an ops toggle (we want to enable/disable this selectively).
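The pieces listed above can be sketched in plain Ruby as follows. This is a minimal illustration, not the GitLab code. HeapDumpReportSketch and WorkerLifecycleSketch are hypothetical names. The ops toggle is modelled as a lambda that defaults to off; in GitLab it would presumably be a check on the report_heap_dumps flag. The sketch also simplifies compression. ObjectSpace.dump_all expects an IO (or :file, :string, :stdout) as its output: target, and a Zlib::GzipWriter is not an IO. The sketch therefore writes the dump to a temporary file first and then copies it into a gzip stream in chunks, instead of compressing while streaming the way !105115 does.

```ruby
require "objspace"
require "tempfile"
require "zlib"

# Hypothetical stand-in for the HeapDump diagnostic report.
class HeapDumpReportSketch
  # The toggle is an assumed callable; it defaults to off, like the ops toggle.
  def initialize(enabled: -> { false })
    @enabled = enabled
  end

  def active?
    @enabled.call
  end

  # Writes a gzip-compressed heap dump to +path+.
  # Returns true if a dump was written, false if the toggle is off.
  def run(path)
    return false unless active?

    Tempfile.create(%w[heap .json]) do |tmp|
      # One JSON document per line; dump_all flushes (but does not close) tmp.
      ObjectSpace.dump_all(output: tmp)
      tmp.rewind
      # Copy in chunks rather than loading the whole dump into memory.
      Zlib::GzipWriter.open(path) { |gz| IO.copy_stream(tmp, gz) }
    end
    true
  end
end

# Hypothetical hook registry standing in for the on_worker_stop life-cycle hook.
module WorkerLifecycleSketch
  @on_stop = []

  def self.on_worker_stop(&block)
    @on_stop << block
  end

  # Simulates the memory watchdog asking the worker to shut down.
  def self.worker_stopping!
    @on_stop.each(&:call)
  end
end
```

Usage, registering the report so that it runs when the worker is told to stop:

```ruby
report = HeapDumpReportSketch.new(enabled: -> { true })
WorkerLifecycleSketch.on_worker_stop { report.run("heap.json.gz") }
WorkerLifecycleSketch.worker_stopping!
```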
Note that this has no changelog entry because:
- It is currently a SaaS-only feature and is guarded by an additional environment switch.
- It uses an ops toggle that defaults to off.
How to set up and validate locally
- Follow the instructions in https://gitlab.com/gitlab-org/application-performance-team/team-tools/-/blob/master/DIAGNOSTIC_REPORTS.md
- Enable the ops toggle: Feature.enable(:report_heap_dumps)
- Wait for, or force, a worker to exceed a memory threshold. This should trigger a worker shutdown, which in turn produces the heap dump report.
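To sanity-check a report once the worker has shut down, a quick tally of objects by type can help. This is a small sketch that assumes the report is a gzip-compressed file in ObjectSpace.dump_all's line format (one JSON document per line, each with a "type" field). The function name heap_dump_type_counts is hypothetical, and the file location depends on the setup described in DIAGNOSTIC_REPORTS.md.

```ruby
require "json"
require "zlib"

# Counts entries per "type" (e.g. ROOT, STRING, OBJECT) in a
# gzip-compressed ObjectSpace.dump_all file.
def heap_dump_type_counts(path)
  counts = Hash.new(0)
  Zlib::GzipReader.open(path) do |gz|
    gz.each_line do |line|
      entry = JSON.parse(line)
      counts[entry["type"]] += 1
    end
  end
  counts
end
```

For example, heap_dump_type_counts(report_path).sort_by { |_, n| -n }.first(10) lists the ten most common object types, where report_path is wherever the report was written.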
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #370077 (closed)
Edited by Matthias Käppler