Skip to content
Snippets Groups Projects

Add heap dump diagnostic report implementation

Merged Matthias Käppler requested to merge 370077-worker-heap-dumps into master
All threads resolved!

What does this MR do and why?

Refs #370077 (closed)

This is the last step in a sequence of MRs that provide us with the ability to collect object space ("heap") dumps from worker processes.

Specifically, this builds on the following MRs (not including refactors):

  • Add a new life-cycle hook on_worker_stop that is called when a Puma or Sidekiq worker is about to shut down (!103372 (merged))
  • Wiring: Leverage memory-watchdog to signal the worker that it should dump ObjectSpace before shutting down (!103957 (merged)). The actual implementation was just a dummy.
  • Compress reports when streaming to disk: !105115 (merged)
  • This MR: Fill in the method body of the HeapDump report to actually produce a heap dump and puts this behind an ops toggle (we want to enable/disable this selectively.)

Note that this has no changelog entry because:

  • It is a SaaS-only feature currently and guarded by an extra environment switch
  • Uses an ops toggle that defaults to off

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

  1. Follow the instructions in https://gitlab.com/gitlab-org/application-performance-team/team-tools/-/blob/master/DIAGNOSTIC_REPORTS.md
  2. Set Feature.enable(:report_heap_dumps)
  3. Wait or force a worker to violate a memory threshold; this should trigger a shutdown and trigger this report.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #370077 (closed)

Edited by Matthias Käppler

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • added 1 commit

    • fbdcd17e - Verify dump_all isn't called when inactive

    Compare with previous version

  • Aleksei Lipniagov approved this merge request

    approved this merge request

  • added 1 commit

    Compare with previous version

  • Nikola Milojevic
  • Nikola Milojevic approved this merge request

    approved this merge request

  • :wave: @nmilojevic1, thanks for approving this merge request.

    This is the first time the merge request is approved. To ensure full test coverage, a new pipeline will be started shortly.

    For more info, please refer to the following links:

  • Nikola Milojevic resolved all threads

    resolved all threads

  • Nikola Milojevic enabled an automatic merge when the pipeline for 413ac58a succeeds

    enabled an automatic merge when the pipeline for 413ac58a succeeds

  • mentioned in commit 63789a8e

  • mentioned in issue #385175 (closed)

  • added workflowstaging label and removed workflowcanary label

  • Stan Hu mentioned in issue #423894 (closed)

    mentioned in issue #423894 (closed)

  • Please register or sign in to reply
    Loading