Skip to content
Snippets Groups Projects

Add heap dump diagnostic report implementation

Merged Matthias Käppler requested to merge 370077-worker-heap-dumps into master

What does this MR do and why?

Refs #370077 (closed)

This is the last step in a sequence of MRs that provide us with the ability to collect object space ("heap") dumps from worker processes.

Specifically, this builds on the following MRs (not including refactors):

  • Add a new life-cycle hook on_worker_stop that is called when a Puma or Sidekiq worker is about to shut down (!103372 (merged))
  • Wiring: Leverage memory-watchdog to signal the worker that it should dump ObjectSpace before shutting down (!103957 (merged)). The actual implementation was just a dummy.
  • Compress reports when streaming to disk: !105115 (merged)
  • This MR: Fill in the method body of the HeapDump report to actually produce a heap dump and puts this behind an ops toggle (we want to enable/disable this selectively.)

Note that this has no changelog entry because:

  • It is a SaaS-only feature currently and guarded by an extra environment switch
  • Uses an ops toggle that defaults to off

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

  1. Follow the instructions in https://gitlab.com/gitlab-org/application-performance-team/team-tools/-/blob/master/DIAGNOSTIC_REPORTS.md
  2. Set Feature.enable(:report_heap_dumps)
  3. Wait or force a worker to violate a memory threshold; this should trigger a shutdown and trigger this report.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #370077 (closed)

Edited by Matthias Käppler

Merge request reports

Merged results pipeline #718051744 passed with warnings

Pipeline: GitLab

#718054806

    Pipeline: GitLab

    #718054803

      Pipeline: GitLab

      #718054488

        Merged results pipeline passed with warnings for 880fd146

        Test coverage 85.20% from 2 jobs

        Merged by Nikola MilojevicNikola Milojevic 2 years ago (Dec 9, 2022 12:19pm UTC)

        Loading

        Pipeline #718133037 passed

        Pipeline passed for 63789a8e on master

        Test coverage 72.28% from 2 jobs
        10 environments impacted.

        Activity

        Filter activity
        • Approvals
        • Assignees & reviewers
        • Comments (from bots)
        • Comments (from users)
        • Commits & branches
        • Edits
        • Labels
        • Lock status
        • Mentions
        • Merge request status
        • Tracking
      • added 1 commit

        • fbdcd17e - Verify dump_all isn't called when inactive

        Compare with previous version

      • Aleksei Lipniagov approved this merge request

        approved this merge request

      • Loading
      • Loading
      • Loading
      • Loading
      • Loading
      • Loading
      • Loading
      • Loading
      • Loading
      • Loading
      • Please register or sign in to reply
        Loading