Extract report logic into Reporter class (!104727) · Merge requests · GitLab.org / GitLab

Matthias Käppler requested to merge 370077-iterate-reporter into master Nov 22, 2022

What does this MR do and why?

We recently added the ability to collect runtime diagnostics dumps from gitlab-rails and upload these to GCS. For the MVC, we only had one such report, which was running on a timer.

Because we only had a single report, it wasn't entirely clear at the time which parts of this system would need to be reused for multiple reports, so in the spirit of iteration we didn't overthink this. Now that we are looking to add a second kind of report, heap dumps, some cracks started to show in these abstractions.

In order to streamline logic to handle multiple kinds of reports and the signals that trigger them, I made the following changes:

The actual Report types are very light-weight now and conform to a duck-typed API: they have a name, and a run method that accepts a writer into which they emit their data. This will most likely be some kind of IO.
The ReportsDaemon, a timer thread that was used to trigger the original report is now greatly slimmed down and any remaining logic not related to timers has been moved to the Reporter class. This is a direct follow-up to !104264 (merged), where I had started doing this.
The Reporter type is now responsible for providing a writer to each report; this means all logic from the original Jemalloc report that involved streaming data to a temp file has moved here since we will re-use this later to produce heap dumps.

In summary, the old flow was:

ReportsDaemon
  -> trigger N reports based on a timer
  -> ask each report where they wrote their data
  Report(s):
    -> stream data to a temp dir using a self-chosen path
    -> return this path to the caller
  -> move report data from that path to an upload queue/directory

The new flow is:

ReportsDaemon
  -> creates a new `Reporter`
  Reporter
    -> prepare a temp dir where reports are written
    -> construct a report file writer
    -> hand this ready-to-use writer to each report
    Report(s)
      -> stream data into the writer; that's it
    -> move report data from that path to an upload queue/directory

Plus, the Reporter can also be used outside of the ReportsDaemon as is the case with HeapDump, which is not triggered by a timer but rather an on-process-stop signal.

Behavioral changes

While this is largely a refactor, I had to make a few behavioral changes because Reporter was a new class while ReportsDaemon had some behavior like error handling baked in already. Changes I made:

Errors are now caught in Reporter#run_report and logged.
I added a started log event; I find that these can be useful in case a thread or process gets stuck, in which case there would never be a finished or failed event.
The finished log event now contains the full path to the report file.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

How to set up and validate locally

See https://gitlab.com/gitlab-org/application-performance-team/team-tools/-/blob/master/DIAGNOSTIC_REPORTS.md

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Related to #370077 (closed)

Edited Nov 24, 2022 by Matthias Käppler

Extract report logic into Reporter class

What does this MR do and why?

Behavioral changes

Screenshots or screen recordings

How to set up and validate locally

MR acceptance checklist

Merge request reports