Skip to content

Heap dumps: turn gzip errors into structured log events

Matthias Käppler requested to merge 370077-log-gzip-errors into master

What does this MR do and why?

We recently added the ability to capture Ruby object space dumps.

While rolling this out on SaaS, I noticed that sometimes gzip would fail to compress them (likely due to running out of space) so we were uploading empty files instead of bailing out and logging this as a failure. The reason was that we merely forwarded gzip's stderr to the application i.e. container stderr which just dumps a string to the container logs.

With this MR, we capture stderr of that process and map it to an exception, which further up the call stack gets mapped to a log event. This means error handling is now more consistent, regardless of where we fail (application or when shelling out.)

Screenshots or screen recordings

To test this manually, I gave gzip an unsupported command line flag. This produces a log event in application_json.log now:

{
  "severity": "ERROR",
  "time": "2022-12-20T13:40:02.292Z",
  "correlation_id": null,
  "pid": 267,
  "worker_id": "puma_1",
  "perf_report_worker_uuid": "1d0dfa39-cef9-44b6-ab4b-7d77d3c59900",
  "message": "failed",
  "perf_report": "heap_dump",
  "error": "#\u003cStandardError: gzip: unrecognized option '--blah'\u003e"
}

whereas previously, no log event was emitted.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #370077 (closed)

Merge request reports