Identify useful diagnostic reports
In #362900 (closed) we are looking to collect diagnostic reports from production Pumas. At the least we should collect a Ruby heap dump, but additional data might be required to triage issues offline.
I suggest that group::memory pairs with an SRE to perform a one-off manual session where such data is pulled, so we can decide what is actually useful.
We can focus on reports that are readily available simply by SSH-ing into a running node:
- Ruby heap dumps (via `rbtrace -p <pid> --heapdump`)
- Process maps (via `pmap <pid>`)
- `sigdump` reports (via `kill -CONT <pid>`); this may have to be enabled in production first
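
For the manual session, the three commands listed above could be wrapped in a small helper so that we pull the same data from every node. This is only a sketch: the `collect_reports` and `run` function names and the `DRY_RUN` switch are hypothetical, not existing tooling. It also assumes that `rbtrace` is already loaded in the target Puma process and that `sigdump` is enabled there.

```shell
#!/usr/bin/env bash
# Sketch only: collect_reports, run and DRY_RUN are illustrative names,
# not part of any existing tooling.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Pull the three reports for a single Puma process.
collect_reports() {
  local pid="$1"
  if [ -z "$pid" ]; then
    echo "usage: collect_reports <pid>" >&2
    return 1
  fi

  # Ruby heap dump; requires rbtrace to be loaded in the target process.
  run rbtrace -p "$pid" --heapdump || echo "heap dump failed" >&2

  # Process memory map, printed to stdout.
  run pmap "$pid" || echo "pmap failed" >&2

  # Ask sigdump for a report. Assumption: sigdump is enabled and listens
  # on SIGCONT; where the report ends up depends on its configuration.
  run kill -CONT "$pid" || echo "could not signal process" >&2
}
```

Running `DRY_RUN=1 collect_reports <pid>` first only prints the commands, which gives us a chance to review them with the SRE before anything touches a production process.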
Edited by Matthias Käppler