Memory watchdog should restart high-memory workers
We found in #365950 (closed) that reaping workers purely on high heap fragmentation is useful only for certain parts of our production fleet.
To replace puma-worker-killer, I think we should look at other metrics that indicate "bad behavior". We should probably not use absolute RSS, and certainly not a fixed budget that we need to maintain (fixed budgets have caused both confusion and extra work in the past).
Some ideas:
- Reap workers based on relative growth. For instance, we could capture master RSS prior to forking. If the watchdog observes a worker exceeding some multiple of master RSS, it issues a kill. This still places a cap on RSS, but the budget is relative and scales over time.
- Reap workers based on suspected memory leaks. We have instances where we allocate millions of objects that are never freed. This happens in response to requests, not during application start. We could have the watchdog observe live slot growth, and restart workers if this grows steadily over a period of time.
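The relative-growth idea could look roughly like the sketch below. This is not an actual implementation; the class name, the `max_multiple` threshold, and reading RSS from `/proc/<pid>/statm` (Linux only) are all illustrative assumptions.

```ruby
# Hypothetical sketch: compare a worker's RSS against a multiple of
# the master's RSS captured before forking. Names/thresholds are
# illustrative, not GitLab's actual watchdog.
class RssWatchdog
  PAGE_SIZE = 4096 # bytes; typical Linux page size (assumption)

  # master_rss_bytes would be captured in the Puma master before fork.
  def initialize(master_rss_bytes, max_multiple: 3.0)
    @master_rss = master_rss_bytes
    @max_multiple = max_multiple
  end

  # Current RSS of this process, read from /proc (Linux only).
  # The second field of /proc/<pid>/statm is resident pages.
  def current_rss_bytes(pid = Process.pid)
    File.read("/proc/#{pid}/statm").split[1].to_i * PAGE_SIZE
  end

  # True once the worker exceeds the relative budget; the watchdog
  # loop would then issue a graceful kill (e.g. SIGTERM) to the worker.
  def over_budget?
    current_rss_bytes > @master_rss * @max_multiple
  end
end
```

Because the budget is derived from master RSS at boot, it grows with the application over time instead of requiring manual maintenance.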
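The live-slot heuristic could be sketched as a simple strike counter over `GC.stat(:heap_live_slots)` samples: if live slots grow for N consecutive observation windows, we suspect a leak. The class name and strike threshold below are assumptions for illustration.

```ruby
# Hypothetical sketch: flag a worker whose GC live slots grow for
# max_strikes consecutive samples. Threshold is an assumption.
class LiveSlotMonitor
  def initialize(max_strikes: 5)
    @max_strikes = max_strikes
    @strikes = 0
    @last_slots = nil
  end

  # Feed one observation (defaults to the current live-slot count).
  # Returns true once growth has been seen max_strikes times in a row,
  # at which point the watchdog would restart the worker.
  def sample(live_slots = GC.stat(:heap_live_slots))
    grew = @last_slots && live_slots > @last_slots
    @strikes = grew ? @strikes + 1 : 0
    @last_slots = live_slots
    @strikes >= @max_strikes
  end
end
```

Requiring consecutive growth (rather than a single spike) filters out ordinary request-driven allocation that the GC later reclaims, which is what distinguishes a suspected leak from normal churn.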
Edited by Matthias Käppler