Reduce redundancy in Sidekiq process management

As discovered in https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11332, Sidekiq cleans up its "processes" set inline in the ProcessSet initializer. That initializer runs whenever anything asks for a count of running processes, and each run issues an hget for every process in the entire Sidekiq system.

This happens in the scheduled job poller when it determines how long to wait before running again. Empirically, that is every 15-30 seconds per Sidekiq process. We have hundreds of these processes, with more coming online as we migrate to Kubernetes and as load grows, so we are potentially repeating the same work tens of times a second, causing thousands of hgets per second.
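
To make the scale concrete, here is a back-of-the-envelope estimate. The figures (300 processes, a 20-second poll interval) are hypothetical values picked from within the ranges described above, not measurements, and `estimated_hgets_per_second` is a helper invented for this illustration.

```ruby
# Rough estimate of fleet-wide hget load caused by inline cleanup.
# All inputs are assumptions for illustration, not measured values.
def estimated_hgets_per_second(process_count, poll_interval_seconds)
  # Each process triggers one cleanup per poll.
  cleanups_per_second = process_count / poll_interval_seconds.to_f
  # Each cleanup issues one hget per registered process.
  cleanups_per_second * process_count
end

# Assumed: 300 processes, each polling every 20 seconds.
puts estimated_hgets_per_second(300, 20)
```

Because the load grows with the square of the process count, doubling the fleet roughly quadruples the hget rate under these assumptions.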

Ironically, the precise process count isn't that important. Sidekiq::Scheduled::Poller#random_poll_interval only checks whether the count is above or below 10, as a heuristic for calculating the wait time. Given that the heartbeat is 5 seconds, the count is already slightly inaccurate, although I expect processes failing in a way that the heartbeat cleanup needs to deal with to be extremely rare.
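
The shape of that heuristic can be sketched as follows. This is a simplified illustration of the behaviour described above, not a copy of the upstream source; `poll_interval_for` and the 15-second average are hypothetical, and the exact formulas in each branch are assumptions.

```ruby
# Simplified sketch: the process count only selects a branch.
# Below 10 processes, jitter around the average; otherwise pick
# any point in the interval and rely on the fleet size for spread.
def poll_interval_for(process_count, average_interval, rng: Random.new)
  if process_count < 10
    # Roughly average +/- 50% (assumed formula).
    average_interval * rng.rand + average_interval / 2.0
  else
    # Anywhere in [0, average) (assumed formula).
    average_interval * rng.rand
  end
end

# With the same random seed, 50 and 500 processes give the same wait,
# because only the "10 or more" threshold matters.
a = poll_interval_for(50, 15, rng: Random.new(1))
b = poll_interval_for(500, 15, rng: Random.new(1))
puts a == b
```

In this sketch, a count that is stale by a few processes only changes the result if it crosses the threshold of 10.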

All of which is to say: if we did this cleanup only once every few seconds, e.g. in a separate thread rather than inline in ProcessSet, we'd still have sufficient accuracy while reducing the hget calls by 1-2 orders of magnitude. Some specific implementation suggestions are in https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11332#note_412579785, although they should not be considered exhaustive; other solutions may achieve the same goal. This will eventually need to be an upstream PR for Sidekiq, although we may want to monkey-patch it into our code base first to conclusively prove the benefit (graphs will be particularly compelling).
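
One possible shape for "at most once every few seconds across the fleet" is a shared lock with an expiry, so that whichever process takes the lock does the cleanup and the rest skip it. The sketch below is an assumption for discussion, not necessarily one of the suggestions in the linked note. `FakeSharedStore` is an in-memory stand-in that mimics the semantics of Redis `SET key value NX EX ttl`, and `cleanup_if_due` and the `process_cleanup` key are hypothetical names; a real patch would talk to Redis and run the existing cleanup logic in the block.

```ruby
# In-memory stand-in for a shared store (assumption: a real
# implementation would use Redis SET with the NX and EX options).
class FakeSharedStore
  def initialize(clock)
    @clock = clock   # callable returning the current time in seconds
    @expiry = {}
    @mutex = Mutex.new
  end

  # Returns true if the key was absent or expired and is now set.
  def set_if_absent(key, ttl_seconds)
    @mutex.synchronize do
      now = @clock.call
      return false if @expiry.key?(key) && @expiry[key] > now
      @expiry[key] = now + ttl_seconds
      true
    end
  end
end

# Hypothetical wrapper: run the cleanup block only if nobody else
# has done so within the last interval_seconds.
def cleanup_if_due(store, interval_seconds)
  return false unless store.set_if_absent("process_cleanup", interval_seconds)
  yield
  true
end

# Usage with a fake clock: three calls at the same instant run the
# cleanup once; a call after the interval has passed runs it again.
fake_now = 0.0
shared = FakeSharedStore.new(-> { fake_now })
cleanups = 0
3.times { cleanup_if_due(shared, 5) { cleanups += 1 } }
fake_now = 6.0
cleanup_if_due(shared, 5) { cleanups += 1 }
puts cleanups
```

A purely per-process throttle would not help here, since each process already polls only every 15-30 seconds; the saving comes from coordinating across processes, or from moving the cleanup off the count path entirely.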

Edited Sep 17, 2020 by Craig Miskell