Create separate daemon process for Sidekiq Memory Killer monitoring

Problem to solve

In issue #34547 (closed), a user reported a scenario in which the Sidekiq Memory Killer failed to kill the Sidekiq process as expected:

1. The Sidekiq memory killer thread detected high memory usage.
2. The Sidekiq memory killer thread sent SIGTSTP and SIGTERM to the Sidekiq worker process.
3. The Sidekiq worker process received SIGTSTP and SIGTERM.
4. However, something went wrong: the Sidekiq worker process did not terminate, and the memory killer thread did not send SIGKILL as the last resort. (We are not sure why SIGKILL was not sent: maybe the memory killer thread was terminated, or maybe it was never scheduled or woken up again.)

There are two issues here:

1. It is hard to debug. It is not easy to get much information from a user's production environment for root cause analysis, and we cannot reproduce it either.
2. The memory killer thread is not able to send SIGKILL to the Sidekiq worker.

The most likely reason for issue 2) is that the Sidekiq memory killer thread is itself a child thread of the Sidekiq worker process. So when something goes wrong in the main thread, the child thread may not be able to act as the last resort.

The idea is to fork a new child process that sends SIGKILL to the Sidekiq worker. If this forked child process can run independently of the Sidekiq worker, it can still send SIGKILL when the Sidekiq worker hangs for any reason.
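
A minimal Ruby sketch of this idea, assuming a POSIX system where `fork` is available. The helper name `spawn_kill_watchdog` and its parameters `grace_seconds` and `poll_interval` are hypothetical and not part of Sidekiq or GitLab. Whether a process forked this way actually survives Sidekiq's own shutdown handling is not established here and still needs to be verified.

```ruby
# Hypothetical helper: fork a watchdog process that sends SIGKILL to
# `target_pid` if that process is still alive after `grace_seconds`.
def spawn_kill_watchdog(target_pid, grace_seconds:, poll_interval: 0.1)
  watchdog_pid = fork do
    # Start a new session so that signals sent to the caller's process
    # group are not delivered to the watchdog as well.
    Process.setsid

    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace_seconds
    loop do
      begin
        Process.kill(0, target_pid) # signal 0 only checks that the pid exists
      rescue Errno::ESRCH
        exit!(0) # target is already gone, nothing to do
      end
      break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

      sleep poll_interval
    end

    begin
      Process.kill('KILL', target_pid) # last resort
    rescue Errno::ESRCH
      # target exited between the last check and the kill
    end
    exit!(0) # skip at_exit handlers inherited from the forking process
  end

  # Reap the watchdog in the background so it does not linger as a zombie.
  Process.detach(watchdog_pid)
  watchdog_pid
end
```

In the memory killer, such a helper could be called with `Process.pid` right after SIGTERM is sent, so that the SIGKILL no longer depends on a thread inside the process that is hanging.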

Intended users

Sasha (Software Developer)

Further details

This is to help developers debug subtle Sidekiq memory killer scenarios. Customers should not need to care about this tool.

As a possible future direction, this tool could be generalised into a diagnostic information collector for any or all GitLab packages. It could make it easier for us to support our users.

Proposal

There are several possible options to create the new monitor process:

1. Create a new script, sidekiq_monitor, that runs in parallel with Sidekiq.
   - Advantage: this is definitely an independent daemon process that won't be impacted by Sidekiq.
   - Disadvantage: it requires inter-process communication, for example to learn the Sidekiq worker process ID.
2. Fork a new process from the existing Sidekiq memory killer thread.
   - Advantage: fewer code changes; it is easy to retrieve Sidekiq worker process information.
   - Disadvantage: whether this approach works depends on the Sidekiq implementation. We are not sure whether Sidekiq terminates all of its child processes, so this needs research and experimentation.

Besides sending SIGKILL to the Sidekiq worker process, the new monitor process will also collect context (such as ps output) for root cause analysis and log it. When users encounter the issue, we can ask them to provide this log.
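
As an illustration of the context collection, the sketch below captures a `ps` snapshot of the target process and writes it as one JSON line. The function names, the selected `ps` columns, and the JSON log format are assumptions made for this example; the real log format is still to be decided.

```ruby
require 'json'
require 'open3'
require 'time'

# Hypothetical helper: capture a `ps` snapshot for `pid`.
# The column list uses POSIX format names; adjust it per platform.
def collect_process_context(pid)
  output, status = Open3.capture2e(
    'ps', '-o', 'pid,ppid,vsz,etime,args', '-p', pid.to_s
  )
  {
    collected_at: Time.now.utc.iso8601,
    pid: pid,
    ps_ok: status.success?,
    ps_output: output.scrub.strip # scrub guards against invalid bytes
  }
rescue Errno::ENOENT => e
  # `ps` is not installed; still log that the collection was attempted.
  {
    collected_at: Time.now.utc.iso8601,
    pid: pid,
    ps_ok: false,
    ps_output: "ps not available: #{e.message}"
  }
end

# Hypothetical helper: write the snapshot as a single JSON line.
def log_process_context(pid, io: $stderr)
  io.puts(JSON.generate(collect_process_context(pid)))
end
```

The monitor process could call `log_process_context` just before sending SIGKILL, so that the state of the hanging worker is recorded while it still exists.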

Documentation

Testing

This is a low-risk change. In the worst case, the new monitor process is terminated by the Sidekiq worker, and the Sidekiq memory killer behaves the same as in the current version.

So far, we have not really replicated such a scenario ourselves. But I think we can verify it by:

- Hooking a long-running at_exit handler into Sidekiq, to see whether we can reproduce this behaviour.
- Manually sending SIGTERM to Sidekiq and observing that the Sidekiq worker process terminates, while the new monitor process stays alive and keeps working normally.
- Manually sending SIGKILL to Sidekiq and observing that the Sidekiq worker process terminates, while the new monitor process stays alive and keeps working normally.
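
The at_exit idea can be tried outside Sidekiq first. The sketch below forks a plain Ruby process whose at_exit handler blocks, so that SIGTERM starts the shutdown but the process never finishes exiting. `spawn_hanging_process` and `alive?` are hypothetical names, and this stand-in process only approximates a hanging Sidekiq worker.

```ruby
# Hypothetical helper: fork a child whose at_exit handler blocks for
# `hang_seconds`, imitating a worker that hangs during shutdown.
def spawn_hanging_process(hang_seconds: 60)
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    at_exit { sleep hang_seconds } # long-running at_exit hook
    trap('TERM') { exit }          # SIGTERM takes the normal exit path
    writer.puts('ready')           # tell the parent the handlers are set
    writer.close
    sleep                          # idle until a signal arrives
  end
  writer.close
  reader.gets # block until the child has installed its handlers
  reader.close
  pid
end

# True while a direct child has not exited yet. A nil result from
# waitpid with WNOHANG means the child is still running.
def alive?(pid)
  Process.waitpid(pid, Process::WNOHANG).nil?
end
```

Sending SIGTERM to the returned pid should leave the process alive, stuck in its at_exit handler, until something sends SIGKILL. That is the situation the new monitor process is meant to resolve.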

What does success look like, and how can we measure that?

If the Sidekiq worker process hangs after it has received SIGTERM, the new monitor process sends SIGKILL to kill it.

Links / references

Edited Jul 03, 2025 by 🤖 GitLab Bot 🤖