`PipelineScheduleWorker` and `ExpireBuildArtifactsWorker` are being killed by Sidekiq Memory killer
Problem
It seems the cron workers `PipelineScheduleWorker` and `ExpireBuildArtifactsWorker` each eat up a few GB of RAM, and because of that, the Sidekiq memory killer keeps killing the processes.
You can see the data points on the following Grafana dashboard:
https://dashboards.gitlab.net/d/sQ4GXgpik/memory-usage?orgId=1&from=now-6h&to=now
This is actually worse than it looks. To illustrate the problem, I describe how the system behaves below.
1. `PipelineScheduleWorker` fires according to the cron schedule `19 * * * *`. This is an hourly worker.
2. `PipelineScheduleWorker` creates an exclusive lock with Redis to prevent concurrent runs.
3. `PipelineScheduleWorker` eats up over 1GB of RAM. Because our current RSS limit is 1GB, the process gets `SIGTSTP`, and eventually `SIGTERM`, from `SidekiqMiddleware::MemoryKiller`.
4. `PipelineScheduleWorker` fires again (regardless of its `sidekiq_options retry: false`). However, this run fails immediately, because the exclusive lock created in step 2 still remains.
5. In a nutshell, `PipelineScheduleWorker` will never complete the job.
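The sequence above can be reproduced with a small in-memory model. This is only a sketch under stated assumptions: `FakeLeaseStore`, `run_worker`, the lease key, and the one-hour TTL are hypothetical stand-ins for the Redis-backed exclusive lock and the real worker, not the actual implementation. It shows that a run killed before releasing the lock blocks the next run, and that a later run is simply killed again because its memory usage has not changed.

```ruby
# Hypothetical in-memory stand-in for a Redis-backed exclusive lock.
class FakeLeaseStore
  def initialize
    @expires_at = {}
  end

  # Returns true if the lock was obtained, false if it is still held.
  def try_obtain(key, ttl:, now:)
    return false if @expires_at.key?(key) && @expires_at[key] > now

    @expires_at[key] = now + ttl
    true
  end

  def release(key)
    @expires_at.delete(key)
  end
end

LEASE_KEY = "pipeline_schedule_worker" # assumed key name, for illustration only

# Simulates one worker run. `killed: true` models the memory killer
# terminating the process before the lock is released.
def run_worker(store, now:, killed:)
  return :skipped_lock_taken unless store.try_obtain(LEASE_KEY, ttl: 3600, now: now)
  return :killed if killed

  store.release(LEASE_KEY)
  :completed
end

store = FakeLeaseStore.new
results = [
  run_worker(store, now: 0, killed: true),    # first run exceeds the RSS limit
  run_worker(store, now: 60, killed: false),  # re-run shortly after: lock still held
  run_worker(store, now: 3700, killed: true)  # after the assumed TTL: killed again
]
```

In this model no run ever returns `:completed`, which matches the behavior described in step 5.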
Proposal
- Reduce the memory consumption on the worker
- Remove exclusive lease
- Make `PipelineScheduleWorker` resilient (currently, multiple pipelines can be created in a short interval because of the nature of `run_next_at`)
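As a rough illustration of how these proposals might fit together, here is a minimal sketch using plain Ruby objects instead of database records. Everything in it is an assumption for illustration: the `Schedule` struct, `process_due_schedules`, the batch size, and the fixed hourly interval are hypothetical, and the `run_next_at` attribute simply mirrors the name used above. The idea is to walk the schedules in small batches rather than loading all of them at once, and to advance `run_next_at` before creating the pipeline so that a duplicate run shortly afterwards finds nothing due, without relying on an exclusive lock.

```ruby
# Hypothetical stand-in for a pipeline schedule record.
Schedule = Struct.new(:id, :run_next_at)

# Processes due schedules in small batches and returns the ids for which a
# pipeline would have been created. `interval` is an assumed period in seconds.
def process_due_schedules(schedules, now:, interval: 3600, batch_size: 2)
  created = []
  schedules.each_slice(batch_size) do |batch|
    batch.each do |schedule|
      next if schedule.run_next_at > now

      # Advance the timestamp first, computed from `now`, so that a
      # duplicate or retried run does not pick up the same schedule again.
      schedule.run_next_at = now + interval
      created << schedule.id # placeholder for creating the pipeline
    end
  end
  created
end

schedules = [Schedule.new(1, 0), Schedule.new(2, 50), Schedule.new(3, 9999)]
first_run  = process_due_schedules(schedules, now: 100)
second_run = process_due_schedules(schedules, now: 160) # duplicate run a minute later
```

Here `first_run` contains the two due schedules and `second_run` is empty. In a Rails application the batching would more likely come from the database layer, for example ActiveRecord's `find_each`, and the timestamp update would need to be atomic at the database level. This sketch does not model either of those.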
Edited by Shinya Maeda