Enormous workloads for ReactiveCachingWorker for Projects::MergeRequestsController#sast_reports

GitLab.com has been receiving alerts for Thread Contention from Sidekiq processes: https://gitlab.slack.com/archives/CD6HFD1L0/p1619686007402300

https://dashboards.gitlab.net/d/alerts-sat_ruby_thread_contention/alerts-ruby_thread_contention-saturation-detail?orgId=1&from=now-24h&to=now&panelId=2359646072&tz=UTC&var-environment=gprd&var-type=sidekiq&var-stage=main

Digging into these reports, the problem appears to be long-running ReactiveCachingWorker jobs, invoked from Projects::MergeRequestsController#sast_reports.

https://log.gprd.gitlab.net/goto/9275350a8eaf33d8cdf85396bff17507

  1. Jobs run for up to an hour (Sidekiq jobs should not run for more than 10 minutes)
  2. These jobs are almost completely CPU-bound: they spend almost all their time on-thread, with relatively few calls to external services (Redis, Postgres, etc.)
  3. They use a staggering amount of memory: up to 50GB per invocation

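Ruby doesn't handle this type of multi-threading well: in MRI, the Global VM Lock lets only one thread execute Ruby code at a time, so a CPU-bound job takes execution time away from every other thread in the same process. These jobs are noisy neighbours and will slow down other jobs running in the same processes.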
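
The contention described above can be reproduced in isolation. The following is a minimal sketch, assuming MRI (CRuby) and its Global VM Lock; `burn_cpu` and `compare_runs` are illustrative names, not code from GitLab or Sidekiq. It runs the same CPU-bound work sequentially and then on several threads, and returns both timings.

```ruby
require "benchmark"

# CPU-bound busy work with no I/O, so the running thread rarely
# gives other threads a chance to execute.
def burn_cpu(iterations)
  sum = 0
  iterations.times { |i| sum += i * i }
  sum
end

# Runs `burn_cpu` `count` times sequentially, then once on each of
# `count` threads, and returns the elapsed wall-clock seconds for both.
def compare_runs(count, iterations)
  sequential = Benchmark.realtime do
    count.times { burn_cpu(iterations) }
  end

  threaded = Benchmark.realtime do
    threads = Array.new(count) { Thread.new { burn_cpu(iterations) } }
    threads.each(&:join)
  end

  { sequential: sequential, threaded: threaded }
end

timings = compare_runs(4, 2_000_000)
puts format("sequential: %.2fs, threaded: %.2fs", timings[:sequential], timings[:threaded])
```

On MRI the two timings are typically close, because the threads take turns rather than running in parallel; exact numbers vary by machine and Ruby version.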

Implementation plan

  1. Revert changes from !54608 (diffs)
  2. Put them back under a feature flag
  3. Enable feature flag for gitlab-com
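
The shape of steps 2 and 3 can be sketched as follows. This is a hedged illustration only: `FlagStore`, the flag name `:sast_reports_new_behaviour`, and `sast_report_strategy` are all hypothetical stand-ins and are not taken from !54608 or from GitLab's feature flag implementation. The point is that the reverted code path stays the default and the reintroduced changes only run once the flag is switched on.

```ruby
require "set"

# Hypothetical in-memory stand-in for a feature flag store.
class FlagStore
  def initialize
    @enabled = Set.new
  end

  def enable(flag)
    @enabled << flag.to_sym
  end

  def disable(flag)
    @enabled.delete(flag.to_sym)
  end

  def enabled?(flag)
    @enabled.include?(flag.to_sym)
  end
end

# Hypothetical flag name, chosen for this example only.
SAST_FLAG = :sast_reports_new_behaviour

# Returns which code path would run: the reintroduced changes when the
# flag is on, the reverted (pre-change) behaviour otherwise.
def sast_report_strategy(flags)
  flags.enabled?(SAST_FLAG) ? :new_behaviour : :reverted_behaviour
end

flags = FlagStore.new
puts sast_report_strategy(flags) # flag off by default
flags.enable(SAST_FLAG)
puts sast_report_strategy(flags) # flag on
```

Keeping the flag off by default means the flag can be disabled again without another revert if the long-running jobs reappear.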
Edited by Michał Zając