Proposal: Limit the size of Sidekiq jobs

Sidekiq jobs are serialized as JSON and stored in Redis.

We could enforce a limit in a middleware for Sidekiq::Client, preventing jobs over a certain size from being scheduled. Oversized jobs grow Sidekiq-Redis' memory usage and cause slow Redis calls to this instance.
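
As a rough illustration of what "job size" means here, the sketch below measures the serialized JSON payload in bytes. The helper name `job_payload_bytes` and the example job hash are hypothetical and only approximate the shape of a Sidekiq job.

```ruby
require 'json'

# Hypothetical helper: size in bytes of a job once serialized as JSON.
# bytesize is used instead of length so multi-byte characters are counted fully.
def job_payload_bytes(job)
  JSON.generate(job).bytesize
end

# Hypothetical job hash, roughly in the shape that gets pushed to Redis.
job = { 'class' => 'ExampleWorker', 'args' => [42, 'a' * 1_000], 'queue' => 'default' }

puts "#{job['class']}: #{job_payload_bytes(job)} bytes"
```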

This could have prevented https://gitlab.com/gitlab-com/gl-infra/production/-/issues/3404

Proposal

  1. Figure out a reasonable limit by looking at what job sizes are currently being pushed into Redis (gitlab-org/gitlab!53248 (merged)). <= We are here

  2. Implement a size limiter middleware and inject it into Sidekiq::Client's middleware stack.

    This limiter should have two modes: track mode and limiting mode. In track mode, the limiter reports jobs that exceed the limit to Sentry (not as an exception) and still schedules them. In limiting mode, the limiter raises an exception (which will also end up in Sentry) and prevents the job from being scheduled.

    It should also support an allow list to explicitly exclude a worker class, using WorkerAttributes. When a worker exceeds the limit, but is marked to allow big jobs, an appropriate message is logged into Sidekiq.logger and the job is scheduled.

  3. For allowlisted jobs, we'll create issues to figure out how we can improve the way the workers get their data.
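
The limiter from step 2 could be sketched roughly as below. This is a minimal illustration, not the actual implementation: the class name `SizeLimiterSketch`, the `ExceedLimitError` exception, the `allowed` list (standing in for a WorkerAttributes flag), and the `reporter` callable (standing in for non-exception Sentry reporting) are all assumptions. The `call` signature follows the Sidekiq client middleware convention, where yielding lets the job through.

```ruby
require 'json'
require 'logger'

# Hypothetical sketch of a size limiter client middleware.
class SizeLimiterSketch
  ExceedLimitError = Class.new(StandardError)
  MODES = %i[track limit].freeze

  # limit_bytes: maximum serialized size (the real value is still to be decided).
  # allowed:     worker class names exempt from the limit (assumed stand-in).
  # reporter:    callable used in track mode (assumed stand-in for Sentry).
  def initialize(mode:, limit_bytes:, allowed: [],
                 reporter: ->(message) { warn(message) },
                 logger: Logger.new($stderr))
    raise ArgumentError, "unknown mode: #{mode}" unless MODES.include?(mode)

    @mode = mode
    @limit_bytes = limit_bytes
    @allowed = allowed.map(&:to_s)
    @reporter = reporter
    @logger = logger
  end

  # Client middleware entry point: the block schedules the job.
  def call(worker_class, job, queue, redis_pool)
    size = JSON.generate(job).bytesize
    return yield if size <= @limit_bytes

    message = "#{worker_class} job is #{size} bytes, limit is #{@limit_bytes} bytes"

    if @allowed.include?(worker_class.to_s)
      # Allowlisted worker: log and schedule anyway.
      @logger.info("#{message} (allowed to exceed the limit)")
    elsif @mode == :track
      # Track mode: report, but still schedule.
      @reporter.call(message)
    else
      # Limiting mode: raising before yield means the job is never scheduled.
      raise ExceedLimitError, message
    end

    yield
  end
end

# Usage in track mode with a hypothetical oversized job.
big_job = { 'class' => 'BigWorker', 'args' => ['x' * 200] }
tracker = SizeLimiterSketch.new(mode: :track, limit_bytes: 100)
tracker.call('BigWorker', big_job, 'default', nil) { big_job }
```

Wiring this into Sidekiq (through the client middleware chain in `Sidekiq.configure_client`) is left out, since how constructor options are passed depends on the Sidekiq version in use.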

Exit Criteria

This issue is considered completed when:

  • The limiter is enabled on Production in tracking mode
  • The initial limit is set higher

We will then investigate (in a separate issue) the possibility of using object storage for the payloads. We will also raise a new issue to discuss enabling limiting mode.

Edited by Rachel Nienaber