Implement Sidekiq throttling middleware in GitLab Rails
This issue aims to implement a throttling middleware that adjusts a worker's concurrency limit based on various database indicators. We'll use the concurrency limit middleware as the mechanism to throttle the number of concurrent jobs a worker is allowed to perform.
Indicators:
- Client-side DB duration per minute
- DB-side active connections
Throttling Effect:
- If the first indicator is violated, the worker is subjected to a "soft throttle"
- If both indicators are violated, the worker is subjected to a "hard throttle"
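The mapping from indicators to throttling effect could be sketched roughly as follows. This is only an illustration: the method name, the keyword arguments, and both threshold constants are hypothetical, since the issue does not specify concrete values or units. The issue also does not define an effect for the case where only the second indicator is violated, so the sketch treats that case as no throttle.

```ruby
# Hypothetical thresholds; the issue does not specify concrete values.
DB_DURATION_LIMIT_PER_MINUTE = 20_000
ACTIVE_CONNECTIONS_LIMIT = 500

# Returns :hard, :soft, or :none for one worker's current indicator readings.
def throttle_level(db_duration_per_minute:, active_connections:)
  duration_violated = db_duration_per_minute > DB_DURATION_LIMIT_PER_MINUTE
  connections_violated = active_connections > ACTIVE_CONNECTIONS_LIMIT

  if duration_violated && connections_violated
    :hard
  elsif duration_violated
    :soft
  else
    # Neither indicator, or only the connections indicator, is violated.
    # The issue defines no effect for the latter, so this is an assumption.
    :none
  end
end
```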
Glossary
- Throttle - Decrease the current concurrency limit in Redis. `SidekiqMiddleware::ConcurrencyLimit::Middleware` is responsible for deferring the job into a queue and releasing it back.
- Soft Throttle - `current_limit * 0.8`
- Hard Throttle - `current_limit * 0.5`
- Recovery - greater of `current_limit + 1` or `current_limit * 1.1`, when the current limit is below the max limit
- (numbers above can change)
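To make the arithmetic in the glossary concrete, here is a minimal sketch of the three adjustments. The method names, the rounding direction (floor when throttling, ceil when recovering), the floor of 1, and the cap at the max limit are all assumptions; only the factors 0.8, 0.5, and 1.1 and the `+ 1` step come from the glossary. Rationals are used instead of floats so that, for example, `100 * 1.1` does not round up to 111 because of floating-point error.

```ruby
SOFT_THROTTLE_FACTOR = Rational(4, 5)   # 0.8
HARD_THROTTLE_FACTOR = Rational(1, 2)   # 0.5
RECOVERY_FACTOR      = Rational(11, 10) # 1.1
MIN_LIMIT = 1 # assumption: never throttle a worker down to zero

def soft_throttle(current_limit)
  [(current_limit * SOFT_THROTTLE_FACTOR).floor, MIN_LIMIT].max
end

def hard_throttle(current_limit)
  [(current_limit * HARD_THROTTLE_FACTOR).floor, MIN_LIMIT].max
end

# Recovery only applies while the current limit is below the max limit.
def recover(current_limit, max_limit)
  return current_limit unless current_limit < max_limit

  candidate = [current_limit + 1, (current_limit * RECOVERY_FACTOR).ceil].max
  [candidate, max_limit].min # assumption: never exceed the max limit
end
```

With these assumptions, small limits recover by the `+ 1` step (5 becomes 6), while larger limits recover by the 10% step (100 becomes 110).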
Prerequisites
- Track dynamic concurrency limit in Redis gitlab-com/gl-infra/data-access/durability/team#146 (closed)
- Configure starting max concurrency limit based on urgency and the Sidekiq shard fleet's max threads gitlab-com/gl-infra/data-access/durability/team#215 (closed) (ref)
  - e.g. a low urgency worker in catchall would be allowed to use 20% of the fleet (which is 2160 jobs, currently).
  - This allows the numbers to scale with the fleet size, i.e. for Dedicated and self-managed.
  - Individual workers could adjust this percentage, and we could allow configuration to override the count altogether.
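The percentage-based max limit described above could be computed along these lines. This is a sketch under assumptions: the method name, its parameters, and the urgency mapping are hypothetical, and the fleet size of 10,800 threads is only inferred from the example (2160 jobs being 20% of the fleet), not stated in the issue.

```ruby
# Hypothetical urgency-to-percentage mapping; only the 20% figure for
# low urgency comes from the example in the issue.
URGENCY_FLEET_PERCENTAGE = { low: 20 }.freeze

# percentage: optional per-worker adjustment of the urgency default.
# override:   optional absolute count that bypasses the percentage entirely.
def max_concurrency_limit(urgency:, fleet_max_threads:, percentage: nil, override: nil)
  return override if override

  pct = percentage || URGENCY_FLEET_PERCENTAGE.fetch(urgency)
  fleet_max_threads * pct / 100 # integer division, rounds down
end

# Fleet size inferred from the example: 2160 / 0.20 = 10_800 threads.
max_concurrency_limit(urgency: :low, fleet_max_threads: 10_800) # => 2160
```

Because the limit is derived from `fleet_max_threads`, the same configuration yields a proportionally smaller limit on a smaller fleet.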
Tasks
- Implement the throttling middleware based on the two indicators and the throttling effects above
- Implement the recovery background thread to restore the current limit to its max limit
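As a rough illustration of the recovery task, the sketch below runs a background thread that periodically raises each worker's current limit toward its max limit. Everything here is hypothetical: `LimitStore` is an in-memory stand-in for the Redis-backed tracking, and `RecoveryLoop`, its interval, and the rounding are assumptions rather than the actual design.

```ruby
# In-memory stand-in for the Redis-backed limit store (hypothetical interface).
class LimitStore
  def initialize
    @limits = {}
    @mutex = Mutex.new
  end

  def set(worker, limit)
    @mutex.synchronize { @limits[worker] = limit }
  end

  def get(worker)
    @mutex.synchronize { @limits[worker] }
  end

  # Atomically replaces the worker's limit with the block's return value.
  def update(worker)
    @mutex.synchronize { @limits[worker] = yield(@limits[worker]) }
  end
end

class RecoveryLoop
  RECOVERY_FACTOR = Rational(11, 10) # 1.1, from the glossary

  # max_limits: Hash of worker name => max concurrency limit.
  def initialize(store:, max_limits:, interval: 60)
    @store = store
    @max_limits = max_limits
    @interval = interval
  end

  # One recovery pass: greater of +1 or *1.1, capped at the max limit.
  def tick
    @max_limits.each do |worker, max_limit|
      @store.update(worker) do |current|
        next max_limit if current.nil? || current >= max_limit

        candidate = [current + 1, (current * RECOVERY_FACTOR).ceil].max
        [candidate, max_limit].min
      end
    end
  end

  def start
    @thread = Thread.new do
      loop do
        tick
        sleep @interval
      end
    end
  end

  def stop
    @thread&.kill
    @thread&.join
  end
end
```

Keeping the recovery step in a separate `tick` method lets it be exercised without starting the thread; for example, a worker throttled to 50 with a max limit of 100 moves to 55 after one pass.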
Previous description
This issue is a placeholder to contain more focused discussions from https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/3775 regarding the throttling mechanism.
Looking at https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/3775#note_2089502716, this is currently the top contender for how the throttling mechanism will buffer and release jobs using `Gitlab::SidekiqMiddleware::ConcurrencyLimit`.
Other aspects to consider:
- Rate of throttling and recovery (throttling strategy)
- Manual overrides for EOC intervention
- Interaction with existing EOC tools like job deferring (https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/3807)
Edited by Marco Gregorius