SaaS Platforms' Sidekiq Roadmap
Improving Sidekiq's reliability has been a priority for group::scalability in the past. Earlier efforts introduced routing rules for a one-queue-per-shard model, added urgency buckets for improved Sidekiq queueing and execution SLIs, and gave SREs skip-job functionality for better incident response.
Roadmap
FY25-Q1 Goals
Epic | Initiative | Purpose | Theme | Comment
---|---|---|---|---
&1218 (closed) | Horizontally scale Sidekiq | Provide options to scale Sidekiq horizontally across multiple Redis shards, given that Service::Sidekiq is key to background job processing in GitLab and CPU is a primary resource bottleneck | theme::Horizontal Scalability | Top priority, as it provides actual horizontal scalability
&1120 | Measure client-side database transaction duration | Better track jobs' transaction duration and its impact on Patroni | theme::Scalability Advocacy and Facilitation |
&1219 | Improve availability of Sidekiq to >99.95% | | theme::Scalability Advocacy and Facilitation |
Previous efforts
Epic | Initiative | Purpose | Theme
---|---|---|---
&944 (closed) | Stop using namespaces in Sidekiq | Allow us to safely scale up Redis Cluster when future saturation is forecast | theme::Eliminating Toil
&941 (closed) | Upgrade Sidekiq and Redis gems | Ensure the GitLab Rails application is well positioned to handle Redis Cluster scale-ups (&1105) and Sidekiq sharding (#2541 (comment 1592705997)) | theme::Increase Platform Capacity
&1004 (closed) | Defer Sidekiq jobs using a feature flag | Allow on-call engineers to intervene and either divert problematic jobs for a later retry or drop them completely | theme::Saturation Response
&700 (closed) | Make Sidekiq SLIs explorable in the error budget for stage groups dashboard | Grant stage groups greater ownership of scaling their background jobs | theme::Scalability Advocacy and Facilitation
&431 (closed) | Move deduplication workload to Redis Cluster | Horizontally scale Sidekiq-related workload as much as possible within the constraints of Sidekiq's incompatibility with Redis Cluster | theme::Horizontal Scalability
Future Roadmap
Initiative | Purpose | Theme | Comments
---|---|---|---
Rate limiting | Prevent a small set of greedy consumers from starving the majority of resources | theme::Saturation Response | Pending discussion
Tracking and preventing external calls within database transactions | Reduce client-side overhead on database transactions | theme::Scalability Advocacy and Facilitation | Continuation of gitlab-org&13143
Attributing client-side database transaction SLIs to Patroni and error budgets | Enable stage groups to take ownership of scaling their workloads | theme::Scalability Advocacy and Facilitation |
Edited by Sylvester Chin