SaaS Platforms' Redis Roadmap
Scaling Redis has been a near-constant priority for the Scalability group over the last year, focusing on performance improvements &841 (closed), the option to deploy Redis on Kubernetes &619 (closed), simplifying configuration &886 (closed), adding new Redis partitions &857 (closed), and introducing Redis Cluster &823 (closed).
To scale Redis indefinitely, there are two primary objectives that complement each other:
OKR: Scale up a Redis Cluster deployment
Over the past few years, we have addressed Redis CPU saturation through an iterative series of efficiency improvements and by splitting portions of the workload into separate functional partitions. That approach has served us well, but it carries a high cost of effort and a long lead time to deliver results, and it does not scale indefinitely. For example, partitioning repository-cache from redis-cache was a collaborative effort between 2-3 engineers over the space of 6 weeks &860 (closed).
Redis Cluster will provide a predictable and much lower-friction means of addressing both CPU and memory saturation by allowing us to scale up the number of Redis instances as the workload grows. Once implemented, it will natively provide horizontal scaling and online resharding without further client-side changes.
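As background for how Redis Cluster spreads a workload across instances: each key maps to one of 16384 hash slots (CRC16 of the key, modulo 16384), and every instance owns a subset of those slots. The sketch below reproduces that mapping, including the hash tag rule, using only the Python standard library. The function name `key_slot` is chosen for illustration and is not part of any GitLab code.

```python
import binascii

TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the Redis Cluster hash slot for a key (illustrative helper)."""
    # Hash tag rule: if the key contains "{...}" with at least one character
    # between the first "{" and the next "}", only that substring is hashed.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx implements CRC-CCITT (polynomial 0x1021); with an
    # initial value of 0 it is the XMODEM variant used by Redis Cluster.
    return binascii.crc_hqx(key, 0) % TOTAL_SLOTS


if __name__ == "__main__":
    # Keys that share a hash tag land in the same slot.
    print(key_slot(b"{user1000}.following") == key_slot(b"{user1000}.followers"))
```

Because slot ownership, not the key itself, decides which instance serves a request, resharding only has to move slots between instances.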
Being operationally proficient at scaling our Redis Clusters will help us respond quickly to future saturation forecasts. We should be able to reliably scale out ServiceRedisCache before we migrate our non-Redis Cluster workloads, such as ServiceRedisRepositoryCache, back to it.
Goal: Implement Redis Cluster as a deployment strategy.
OKR: Horizontally scale Sidekiq
We need to provide options to scale Sidekiq horizontally across multiple Redis shards, because ServiceSidekiq is key to background job processing in GitLab and CPU is a primary resource bottleneck.
Goal: Provide options to scale Sidekiq horizontally across multiple Redis shards.
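One way to picture sharding Sidekiq across several Redis instances is a routing table that maps each queue to a shard, with a default shard for everything else. The sketch below is purely illustrative: the shard names, queue names, and the `shard_for_queue` helper are hypothetical and do not describe the routing GitLab actually uses.

```python
from typing import Mapping, Optional

DEFAULT_SHARD = "main"  # hypothetical name for the original Sidekiq Redis

# Hypothetical routing table: queue name -> shard name.
QUEUE_TO_SHARD = {
    "mailers": "queues_shard_a",
    "pipeline_processing": "queues_shard_b",
}


def shard_for_queue(queue: str, routes: Optional[Mapping[str, str]] = None) -> str:
    """Pick the Redis shard a job for `queue` would be enqueued to."""
    table = QUEUE_TO_SHARD if routes is None else routes
    # Queues without an explicit route stay on the default shard, so new
    # shards can be introduced one queue at a time.
    return table.get(queue, DEFAULT_SHARD)
```

Under this assumption, each shard carries only the CPU load of the queues routed to it, which is the property the goal above is after.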
Roadmap
FY25-Q1/Q2 Goals
Reference | Initiative | Purpose | Theme | Comment |
---|---|---|---|---|
&1105 (closed) | Prove the resharding model by scaling out and online re-sharding of Redis Cluster | Prevent Saturation | Horizontal Scalability | Top priority, as it gives actual horizontal scalability |
&1236 (closed) | Migrating redis-repository-cache to a Redis Cluster | Prevent Saturation in ServiceRedisRepositoryCache | Saturation Response | ServiceRedisClusterRepoCache is now serving repository-cache related traffic (dashboard). The repository cache workload is now horizontally scalable. |
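To make the scale-out step concrete, the sketch below plans how many hash slots each existing node would hand over when new nodes join, so that ownership of the 16384 slots ends up even. It is a minimal planning sketch, not the procedure used in &1105: the node names and the `plan_scale_out` helper are hypothetical, and the actual slot migration would still be carried out by cluster tooling.

```python
TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def plan_scale_out(owned: dict, new_nodes: list) -> list:
    """Return (source, target, slot_count) moves that even out slot ownership."""
    counts = dict(owned)
    for node in new_nodes:
        counts.setdefault(node, 0)  # new nodes start without slots
    if sum(counts.values()) != TOTAL_SLOTS:
        raise ValueError("slot counts must add up to 16384")

    # Nodes that already hold the most slots keep any remainder slots.
    nodes = sorted(counts, key=lambda n: -counts[n])
    base, extra = divmod(TOTAL_SLOTS, len(nodes))
    target = {n: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}

    surplus = [[n, counts[n] - target[n]] for n in nodes if counts[n] > target[n]]
    deficit = [[n, target[n] - counts[n]] for n in nodes if counts[n] < target[n]]

    moves = []
    while surplus and deficit:
        src, dst = surplus[0], deficit[0]
        n = min(src[1], dst[1])
        moves.append((src[0], dst[0], n))
        src[1] -= n
        dst[1] -= n
        if src[1] == 0:
            surplus.pop(0)
        if dst[1] == 0:
            deficit.pop(0)
    return moves


if __name__ == "__main__":
    # Three-node cluster growing to four nodes (hypothetical node names).
    print(plan_scale_out({"a": 5462, "b": 5461, "c": 5461}, ["d"]))
```

Only the slots listed in the plan move, so the keys in all other slots keep being served by their current node while the cluster stays online.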
Previous efforts
Reference | Initiative | Purpose | Theme |
---|---|---|---|
&944 (closed) | Stop using namespaces in sidekiq | Allow us to safely scale up Redis Cluster when future saturation is forecast | |
&941 (closed) | Upgrading Sidekiq and Redis gems | The GitLab Rails application is well positioned to handle Redis Cluster scale-ups (&1105 (closed)) and Sidekiq sharding (#2541 (comment 1592705997)) | |
&1055 (closed) | Cluster for redis-sharedstate | Horizontal scaling for sharedstate | Horizontal Scalability. Objective: Prevent Saturation |
&1066 (closed) | Functional Partitioning for pub/sub in Redis | Gain headroom in sharedstate | Saturation Response. Objective: Prevent Saturation |
&1094 (closed) | Migrate Exclusive Lease keys to redis-cluster-shared-state | | |
Future Roadmap
Reference | Initiative | Purpose | Theme | Comments |
---|---|---|---|---|
#2466 | Cluster resharding strategy | Document criteria for when scaling up a Redis Cluster is required for an instance. | Horizontal Scalability. Objective: Prevent Saturation | - |
- | Deployment Guidelines | Document considerations and recommendations for deploying a new Redis instance. | Scalability Advocacy and Facilitation. Objective: Low-touch Configuration | Good to pick up after Horizontally scale Sidekiq, since the redis-cluster and redis cookbooks will then be in place |
- | Move Cluster to GKE | Currently not an option due to concerns about performance overhead, see #2893 (comment 1806681120). To be discussed in #3560 | | |
- | Auto reshard (scale-up) Clusters | Build tools to allow clusters to grow automatically when additional capacity is required | Scalability Advocacy and Facilitation. Objective: Low-touch Configuration | - |
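For the resharding criteria and auto-reshard items, a first-order capacity estimate can be sketched as follows. It assumes load spreads evenly across shards, which real workloads with hot keys may not satisfy. The `shards_needed` helper and the utilization figures in the usage note are hypothetical, not thresholds taken from this roadmap.

```python
import math


def shards_needed(current_shards: int, peak_utilization: float,
                  target_utilization: float) -> int:
    """Estimate the shard count that brings per-shard utilization to target.

    Assumes load is spread evenly across shards (a simplification).
    """
    if current_shards < 1:
        raise ValueError("current_shards must be at least 1")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if peak_utilization < 0:
        raise ValueError("peak_utilization must not be negative")
    needed = math.ceil(current_shards * peak_utilization / target_utilization)
    # This sketch only ever recommends scaling up, never scaling in.
    return max(needed, current_shards)
```

For example, three shards forecast to peak at 80% CPU with a 50% target would suggest growing to five shards under this model.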
Edited by Kennedy Wanyangu