Write tutorial - Tour of Redis at GitLab.com

This is an overview-style tutorial.

Tour the Redis clusters, their distinct roles as shared caching and queuing datastores, their high availability mechanisms, and scaling constraints.

Include:

Briefly define what Redis is: an in-memory key-value store, run as a single-threaded process. Caching and queue storage are two of its common use-cases.
Describe the distinct purpose of the 3 Redis clusters: redis-persistent, redis-cache, redis-sidekiq
Describe the durability settings for each cluster, such as whether RDB backups are enabled. Replication to secondaries is asynchronous but near-real-time.
Describe HA failover: Redis Sentinel as an out-of-band sidecar agent for managing failovers. Don't rewrite the Sentinel docs. Just summarize its basic behaviors in terms of GitLab clients:
- gitlab-rails Puma workers query Sentinel to find the current Redis primary node.
- Sentinel health-checks the Redis nodes. When its health checks fail, it initiates a failover and announces the new primary.
- Failover is lossy. Expect very recent Redis writes to be lost (100 milliseconds?), and expect a burst of errors from clients until they reconverge on the new primary (1-10 seconds?). There may also be subtle residual effects, such as a Sidekiq job that was dequeued immediately before the failover being re-run after it, or a job that was enqueued immediately before the failover disappearing without ever being run.
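
For the tutorial, the lossy-failover point could be illustrated with a toy model like the sketch below. It is not Redis or Sentinel code: `ToyCluster`, `Node`, and the node names are hypothetical, and replication is reduced to an explicit backlog so the loss window is easy to see.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a Redis node: just a name and a dict of keys."""
    name: str
    data: dict = field(default_factory=dict)


class ToyCluster:
    """Hypothetical model of one primary and one replica with async replication."""

    def __init__(self):
        self.primary = Node("redis-01")
        self.replica = Node("redis-02")
        # Writes the primary has acknowledged but the replica has not yet received.
        self.backlog = []

    def write(self, key, value):
        # The primary acknowledges immediately; the replica catches up later.
        self.primary.data[key] = value
        self.backlog.append((key, value))

    def replicate(self):
        # Deliver every pending write to the replica.
        for key, value in self.backlog:
            self.replica.data[key] = value
        self.backlog.clear()

    def failover(self):
        # Promote the replica. Anything still in the backlog never reached it,
        # so those acknowledged writes are lost.
        lost = list(self.backlog)
        self.backlog.clear()
        old_name = self.primary.name
        self.primary = self.replica
        # The old primary rejoins as a replica and resyncs from the new primary.
        self.replica = Node(old_name, dict(self.primary.data))
        return lost


cluster = ToyCluster()
cluster.write("job:1", "enqueued")
cluster.replicate()                  # job:1 reaches the replica
cluster.write("job:2", "enqueued")   # acknowledged, but not yet replicated
lost = cluster.failover()
print(lost)                          # [('job:2', 'enqueued')]
print(sorted(cluster.primary.data))  # ['job:1']
```

Here `job:2` plays the role of the Sidekiq job that was enqueued just before the failover and then disappears without ever being run.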
Describe scaling constraints:
- Redis runs as a single-threaded process, so it executes tasks serially. Thus its throughput is constrained by its latency. (Redis ops are by design meant to be very low latency, typically operating on a single key.)
- This single-threaded design means Redis is not prone to lock contention or other inter-thread or inter-process resource contention. However, it also cannot scale beyond one CPU core and is prone to queuing delays: when any one operation is particularly slow, all other pending client operations are delayed behind it.
- At GitLab.com's scale, we have reached this scaling constraint several times, prompting us to improve application behavior and explore several options for horizontally scaling this service tier.
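
The scaling constraints above could be made concrete in the tutorial with a small arithmetic sketch. The function names and all timings below are made up for illustration and are not measurements from GitLab.com; the only assumption taken from the text is that operations execute strictly one at a time.

```python
def completion_times_us(service_times_us):
    """Return when each queued operation finishes if they run serially.

    Times are integers in microseconds; every operation waits for all of
    the operations queued ahead of it.
    """
    finished = []
    clock = 0
    for service_time in service_times_us:
        clock += service_time
        finished.append(clock)
    return finished


def max_ops_per_second(mean_service_time_us):
    """Upper bound on throughput for a single serial executor."""
    return 1_000_000 // mean_service_time_us


# Four fast operations (illustrative 100 microseconds each).
print(completion_times_us([100, 100, 100, 100]))     # [100, 200, 300, 400]

# The same queue, but the second operation is slow (illustrative 50 ms).
print(completion_times_us([100, 50_000, 100, 100]))  # [100, 50100, 50200, 50300]

# With a 100 microsecond mean service time, one thread tops out here.
print(max_ops_per_second(100))                       # 10000
```

In the second queue, the two fast operations behind the slow one finish about 50 ms late even though each needs only 100 microseconds of work, which is the queuing delay described above.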

Edited Sep 17, 2022 by 🤖 GitLab Bot 🤖