Capacity Planning: redis Service, redis_primary_cpu resource

CPU on our primary Redis instance is trending up quite steeply:

[Image: redis_primary_cpu saturation trend]

From the Tamland report at https://gitlab-com.gitlab.io/gl-infra/tamland/saturation.html


This is a placeholder for now, but we need to start thinking about next steps for scaling the Redis service. (Redis-Sidekiq has its own issue: #590 (closed).)

Possible actions include:

  1. Watch and wait to see if things flatten out
  2. Investigate the sources of traffic and optimize the application
  3. Optimize the Redis instance, including upgrading to Redis 6 with --io-threads
  4. Vertical scaling, if GCP offers more powerful instance types. AFAIK they don't have any more powerful Intel machines, and we can't experiment with things like AMD EPYC processors yet, as they're in Beta and don't run in us-east1 yet. Should we investigate N2D AMD EPYC nodes?
  5. Break off more bits from the Redis instance.
    1. Rack sessions seem like a possible option, and moving them would also unblock our move to Redis Cluster
  6. Address any Redis Cluster key violations in preparation for Redis Cluster
  7. Start preparing to migrate to Redis Cluster
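
For item 6, a rough sketch of how cross-slot key violations could be spotted offline. Redis Cluster maps each key to one of 16384 hash slots using CRC16 (XMODEM variant) and hashes only the `{...}` hash tag when one is present; a multi-key command fails when its keys land in different slots. The function names and the example keys below are hypothetical and not taken from the GitLab codebase; this is an illustration of the slot rule, not the tooling we would actually use.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the Redis Cluster hash slot (0..16383) for a key."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


def crosses_slots(keys) -> bool:
    """True if the keys of one multi-key command map to more than one slot."""
    return len({key_slot(k) for k in keys}) > 1


# Hypothetical key names, for illustration only.
print(crosses_slots(["session:abc", "session:def"]))      # keys hashed in full
print(crosses_slots(["{session}:abc", "{session}:def"]))  # same hash tag, same slot
```

Keys that must be used together in one command can share a hash tag, which is usually the cheapest fix for a violation.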
Edited by Andrew Newdigate