2019-06-27 Redis flapping/down

Please note: if the incident relates to sensitive data or is security-related, consider labeling this issue with security and marking it confidential.


Summary

A brief summary of what happened. Try to make it as executive-friendly as possible.

TBD

Service(s) affected:
Team attribution:
Minutes downtime or degradation:

Timeline

2019-06-27

  • 14:00 UTC - Redis alerts started arriving in #production, for example: Triggered #11539: Firing 1 - Connection of Redis replicas to the master is flapping! Look at redis-cache-03-db-gprd.c.gitlab-production.internal:9121
  • 14:09 UTC - started reviewing the Redis slowlog, per notes below
  • 14:25 UTC - another failover from redis-cache-03 to redis-cache-02
  • 15:00 UTC - after applying the config set below, the slaves were able to resync and join the new master, and the cluster again appeared stable.
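
For reference, a minimal sketch of how replica resync state could be checked from the text that Redis returns for INFO replication. It is not the tooling used during this incident. The function names and the sample payload (including the master address) are hypothetical, and only the standard `role`, `master_link_status` and `master_sync_in_progress` fields are assumed.

```python
def parse_info(text):
    """Parse the `key:value` lines of Redis INFO output into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replica_is_synced(fields):
    """True when a replica reports a live master link and no resync in progress."""
    return (
        fields.get("role") == "slave"
        and fields.get("master_link_status") == "up"
        and fields.get("master_sync_in_progress") == "0"
    )


# Illustrative sample only; not captured during this incident.
SAMPLE = """# Replication
role:slave
master_host:10.0.0.1
master_port:6379
master_link_status:down
master_sync_in_progress:1
"""

if __name__ == "__main__":
    print(replica_is_synced(parse_info(SAMPLE)))
```

Running a check like this against each replica after a failover would show which nodes have rejoined the new master and which are still resyncing.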
Edited Aug 03, 2020 by 🤖 GitLab Bot 🤖