2019-06-27 Redis flapping/down
Please note: if the incident relates to sensitive data or is security related, consider labeling this issue with security and marking it confidential.
Summary
A brief summary of what happened. Try to make it as executive-friendly as possible.
TBD
Service(s) affected :
Team attribution :
Minutes downtime or degradation :
Timeline
2019-06-27
- 14:00 UTC - Redis alerts started coming into #production, like: Triggered #11539: Firing 1 - Connection of Redis replicas to the master is flapping! Look at redis-cache-03-db-gprd.c.gitlab-production.internal:9121
- 14:09 UTC - looking at redis slowlog per notes below
- 14:25 UTC - another failover from redis-cache-03 to redis-cache-02
- 15:00 UTC - with the config set below, the slaves were able to resync and join the new master, and the cluster appears stable again.