Redis replicas lost link to master
Summary
The replica output buffer limit on the Redis master was exceeded, which caused both Redis replicas to lose their connection to the master. Each occurrence forced a full resync that took roughly 5 minutes.
Service(s) affected : ~"Service:Redis"
Team attribution : Infrastructure
Minutes downtime or degradation : 5
Timeline
2019-03-04
- 13:50 UTC - output buffer limit overrun - connection to both replicas lost. Full resync starting.
- 13:51 UTC - RedisMasterLinkDown alert firing
- 13:55 UTC - Both replicas in sync again
- 14:01 UTC - alerts resolved
- 14:52 UTC - output buffer limit overrun - connection to both replicas lost. Full resync starting.
- 14:58 UTC - Both replicas in sync again
- 15:02 UTC - @ahmadsherif doubling output buffers
knife ssh roles:gprd-base-db-redis-server-single '/opt/gitlab/embedded/bin/redis-cli -a $(sudo grep ^\masterauth /var/opt/gitlab/redis/redis.conf | cut -d\" -f2) config set client-output-buffer-limit "slave 8589934592 4294967296 180"'
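For readers unfamiliar with the setting, the three numbers passed to client-output-buffer-limit in the command above are, in order, the hard limit in bytes, the soft limit in bytes, and the soft limit duration in seconds. The sketch below only converts those values into readable units. The variable names are illustrative and not part of the original command.

```shell
#!/usr/bin/env bash
# Values copied from the config set command above (replica/"slave" client class).
hard_limit_bytes=8589934592   # hard limit: the client is disconnected as soon as this is reached
soft_limit_bytes=4294967296   # soft limit: the client is disconnected if it stays above this ...
soft_seconds=180              # ... continuously for this many seconds

# Convert bytes to GiB using integer arithmetic (both values are exact multiples of 1 GiB).
gib=$((1024 * 1024 * 1024))
hard_gib=$((hard_limit_bytes / gib))
soft_gib=$((soft_limit_bytes / gib))

echo "hard limit: ${hard_gib} GiB"
echo "soft limit: ${soft_gib} GiB sustained for ${soft_seconds} s"
```

Note that config set changes only the running instance. It does not by itself update redis.conf, so the new limits would presumably also need to be applied to the managed configuration to survive a restart.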
Edited by Henri Philipps