Redis replicas lost link to master

Summary

Overflowing the client output buffer limit for replicas made both Redis replicas lose synchronisation with the master. Each time, this led to a full resync that took roughly 5 minutes. It happened twice, at 13:50 and 14:52 UTC.

Service(s) affected: ~"Service:Redis"
Team attribution: Infrastructure
Minutes downtime or degradation: 5

Timeline

2019-03-04

  • 13:50 UTC - output buffer limit overrun - connection to both replicas lost. Full resync starting.
  • 13:51 UTC - RedisMasterLinkDown alert firing
  • 13:55 UTC - Both replicas in sync again
  • 14:01 UTC - alerts resolved
  • 14:52 UTC - output buffer limit overrun - connection to both replicas lost. Full resync starting.
  • 14:58 UTC - Both replicas in sync again
  • 15:02 UTC - @ahmadsherif doubles the replica client output buffer limits (hard limit 8 GiB, soft limit 4 GiB for 180 s)
    • knife ssh roles:gprd-base-db-redis-server-single '/opt/gitlab/embedded/bin/redis-cli -a $(sudo grep ^\masterauth /var/opt/gitlab/redis/redis.conf | cut -d\" -f2) config set client-output-buffer-limit "slave 8589934592 4294967296 180"'
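The Redis documentation for `client-output-buffer-limit` describes three values for each client class: a hard limit, a soft limit, and a soft-limit duration. A client is disconnected as soon as its buffer reaches the hard limit. It is also disconnected if the buffer stays at or above the soft limit continuously for the given number of seconds. The bash sketch below models that rule using the values from the 15:02 UTC command. `replica_verdict` is a hypothetical helper and not part of Redis. The "strictly longer than 180 s" boundary is an assumption about how Redis compares elapsed time.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a Redis tool) modelling the documented rule for
#   client-output-buffer-limit slave <hard bytes> <soft bytes> <soft seconds>
# using the values applied at 15:02 UTC.
HARD_LIMIT=8589934592   # 8 GiB: disconnect as soon as the buffer reaches this
SOFT_LIMIT=4294967296   # 4 GiB: disconnect if the buffer stays at or above this...
SOFT_SECONDS=180        # ...for longer than this many seconds (assumed strict ">")

# Usage: replica_verdict <buffer_bytes> <seconds_continuously_over_soft_limit>
# Prints "disconnect" or "keep". A limit of 0 means "no limit", as in redis.conf.
replica_verdict() {
  local used=$1 over_soft_for=$2
  if (( HARD_LIMIT > 0 && used >= HARD_LIMIT )); then
    echo disconnect
  elif (( SOFT_LIMIT > 0 && used >= SOFT_LIMIT && over_soft_for > SOFT_SECONDS )); then
    echo disconnect
  else
    echo keep
  fi
}

# A 5 GiB buffer is over the soft limit but under the hard limit:
replica_verdict $((5 * 1024**3)) 60    # keep: only 60 s over the soft limit
replica_verdict $((5 * 1024**3)) 200   # disconnect: over the soft limit for more than 180 s
replica_verdict $((9 * 1024**3)) 0     # disconnect: hard limit reached immediately
```

Under this model, the soft-limit duration tolerates a short burst of replication traffic during a resync. The hard limit caps how much memory a single lagging replica can pin on the master.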
Edited Mar 04, 2019 by Henri Philipps