Sentinel was not running on redis-02 for an extended period of time
Noticed during the Redis upgrade (production#1727): Sentinel was stopped on redis-02, so we had only two active Sentinel nodes on the main cluster.
The last log messages:
2019-07-22_23:23:06.49349 29425:X 22 Jul 23:23:06.493 * +slave-reconf-inprog slave 10.217.2.103:6379 10.217.2.103 6379 @ gprd-redis 10.217.2.102 6379
2019-07-22_23:23:53.03079 29425:X 22 Jul 23:23:53.030 # +sdown master gprd-redis 10.217.2.102 6379
2019-07-22_23:24:05.50650 29425:X 22 Jul 23:24:05.506 # +failover-end-for-timeout master gprd-redis 10.217.2.102 6379
2019-07-22_23:24:05.50662 29425:X 22 Jul 23:24:05.506 # +failover-end master gprd-redis 10.217.2.102 6379
2019-07-22_23:24:05.50662 29425:X 22 Jul 23:24:05.506 * +slave-reconf-sent-be slave 10.217.2.101:6379 10.217.2.101 6379 @ gprd-redis 10.217.2.102 6379
2019-07-22_23:24:05.50663 29425:X 22 Jul 23:24:05.506 * +slave-reconf-sent-be slave 10.217.2.103:6379 10.217.2.103 6379 @ gprd-redis 10.217.2.102 6379
2019-07-22_23:24:05.50664 29425:X 22 Jul 23:24:05.506 # +switch-master gprd-redis 10.217.2.102 6379 10.217.2.101 6379
2019-07-22_23:24:05.50679 29425:X 22 Jul 23:24:05.506 * +slave slave 10.217.2.103:6379 10.217.2.103 6379 @ gprd-redis 10.217.2.101 6379
2019-07-22_23:24:05.50680 29425:X 22 Jul 23:24:05.506 * +slave slave 10.217.2.102:6379 10.217.2.102 6379 @ gprd-redis 10.217.2.101 6379
2019-07-22_23:24:13.96611 29425:X 22 Jul 23:24:13.965 # +tilt #tilt mode entered
2019-07-22_23:24:44.00484 29425:X 22 Jul 23:24:44.004 # -tilt #tilt mode exited
2019-07-22_23:28:03.12687 29425:X 22 Jul 23:28:03.126 # +sdown slave 10.217.2.102:6379 10.217.2.102 6379 @ gprd-redis 10.217.2.101 6379
2019-07-22_23:28:39.54927 29425:signal-handler (1563838119) Received SIGTERM scheduling shutdown...
2019-07-22_23:28:39.56654 29425:X 22 Jul 23:28:39.566 # User requested shutdown...
2019-07-22_23:28:39.56660 29425:X 22 Jul 23:28:39.566 # Sentinel is now ready to exit, bye bye...
The SIGTERM followed by "User requested shutdown" indicates a normal, requested shutdown rather than a crash.
This timeframe roughly corresponds to the last upgrade of Redis on production:
dpkg.log.8.gz:2019-07-22 23:33:17 status installed gitlab-ee:amd64 11.11.5-ee.0
- Add monitoring/alerting for Sentinel for the different clusters
- Consider adding a note to change issues so this is covered in a precheck
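
As a starting point for the monitoring and precheck items, here is a minimal sketch of a check that flags a missing Sentinel. It assumes the reply of `SENTINEL ckquorum <master>` has the form `OK <n> usable Sentinels. ...` (or `NOQUORUM <n> usable Sentinels. ...`), and that three Sentinels are expected per cluster. The function name `check_sentinel_count` and the `expected` default are hypothetical, not part of any existing tooling.

```python
import re

# Assumed reply shape of `SENTINEL ckquorum <master>`:
#   "OK 3 usable Sentinels. Quorum and failover authorization can be reached"
#   "NOQUORUM 1 usable Sentinels. Not enough available Sentinels ..."
USABLE_RE = re.compile(r"(\d+) usable Sentinels")


def check_sentinel_count(ckquorum_reply, expected=3):
    """Return (ok, message) for a ckquorum reply.

    `expected` is the number of Sentinels that should be usable
    (hypothetical default of 3, matching the cluster in this issue).
    """
    match = USABLE_RE.search(ckquorum_reply)
    if match is None:
        # Reply did not match the assumed format; treat as a failure.
        return False, f"unrecognized reply: {ckquorum_reply!r}"

    usable = int(match.group(1))
    if not ckquorum_reply.startswith("OK"):
        return False, f"quorum check failed with {usable} usable sentinels"
    if usable < expected:
        # Quorum may still be reachable, but a Sentinel is missing.
        return False, f"only {usable} of {expected} sentinels usable"
    return True, f"{usable} of {expected} sentinels usable"
```

The reply could be collected with something like `redis-cli -p 26379 SENTINEL ckquorum gprd-redis` (26379 is the default Sentinel port and is an assumption for this environment). Comparing against an expected count matters here because two of three Sentinels can still reach quorum, so a plain quorum check would not have caught the stopped Sentinel on redis-02.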