Minor changes made to sentinel.conf by Sentinel cause Sentinel restarts
On GSTG today, we noticed that Sentinel was restarted. The reconfigure log shows the restart:
Staging REDIS_CHECKCMD_ERROR root@redis-01-db-gstg.c.gitlab-staging-1.internal:/var/log/gitlab/reconfigure# vi 1533224357.log
# Logfile created on 2018-08-02 15:39:17 +0000 by logger.rb/56815
[2018-08-02T15:39:17+00:00] INFO: Started chef-zero at chefzero://localhost:1 with repository at /opt/gitlab/embedded
One version per cookbook
[2018-08-02T15:39:17+00:00] INFO: *** Chef 13.6.4 ***
[2018-08-02T15:39:17+00:00] INFO: Platform: x86_64-linux
[2018-08-02T15:39:17+00:00] INFO: Chef-client pid: 1304
[2018-08-02T15:39:17+00:00] INFO: The plugin path /etc/chef/ohai/plugins does not exist. Skipping...
[2018-08-02T15:39:18+00:00] INFO: Setting the run_list to ["recipe[gitlab-ee]"] from CLI options
[2018-08-02T15:39:18+00:00] INFO: Run List is [recipe[gitlab-ee]]
[2018-08-02T15:39:18+00:00] INFO: Run List expands to [gitlab-ee]
[2018-08-02T15:39:18+00:00] INFO: Starting Chef Run for redis-01-db-gstg.c.gitlab-staging-1.internal
[2018-08-02T15:39:18+00:00] INFO: Running start handlers
[2018-08-02T15:39:18+00:00] INFO: Start handlers complete.
[2018-08-02T15:39:19+00:00] INFO: Loading cookbooks [gitlab-ee@0.0.1, package@0.1.0, gitlab@0.0.1, consul@0.0.0, repmgr@0.1.0, runit@0.14.2, postgresql@0.1.0, registry@0.1.0, mattermost@0.1.0, gitaly@0.1.0, letsencrypt@0.1.0, nginx@0.1.0, acme@3.1.0, crond@0.1.0, compat_resource@12.19.0]
[2018-08-02T15:39:25+00:00] WARN: gitlab-rails 'redis_host' will be ignored as sentinel is defined.
[2018-08-02T15:39:25+00:00] WARN: Selected systemd because systemctl shows .mount units
[2018-08-02T15:39:25+00:00] INFO: The plugin path /etc/chef/ohai/plugins does not exist. Skipping...
[2018-08-02T15:39:25+00:00] INFO: template[/var/opt/gitlab/redis/redis.conf] backed up to /opt/gitlab/embedded/cookbooks/cache/backup/var/opt/gitlab/redis/redis.conf.chef-20180802153925.998785
[2018-08-02T15:39:25+00:00] INFO: template[/var/opt/gitlab/redis/redis.conf] removed backup at /opt/gitlab/embedded/cookbooks/cache/backup/var/opt/gitlab/redis/redis.conf.chef-20180731180823.608603
[2018-08-02T15:39:26+00:00] INFO: template[/var/opt/gitlab/redis/redis.conf] updated file contents /var/opt/gitlab/redis/redis.conf
[2018-08-02T15:39:26+00:00] INFO: template[/var/opt/gitlab/redis/redis.conf] sending restart action to service[redis] (immediate)
[2018-08-02T15:39:27+00:00] INFO: service[redis] restarted
[2018-08-02T15:39:27+00:00] WARN: only_if block for template[/var/opt/gitlab/sentinel/sentinel.conf] returned "/var/opt/gitlab/sentinel/sentinel.conf", did you mean to run a command? If so use 'only_if "/var/opt/gitlab/sentinel/sentinel.conf"' in your code.
[2018-08-02T15:39:27+00:00] INFO: template[/var/opt/gitlab/sentinel/sentinel.conf] backed up to /opt/gitlab/embedded/cookbooks/cache/backup/var/opt/gitlab/sentinel/sentinel.conf.chef-20180802153927.407600
[2018-08-02T15:39:27+00:00] INFO: template[/var/opt/gitlab/sentinel/sentinel.conf] removed backup at /opt/gitlab/embedded/cookbooks/cache/backup/var/opt/gitlab/sentinel/sentinel.conf.chef-20180731214102.769393
[2018-08-02T15:39:27+00:00] INFO: template[/var/opt/gitlab/sentinel/sentinel.conf] updated file contents /var/opt/gitlab/sentinel/sentinel.conf
[2018-08-02T15:39:27+00:00] INFO: template[/var/opt/gitlab/sentinel/sentinel.conf] sending restart action to service[sentinel] (immediate)
[2018-08-02T15:39:27+00:00] INFO: service[sentinel] restarted
[2018-08-02T15:39:27+00:00] INFO: Chef Run complete in 9.277243458 seconds
[2018-08-02T15:39:27+00:00] INFO: Running report handlers
Diffing the backup named in this log against the current file shows what changed:
Staging REDIS_CHECKCMD_ERROR root@redis-01-db-gstg.c.gitlab-staging-1.internal:/var/log/gitlab/reconfigure# sudo diff /opt/gitlab/embedded/cookbooks/cache/backup/var/opt/gitlab/sentinel/sentinel.conf.chef-20180802153927.407600 /var/opt/gitlab/sentinel/sentinel.conf
205,207c205,206
< sentinel config-epoch gstg-redis 86
< sentinel leader-epoch gstg-redis 86
< sentinel known-slave gstg-redis 10.224.7.103 6379
---
> sentinel config-epoch gstg-redis 88
> sentinel leader-epoch gstg-redis 88
209c208
< sentinel known-sentinel gstg-redis 10.224.7.102 26379 c6c70b3130af78431deb724a2f056bebe3eb91f5
---
> sentinel known-slave gstg-redis 10.224.7.103 6379
211c210,211
< sentinel current-epoch 86
---
> sentinel known-sentinel gstg-redis 10.224.7.102 26379 c6c70b3130af78431deb724a2f056bebe3eb91f5
> sentinel current-epoch 88
For clarity, you can see the files themselves.
Previous
sentinel config-epoch gstg-redis 86
sentinel leader-epoch gstg-redis 86
sentinel known-slave gstg-redis 10.224.7.103 6379
sentinel known-slave gstg-redis 10.224.7.102 6379
sentinel known-sentinel gstg-redis 10.224.7.102 26379 c6c70b3130af78431deb724a2f056bebe3eb91f5
sentinel known-sentinel gstg-redis 10.224.7.103 26379 34e0dd1665774c12b2a883110d399d8bd72027aa
sentinel current-epoch 86
Current
sentinel config-epoch gstg-redis 88
sentinel leader-epoch gstg-redis 88
sentinel known-slave gstg-redis 10.224.7.102 6379
sentinel known-slave gstg-redis 10.224.7.103 6379
sentinel known-sentinel gstg-redis 10.224.7.103 26379 34e0dd1665774c12b2a883110d399d8bd72027aa
sentinel known-sentinel gstg-redis 10.224.7.102 26379 c6c70b3130af78431deb724a2f056bebe3eb91f5
sentinel current-epoch 88
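The changes here are minor: the epochs (current-epoch, config-epoch, leader-epoch) went from 86 to 88, and the known-slave and known-sentinel lines were reordered. These are lines that Sentinel rewrites itself at runtime, yet the reconfigure run treated the file as changed, rewrote it from the template, and restarted Sentinel. This seems like a fundamental problem with tying sentinel.conf to the restart step. Do we only need to bootstrap this file once and let Sentinel manage it afterwards?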
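To make the point concrete, here is a minimal Python sketch of a comparison that ignores the lines Sentinel manages itself. It is not how the cookbook works today; the function names and the RUNTIME_DIRECTIVES list are assumptions drawn only from the diff in this issue, and the "port 26379" line is added purely as an example of a static setting.

```python
# Directives that Sentinel rewrote on its own in the diff from this issue.
# Assumption: this list is illustrative, not exhaustive.
RUNTIME_DIRECTIVES = {
    "config-epoch",
    "leader-epoch",
    "known-slave",
    "known-sentinel",
    "current-epoch",
}


def static_lines(conf_text):
    """Return the config lines that are not Sentinel-managed runtime state."""
    kept = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        if parts[0] == "sentinel" and len(parts) > 1 and parts[1] in RUNTIME_DIRECTIVES:
            continue  # skip state that Sentinel rewrites at runtime
        kept.append(line)
    return kept


def needs_restart(previous, current):
    """True only when something other than runtime state changed."""
    return static_lines(previous) != static_lines(current)


# "port 26379" is a hypothetical static line; the rest is copied from the issue.
previous = """\
port 26379
sentinel config-epoch gstg-redis 86
sentinel leader-epoch gstg-redis 86
sentinel known-slave gstg-redis 10.224.7.103 6379
sentinel known-slave gstg-redis 10.224.7.102 6379
sentinel known-sentinel gstg-redis 10.224.7.102 26379 c6c70b3130af78431deb724a2f056bebe3eb91f5
sentinel known-sentinel gstg-redis 10.224.7.103 26379 34e0dd1665774c12b2a883110d399d8bd72027aa
sentinel current-epoch 86
"""

current = """\
port 26379
sentinel config-epoch gstg-redis 88
sentinel leader-epoch gstg-redis 88
sentinel known-slave gstg-redis 10.224.7.102 6379
sentinel known-slave gstg-redis 10.224.7.103 6379
sentinel known-sentinel gstg-redis 10.224.7.103 26379 34e0dd1665774c12b2a883110d399d8bd72027aa
sentinel known-sentinel gstg-redis 10.224.7.102 26379 c6c70b3130af78431deb724a2f056bebe3eb91f5
sentinel current-epoch 88
"""

print(needs_restart(previous, current))  # False: only runtime state differs
```

With a check like this, the two files above would count as unchanged, while an edit to a static setting would still be detected.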