Consul services getting deleted incorrectly during reconfigure
After noticing that a test environment had recently started throwing errors at random, we found that the issue appeared to be caused by the database leader not being updated.
Digging further, we found that the Consul agents on the Postgres nodes were missing their postgresql
service config, the config that tells Consul to poll Postgres. After running reconfigure, the files were regenerated.
However, after running reconfigure a second time, the files were deleted again:
Recipe: consul::enable_service_postgresql
* file[/var/opt/gitlab/consul/config.d/postgresql_service.json] action create (up to date)
Recipe: consul::watchers
* file[/var/opt/gitlab/consul/config.d/node-exporter-service.json] action delete[2023-03-30T15:54:08+00:00] INFO: file[/var/opt/gitlab/consul/config.d/node-exporter-service.json] backed up to /opt/gitlab/embedded/cookbooks/cache/backup/var/opt/gitlab/consul/config.d/node-exporter-service.json.chef-20230330155408.758032
[2023-03-30T15:54:08+00:00] INFO: file[/var/opt/gitlab/consul/config.d/node-exporter-service.json] removed backup at /opt/gitlab/embedded/cookbooks/cache/backup/var/opt/gitlab/consul/config.d/node-exporter-service.json.chef-20230329182640.940935
[2023-03-30T15:54:08+00:00] INFO: file[/var/opt/gitlab/consul/config.d/node-exporter-service.json] deleted file at /var/opt/gitlab/consul/config.d/node-exporter-service.json
- delete file /var/opt/gitlab/consul/config.d/node-exporter-service.json
* file[/var/opt/gitlab/consul/config.d/postgres-exporter-service.json] action delete[2023-03-30T15:54:08+00:00] INFO: file[/var/opt/gitlab/consul/config.d/postgres-exporter-service.json] backed up to /opt/gitlab/embedded/cookbooks/cache/backup/var/opt/gitlab/consul/config.d/postgres-exporter-service.json.chef-20230330155408.780842
[2023-03-30T15:54:08+00:00] INFO: file[/var/opt/gitlab/consul/config.d/postgres-exporter-service.json] removed backup at /opt/gitlab/embedded/cookbooks/cache/backup/var/opt/gitlab/consul/config.d/postgres-exporter-service.json.chef-20230329182640.962491
[2023-03-30T15:54:08+00:00] INFO: file[/var/opt/gitlab/consul/config.d/postgres-exporter-service.json] deleted file at /var/opt/gitlab/consul/config.d/postgres-exporter-service.json
- delete file /var/opt/gitlab/consul/config.d/postgres-exporter-service.json
* file[/var/opt/gitlab/consul/config.d/postgresql_service.json] action delete[2023-03-30T15:54:08+00:00] INFO: file[/var/opt/gitlab/consul/config.d/postgresql_service.json] backed up to /opt/gitlab/embedded/cookbooks/cache/backup/var/opt/gitlab/consul/config.d/postgresql_service.json.chef-20230330155408.901601
[2023-03-30T15:54:08+00:00] INFO: file[/var/opt/gitlab/consul/config.d/postgresql_service.json] removed backup at /opt/gitlab/embedded/cookbooks/cache/backup/var/opt/gitlab/consul/config.d/postgresql_service.json.chef-20230329182640.984329
[2023-03-30T15:54:08+00:00] INFO: file[/var/opt/gitlab/consul/config.d/postgresql_service.json] deleted file at /var/opt/gitlab/consul/config.d/postgresql_service.json
- delete file /var/opt/gitlab/consul/config.d/postgresql_service.json
Seeing that the deletions above were being performed under the consul::watchers recipe,
we found this recent change and suspect that the recently added watcher cleanup code is also incorrectly cleaning up service config.
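
To illustrate the kind of bug we suspect, here is a minimal Python sketch. It is not the cookbook's actual code. The function names, the `watcher_<name>.json` naming pattern, and the `watcher_old.json` file are assumptions made up for the example; only the three service file names are taken from the log above. It contrasts a cleanup that treats every JSON file in config.d as a potential stale watcher with one that only considers watcher files.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def stale_files_overbroad(config_dir: Path, watchers: Iterable[str]) -> List[Path]:
    """Suspected faulty behaviour (hypothetical): any JSON file that is not
    a currently configured watcher is considered stale, so service configs
    get swept up as well."""
    keep = {f"watcher_{name}.json" for name in watchers}
    return sorted(p for p in config_dir.glob("*.json") if p.name not in keep)


def stale_files_scoped(config_dir: Path, watchers: Iterable[str]) -> List[Path]:
    """Expected behaviour (hypothetical): only files that look like watcher
    configs are candidates for cleanup."""
    keep = {f"watcher_{name}.json" for name in watchers}
    return sorted(p for p in config_dir.glob("watcher_*.json") if p.name not in keep)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        config_dir = Path(tmp)
        # Service files as seen in the log, plus two assumed watcher files.
        for name in (
            "postgresql_service.json",
            "node-exporter-service.json",
            "postgres-exporter-service.json",
            "watcher_postgresql.json",
            "watcher_old.json",
        ):
            (config_dir / name).write_text("{}")

        print("overbroad:", [p.name for p in stale_files_overbroad(config_dir, ["postgresql"])])
        print("scoped:   ", [p.name for p in stale_files_scoped(config_dir, ["postgresql"])])
```

If the real cleanup behaves like the first function, it would explain why the service files are created by consul::enable_service_postgresql and then deleted again by consul::watchers within the same run.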