Use service discovery for DB load balancing
Production Change - Criticality 3 C3
Services Impacted
- GitLab Rails
- Sidekiq
- Postgres
Change Team Members
Change Severity
C4
Buddy check or tested in staging
Tested on staging
Schedule of the change
March 21st, 2019 - 1 AM UTC
Duration of the change
30-60 minutes
Detailed steps for the change
The change is encapsulated in an Ansible playbook (ansible-migrations!1 (merged)); its steps are detailed under "Steps execution" at the bottom.
Relevant chef changes:
- https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/473/diffs
- https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/474/diffs
Command execution would be as follows:
workstation $ ssh -A bastion-01-inf-gprd.c.gitlab-production.internal
bastion $ git clone git@gitlab.com:gitlab-com/gl-infra/ansible-migrations.git
bastion $ cd ansible-migrations
bastion $ consul catalog nodes | grep -E '^(api|git|web|sidekiq)' | awk '{ print $3 }' >> production-633/inventory.txt
bastion $ export OPS_API_TOKEN=<token>
bastion $ export MIGRATION_ENV=gprd
bastion $ export ANSIBLE_HOST_KEY_CHECKING=False
bastion $ ~/.local/bin/ansible-playbook -i production-633/inventory.txt -M ./modules/ -e @production-633/variables.yml production-633/playbook.yml
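The inventory command above appends (`>>`) to `inventory.txt`, so running it twice would list every host twice. A minimal sketch of the same filter as a reusable function follows, assuming `consul catalog nodes` prints a header row and then Node, ID, Address and DC columns. The function name `inventory_addresses` is hypothetical and not part of the playbook repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from ansible-migrations): read `consul catalog nodes`
# output on stdin and print the address (third column) of every node whose
# name starts with api, git, web or sidekiq, sorted and deduplicated.
inventory_addresses() {
  grep -E '^(api|git|web|sidekiq)' | awk '{ print $3 }' | sort -u
}
```

It could be used as `consul catalog nodes | inventory_addresses > production-633/inventory.txt`, which overwrites the file instead of appending to it.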
Steps execution
- Pre-conditions: N/A
- Step: Stop chef-client on the Patroni cluster
  knife ssh roles:gprd-base-db-patroni 'sudo service chef-client stop'
- Post-execution validation:
  knife ssh roles:gprd-base-db-patroni 'ps aux | grep chef-client | grep -v grep' | wc -l
  It should print "0"
- Rollback: knife ssh roles:gprd-base-db-patroni 'sudo service chef-client start'

- Pre-conditions: N/A
- Step: Switch on maintenance mode in the Patroni cluster
  knife ssh roles:gprd-base-db-patroni 'gitlab-patronictl pause --wait'
- Post-execution validation:
  knife ssh roles:gprd-base-db-patroni 'gitlab-patronictl list 2>/dev/null | grep "Maintenance mode: on" | grep -v grep' | wc -l
  It should print "6"
- Rollback: knife ssh roles:gprd-base-db-patroni 'gitlab-patronictl resume --wait'

- Pre-conditions: N/A
- Step: Merge and apply https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/474
- Post-execution validation: N/A
- Rollback: Revert https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/474, merge and apply

- Pre-conditions:
  ssh patroni-06-db-gprd.c.gitlab-production.internal 'gitlab-patronictl list | grep $(hostname)' | grep Leader | wc -l
  It should print "0"
- Step: Run chef-client on a single node
  ssh patroni-06-db-gprd.c.gitlab-production.internal 'sudo chef-client && sudo systemctl restart consul'
- Post-execution validation:
  ssh patroni-06-db-gprd.c.gitlab-production.internal 'dig @localhost -p 8600 replica.patroni.service.consul. | grep 10.220.16.106 | wc -l'
  It should print "1"
- Rollback: N/A
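The validation for the single-node run greps full `dig` output for `10.220.16.106`. In a regular expression the dots match any character, and the pattern can also match inside a longer string. A minimal sketch of a stricter check follows, assuming the lookup is run with `+short` so that each answer is one address per line. The function name `has_address` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given IP appears as a whole line
# in the `dig +short` output read from stdin.
has_address() {
  local ip="$1"
  # -F: treat the IP as a literal string, so dots are not wildcards
  # -x: the match must cover the whole line
  # -q: print nothing, report the result through the exit status
  grep -Fxq -- "$ip"
}
```

On the node this could be run as `dig @localhost -p 8600 +short replica.patroni.service.consul. | has_address 10.220.16.106 && echo present`.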

- Pre-conditions: N/A
- Step: Roll out the changes to the rest of the cluster, one node at a time
  knife ssh -C1 'roles:gprd-base-db-patroni -name:patroni-06-db-gprd.c.gitlab-production.internal' 'sudo chef-client && sudo systemctl restart consul'
- Post-execution validation:
  ssh patroni-06-db-gprd.c.gitlab-production.internal 'dig @localhost -p 8600 +short replica.patroni.service.consul. | wc -l'
  It should print "5"
  ssh patroni-06-db-gprd.c.gitlab-production.internal 'dig @localhost -p 8600 +short master.patroni.service.consul. | wc -l'
  It should print "1"
- Rollback: N/A
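Each node restarts Consul during the rollout, so the DNS answers may take a moment to settle. That delay is an assumption here, not something measured for this change. Under that assumption, a small retry wrapper could re-run a validation a few times before declaring failure. The function name `retry` and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between attempts, until it exits 0. Returns 1 if every attempt fails.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    # Stop as soon as the wrapped command succeeds.
    "$@" && return 0
    # Sleep only if another attempt is still to come.
    (( i < attempts )) && sleep "$delay"
  done
  return 1
}
```

For example, after defining `replicas_ready() { [ "$(dig @localhost -p 8600 +short replica.patroni.service.consul. | wc -l)" -eq 5 ]; }` on the node, `retry 10 3 replicas_ready` would poll for up to roughly 30 seconds.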

- Pre-conditions: N/A
- Step: Merge and apply https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/473
- Post-execution validation: N/A
- Rollback: Revert https://ops.gitlab.net/gitlab-cookbooks/chef-repo/merge_requests/473, merge and apply

- Pre-conditions: N/A
- Step: knife ssh roles:gprd-base 'sudo chef-client'
- Post-execution validation:
  ssh web-01-sv-gprd.c.gitlab-production.internal "sudo gitlab-rails r 'puts Gitlab::Database::LoadBalancing.proxy.load_balancer.instance_eval { @host_list }.hosts.map(&:host).count'"
  It should print "5"
- Rollback: N/A
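Most validations in the steps above pipe output into `wc -l` and rely on the operator to compare the printed number with the expected one. A minimal sketch of a helper that does the comparison and returns a non-zero exit status on mismatch is shown here. The name `expect_lines` is hypothetical and not part of the playbook.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the lines on stdin and compare the count with
# the expected number given as $1. Prints PASS on a match; prints FAIL to
# stderr and returns 1 on a mismatch.
expect_lines() {
  local expected="$1" actual
  # Some wc implementations pad the count with spaces, so strip whitespace.
  actual="$(wc -l | tr -d '[:space:]')"
  if [ "$actual" -eq "$expected" ]; then
    echo "PASS: $actual"
  else
    echo "FAIL: expected $expected, got $actual" >&2
    return 1
  fi
}
```

The first validation would then read `knife ssh roles:gprd-base-db-patroni 'ps aux | grep chef-client | grep -v grep' | expect_lines 0`, which makes it easier to chain the checks in a script that stops at the first failure.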
Edited by Ahmad Sherif