Add service discovery for the DB load balancer
What does this MR do?
This adds a service discovery mechanism to the DB load balancer. Using service discovery, a Unicorn process can automatically refresh the list of DB secondaries it uses, without requiring a run of chef-client.
How it works
When configuring the load balancer, you can provide a DNS record to look up instead of a list of hosts. When such a record is given, the load balancer uses service discovery. If config/database.yml also includes explicitly configured hosts, the service discovery mechanism overwrites them.
The following configuration options are provided:
- nameserver: the DNS server to query, defaults to localhost
- port: the port of the nameserver, defaults to 8600 (Consul's default port for its DNS interface)
- interval: the time between checks of the DNS record
- record (required): the name of the DNS record to look up (e.g. secondary.postgresql.service.consul)
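A minimal Ruby sketch of how these options could be read from config/database.yml. If a DNS record is configured, discovery is used and any explicit hosts are ignored; otherwise the static host list is used. The `load_balancing`, `hosts` and `discover` key names, the `load_balancing_settings` helper, and the interval default of 60 (assumed to be seconds) are illustrative assumptions, not taken from this MR.

```ruby
require 'yaml'

# Defaults for the discovery options. nameserver and port come from the MR;
# the interval default is an ASSUMPTION, since the MR does not state one.
DISCOVERY_DEFAULTS = {
  'nameserver' => 'localhost',
  'port' => 8600, # Consul's default DNS port
  'interval' => 60
}.freeze

# Hypothetical helper: decide between static hosts and service discovery.
# The 'load_balancing', 'hosts' and 'discover' key names are assumptions.
def load_balancing_settings(db_config)
  lb = db_config.fetch('load_balancing', {})
  discover = lb['discover']

  # Without a DNS record to look up, fall back to the explicit hosts.
  return { mode: :static, hosts: Array(lb['hosts']) } unless discover

  if discover['record'].to_s.empty?
    raise ArgumentError, 'service discovery requires a "record" option'
  end

  # Explicit hosts are ignored here: discovery overwrites them at runtime.
  { mode: :discovery, options: DISCOVERY_DEFAULTS.merge(discover) }
end

yaml = <<~YAML
  production:
    load_balancing:
      hosts:
        - db1.example.com
      discover:
        record: secondary.postgresql.service.consul
YAML

settings = load_balancing_settings(YAML.safe_load(yaml)['production'])
p settings[:mode]                  # => :discovery
p settings[:options]['nameserver'] # => "localhost"
```

Failing fast on a missing record mirrors the MR's statement that `record` is the only required option.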
The service discovery mechanism does not use SRV records to obtain port numbers; instead, it reuses the port configured for the primary. This is mostly because it reuses existing code that has this limitation.
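A sketch of the lookup, assuming Ruby's standard `resolv` library (the MR does not say which DNS client it uses). The resolver queries only the configured nameserver and port, and every A record is paired with the primary's port because SRV records are not consulted. `build_resolver` and `discover_hosts` are hypothetical names; the resolver is passed in so the lookup can be exercised without a live Consul agent.

```ruby
require 'resolv'
require 'socket'

# Hypothetical helper: a resolver that queries only the given nameserver and
# port (e.g. Consul's DNS interface on 8600) instead of /etc/resolv.conf.
def build_resolver(nameserver, port)
  # Turn a hostname such as "localhost" into an IP address up front, so the
  # resolver is configured with a plain address.
  address = IPSocket.getaddress(nameserver)
  Resolv::DNS.new(nameserver_port: [[address, port]])
end

# Hypothetical helper: look up the A records for `record` and pair every
# address with the primary's port, since SRV records are not consulted.
# Sorting keeps the result stable when the DNS answer order changes.
def discover_hosts(resolver, record, primary_port)
  resolver
    .getresources(record, Resolv::DNS::Resource::IN::A)
    .map { |resource| [resource.address.to_s, primary_port] }
    .sort
end

# Usage against a local Consul agent (requires one to be running):
#   resolver = build_resolver('localhost', 8600)
#   discover_hosts(resolver, 'secondary.postgresql.service.consul', 5432)
```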
Hosts are replaced while holding a mutex, which prevents other requests from using the existing hosts until the replacement is done. Since the replacement is just an assignment of a few instance variables, it should complete very quickly. A request that is already running may continue to use the old hosts until it finishes, as this greatly simplifies the code.
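The replacement step could look roughly like the following hypothetical `HostList` class, a sketch rather than the MR's actual code. Readers take the same mutex as the writer, so they briefly wait while a new list is swapped in, and a request that already holds the old list keeps using it until it finishes. Replacing only when the list changed is an assumption added for illustration.

```ruby
# Hypothetical sketch: a list of secondaries that is replaced under a mutex.
class HostList
  def initialize(hosts = [])
    @mutex = Mutex.new
    @hosts = hosts.dup.freeze
  end

  # Returns the current (frozen) list. A request that grabbed this snapshot
  # keeps using it until it finishes, even if the list is replaced meanwhile.
  def hosts
    @mutex.synchronize { @hosts }
  end

  # Swaps in a new list only when it differs from the current one. Holding the
  # mutex makes readers wait, but only for a comparison and an assignment.
  def replace_if_changed(new_hosts)
    @mutex.synchronize do
      return false if new_hosts == @hosts

      @hosts = new_hosts.dup.freeze
      true
    end
  end
end
```

Freezing each list means a snapshot handed to a request can never be mutated by a later replacement.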
Why was this MR needed?
When adding secondaries, or during a failover, we need to refresh the list of database secondaries to use. Previously this required the following steps:
- Update the hosts in chef-repo
- Run sudo chef-client on all affected hosts
- Run sudo gitlab-ctl reconfigure on all affected hosts
- Run sudo gitlab-ctl hup unicorn on all affected hosts
With service discovery this is reduced to the following:
- Make sure Consul (or any other service that provides a DNS interface) is up to date
- Wait for the processes to pick up the new hosts automatically on their next check of the DNS record
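Processes can pick up new hosts on their own because each one re-checks the DNS record every `interval`. A minimal sketch of such a background loop using a plain Ruby thread; the `Poller` class is hypothetical, and in practice its refresh block would perform the DNS lookup and replace the hosts. Catching errors so that one failed lookup (for example a Consul timeout) does not stop future checks is an assumption, not something this MR specifies.

```ruby
# Hypothetical sketch: run a refresh block every `interval` seconds in a
# background thread until stopped.
class Poller
  def initialize(interval:, &refresh)
    @interval = interval
    @refresh = refresh
    @thread = nil
  end

  def start
    @thread = Thread.new do
      loop do
        begin
          @refresh.call
        rescue StandardError => e
          # Keep polling: a single failed lookup must not end the loop.
          warn "service discovery refresh failed: #{e.message}"
        end
        sleep @interval
      end
    end
    self
  end

  def stop
    @thread&.kill
    @thread&.join
  end
end

# Usage (the block would normally look up the record and replace the hosts):
#   poller = Poller.new(interval: 60) { puts 'checking DNS record' }.start
```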
Does this MR meet the acceptance criteria?
- Changelog entry added, if necessary
- Documentation created/updated
- Tests added for this feature/bug
- Conforms to the code review guidelines
  - Has been reviewed by a Backend maintainer
  - Has been reviewed by a Database specialist
- EE-specific content should be in the top-level /ee folder
- Conforms to the merge request performance guides
- Conforms to the style guides
- If you have multiple commits, please combine them into a few logically organized commits by squashing them
- End-to-end tests pass (package-qa manual pipeline job)
What are the relevant issue numbers?
https://gitlab.com/gitlab-org/gitlab-ee/issues/2042
TODO
- Write tests
- Write documentation
- Double-check that everything works as expected while a process is handling requests
- Think about this for a day or two, to see if there's anything I have overlooked
- Talk with production (e.g. @northrup) to see if there are any additional requirements