Support load balancing of multiple database clusters

Yorick Peterse requested to merge load-balancer-multiple-databases into master

What does this MR do?

This adds support for using the database load balancer with multiple databases. Load balancing is applied to two classes:

  • ActiveRecord::Base
  • Ci::CiDatabaseRecord

Each class has its own load balancer, configuration, service discovery, etc. Load balancing for the CI class is only enabled when a CI configuration exists, as it can reuse the main load balancer when there's no dedicated CI database.
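A minimal sketch of the fallback described above, assuming a plain hash of per-database replica hosts. `LoadBalancer`, `build_load_balancers`, and the host names are hypothetical stand-ins for illustration, not the classes this MR touches:

```ruby
# Hypothetical stand-in for a load balancer; not GitLab's actual class.
LoadBalancer = Struct.new(:name, :hosts)

# Builds one load balancer per configured database. `configs` maps a
# database name to its list of replica hosts (an assumed format).
def build_load_balancers(configs)
  main = LoadBalancer.new(:main, configs.fetch(:main))

  # Without a dedicated CI configuration, the CI class reuses the main
  # load balancer instead of getting its own.
  ci = configs.key?(:ci) ? LoadBalancer.new(:ci, configs[:ci]) : main

  { main: main, ci: ci }
end

shared = build_load_balancers({ main: ['replica-1'] })
split = build_load_balancers({ main: ['replica-1'], ci: ['ci-replica-1'] })

puts shared[:ci].equal?(shared[:main])
puts split[:ci].name
```

With only a main configuration both entries point at the same object; with a CI configuration the CI entry gets its own hosts.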

Sticking technically supports multiple databases, but in practice we apply the same sticking rules to all databases. This is due to how LoadBalancing::Session is used: there is only one instance per request/Sidekiq job, and it's not aware of which database connection performed which operation. This means that a write to database A will result in GitLab sticking to the primaries of all databases. The reason for this choice is simple: it requires fewer code changes, and allows us to introduce multiple database support in smaller increments.
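To illustrate why a write to one database sticks all of them, here is a simplified model of a single shared session. `SketchSession` and `role_for_read` are hypothetical names, and the real LoadBalancing::Session has more state than this:

```ruby
# Hypothetical, simplified model of a per-request session. It records that
# a write happened, but not which database the write went to.
class SketchSession
  def initialize
    @performed_write = false
  end

  def write!
    @performed_write = true
  end

  def use_primary?
    @performed_write
  end
end

# Picks where a read for `database` should go. The database argument is
# ignored on purpose: the session has no per-database knowledge.
def role_for_read(session, _database)
  session.use_primary? ? :primary : :replica
end

session = SketchSession.new
session.write! # e.g. a write that only touched the main database

puts role_for_read(session, :main)
puts role_for_read(session, :ci)
```

Because the flag is global to the session, reads for the CI database also go to a primary after a write to main.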

One change we made to sticking is to turn the Sticking module into a class, and attach an instance to every base module that has its own load balancer. This makes it easier to apply sticking on a per-database level in the future, without having to type Gitlab::Database::LoadBalancing::Sticking... every time.
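The shape of that change could look roughly like the sketch below. `SketchSticking`, `MainRecord`, `CiRecord`, and the `stick` return value are assumptions made for illustration; the actual class and its methods live in the MR's diff:

```ruby
# Hypothetical sticking class: one instance per database.
class SketchSticking
  attr_reader :database

  def initialize(database)
    @database = database
  end

  # Returns a description instead of touching Redis, to keep the sketch
  # self-contained.
  def stick(namespace, id)
    "stick #{namespace}:#{id} on #{database}"
  end
end

# Each base class with its own load balancer exposes its own instance.
class MainRecord
  def self.sticking
    @sticking ||= SketchSticking.new(:main)
  end
end

class CiRecord
  def self.sticking
    @sticking ||= SketchSticking.new(:ci)
  end
end

puts MainRecord.sticking.stick(:user, 1)
puts CiRecord.sticking.stick(:runner, 2)
```

Callers then reach sticking through the class they already work with, rather than through a fully qualified module name.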

Sticking also supports reading and writing of data using the old Redis key names. This ensures sticking continues to work during a deployment, as during this window we'll run two different versions in production. Once the code has been deployed to GitLab.com and has been confirmed to work, we'll remove support for reading/writing the old keys.
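One way to picture the transition period is a store that writes both key formats and falls back to the old one on reads. This is a sketch under assumptions: the key formats, the `SketchStickingStore` class, and the read order are invented for illustration, and a Hash stands in for Redis:

```ruby
# Hypothetical store that keeps old and new key names in sync during a
# deployment. A plain Hash stands in for Redis.
class SketchStickingStore
  def initialize(database, redis = {})
    @database = database
    @redis = redis
  end

  # Assumed new format: includes the database name.
  def new_key(namespace, id)
    "write-location/#{@database}/#{namespace}/#{id}"
  end

  # Assumed old format: no database name.
  def old_key(namespace, id)
    "write-location/#{namespace}/#{id}"
  end

  # Write both keys so nodes running either version can read the location.
  def write(namespace, id, location)
    @redis[new_key(namespace, id)] = location
    @redis[old_key(namespace, id)] = location
  end

  # Prefer the new key; fall back to the old key written by older nodes.
  def read(namespace, id)
    @redis[new_key(namespace, id)] || @redis[old_key(namespace, id)]
  end
end

redis = { 'write-location/user/1' => '0/16B3748' } # written by an old node
store = SketchStickingStore.new(:main, redis)
puts store.read(:user, 1)
```

Once every node runs the new version, the old-key branch in `write` and `read` can be deleted without changing callers.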

Sidekiq also supports load balancing multiple databases. If a load balancer/database doesn't have any WAL data in the Sidekiq job, we treat the database as being in sync. This way we can support Sidekiq jobs using both the old and new load balancing data.
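The "missing WAL data means in sync" rule can be sketched as follows. The `'wal_locations'` job key, the method names, and the use of plain integers in place of WAL positions are assumptions made to keep the example small:

```ruby
# Returns true when the replica for `database` has caught up with the
# position recorded in the job, or when the job has no position for it.
def database_in_sync?(job, database, replica_position)
  required = job.fetch('wal_locations', {})[database]

  # No WAL data for this database (e.g. a job scheduled by older code):
  # treat the database as being in sync.
  return true if required.nil?

  replica_position >= required
end

# A job may use replicas only when every database is in sync.
def all_in_sync?(job, replica_positions)
  replica_positions.all? do |database, position|
    database_in_sync?(job, database, position)
  end
end

job = { 'wal_locations' => { 'main' => 100 } }
puts all_in_sync?(job, { 'main' => 120, 'ci' => 5 })
puts all_in_sync?(job, { 'main' => 90, 'ci' => 5 })
```

In the first call the CI database has no recorded position and is treated as in sync, so only the main replica's position matters.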

See #331776 (closed) for more details.

TODO

  • Test with a regular replica
  • Test with a delayed replica (e.g. 5 minutes)
  • Test Sidekiq load balancing
  • Get all tests to pass
  • Take into account the fixes needed for #341584 (closed)
  • Test changes in the sharding cluster?

For testing we might be able to take advantage of https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/14191.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Yorick Peterse
