Database Load Balancing is not working for host-based load balancing
With DB load balancing enabled via hosts:, every page load fails with:
NoMethodError - undefined method `[]' for nil:NilClass:
lib/gitlab/database/load_balancing/host.rb:10:in `enable_query_cache!'
lib/gitlab/database/load_balancing/load_balancer.rb:236:in `ensure_caching!'
lib/gitlab/database/load_balancing/load_balancer.rb:35:in `read'
lib/gitlab/database/load_balancing/connection_proxy.rb:93:in `read_using_load_balancer'
lib/gitlab/database/load_balancing/connection_proxy.rb:46:in `select_all'
lib/gitlab/middleware/handle_malformed_strings.rb:21:in `call'
lib/gitlab/middleware/basic_health_check.rb:25:in `call'
lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'
lib/gitlab/middleware/request_context.rb:21:in `call'
config/initializers/fix_local_cache_middleware.rb:11:in `call'
lib/gitlab/middleware/static.rb:11:in `call'
lib/gitlab/webpack/dev_server_middleware.rb:34:in `perform_request'
lib/gitlab/middleware/rack_multipart_tempfile_factory.rb:19:in `call'
lib/gitlab/metrics/requests_rack_middleware.rb:76:in `call'
lib/gitlab/middleware/release_env.rb:12:in `call'
Commit dab6ec44 works. Current master (b9593cda768f29da0464a55f2f04f5d20239808a) does not. I suspect !59107 (merged) is related; f924273d864f62ae8b3fab9384cd345e679da569 fails too.
Impact on GitLab.com (production)
Strangely, this works on production even though f924273d is already deployed. See #332913 (comment 594961062).
Cause
So what is happening:
- DB Load Balancing was moved to Core, this is all OK.
- We bumped to Rails 6.1, which includes a new after_fork to discard ALL Rails ActiveRecord::ConnectionAdapters::ConnectionPool objects.
  - Note, it's very cool how they use WeakMap to find all such objects.
- Puma forks new workers.
- Because ActiveRecord::ConnectionAdapters::ConnectionPool#discard! is called, our @pool object inside each LB host is broken. As https://github.com/rails/rails/blob/14bca259754904403b7007d450d2df3af6a36013/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L259-L264 notes: "Any further interaction with the pool (except #spec and #schema_cache) is undefined."
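The sequence above can be modelled without Rails. This is only a simplified sketch, not the Rails or GitLab code: FakePool, FakeHost, and REGISTRY are hypothetical names, and discard! here simply drops internal state to imitate a pool that is unusable after the fork.

```ruby
# Simplified model of the failure. None of these classes are Rails or GitLab code.
class FakePool
  # Weak registry so every live pool can be found later, loosely imitating
  # the WeakMap-based lookup mentioned in the list above.
  REGISTRY = ObjectSpace::WeakMap.new

  def initialize
    @query_cache = { enabled: false }
    REGISTRY[self] = self
  end

  # Imitates a discarded pool: internal state is gone, further use is undefined.
  def discard!
    @query_cache = nil
  end

  def enable_query_cache!
    @query_cache[:enabled] = true
  end

  # Stand-in for the after_fork hook that discards every known pool.
  def self.discard_all!
    REGISTRY.keys.each(&:discard!)
  end
end

class FakeHost
  def initialize
    @pool = FakePool.new # created once, in the parent process
  end

  def enable_query_cache!
    @pool.enable_query_cache!
  end
end

host = FakeHost.new
host.enable_query_cache! # fine before the fork

FakePool.discard_all!    # what a freshly forked worker would see

error = nil
begin
  host.enable_query_cache! # the host still holds the discarded pool
rescue NoMethodError => e
  error = e
end

puts error.class
```

The host keeps a reference to a pool that was valid in the parent, so the first call in the worker hits the discarded state and raises, which is the same shape as the NoMethodError in the backtrace.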
Possible fix
Call Gitlab::Database::LoadBalancing.configure_proxy inside Gitlab::Cluster::LifecycleEvents.on_worker_start (which itself is called from Puma's on_worker_boot), so that each worker gets a fresh set of hosts?
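A rough sketch of the idea, using stand-ins so it runs in isolation. The Sketch modules only imitate the shape of the GitLab names above; run_worker_start, the replica host names, and the hash-based hosts are assumptions made for illustration, not the real implementation.

```ruby
# Stand-ins only: these modules imitate the names in the proposal so the idea
# can run on its own. They are not GitLab's code.
module Sketch
  module LifecycleEvents
    @hooks = []

    class << self
      # Register a block to run in each freshly booted worker.
      def on_worker_start(&block)
        @hooks << block
      end

      # Assumed trigger; in the proposal this is reached via Puma's on_worker_boot.
      def run_worker_start
        @hooks.each(&:call)
      end
    end
  end

  module LoadBalancing
    class << self
      attr_reader :hosts

      # Rebuilds the host list so that every host gets a new pool object.
      # The host names are hypothetical placeholders.
      def configure_proxy(host_names = %w[replica-1 replica-2])
        @hosts = host_names.map { |name| { name: name, pool: Object.new } }
      end
    end
  end
end

# Parent process: hosts (and their pools) are created before the fork.
Sketch::LoadBalancing.configure_proxy
stale_pools = Sketch::LoadBalancing.hosts.map { |h| h[:pool] }

# Proposed change: reconfigure inside the worker-start hook.
Sketch::LifecycleEvents.on_worker_start do
  Sketch::LoadBalancing.configure_proxy
end

# Simulate a worker booting after the fork.
Sketch::LifecycleEvents.run_worker_start
fresh_pools = Sketch::LoadBalancing.hosts.map { |h| h[:pool] }

reused = fresh_pools.any? { |pool| stale_pools.any? { |old| old.equal?(pool) } }
puts "worker reuses a parent pool: #{reused}"
```

The point of reconfiguring in the hook rather than at boot is that the hosts are then built after the fork, so no worker holds a pool that the after_fork discard has already invalidated.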