Database Load Balancing is not working for host-based load balancing
With DB load balancing enabled with hosts: configured, every page load fails with:
NoMethodError - undefined method `[]' for nil:NilClass:
lib/gitlab/database/load_balancing/host.rb:10:in `enable_query_cache!'
lib/gitlab/database/load_balancing/load_balancer.rb:236:in `ensure_caching!'
lib/gitlab/database/load_balancing/load_balancer.rb:35:in `read'
lib/gitlab/database/load_balancing/connection_proxy.rb:93:in `read_using_load_balancer'
lib/gitlab/database/load_balancing/connection_proxy.rb:46:in `select_all'
lib/gitlab/middleware/handle_malformed_strings.rb:21:in `call'
lib/gitlab/middleware/basic_health_check.rb:25:in `call'
lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'
lib/gitlab/middleware/request_context.rb:21:in `call'
config/initializers/fix_local_cache_middleware.rb:11:in `call'
lib/gitlab/middleware/static.rb:11:in `call'
lib/gitlab/webpack/dev_server_middleware.rb:34:in `perform_request'
lib/gitlab/middleware/rack_multipart_tempfile_factory.rb:19:in `call'
lib/gitlab/metrics/requests_rack_middleware.rb:76:in `call'
lib/gitlab/middleware/release_env.rb:12:in `call'
dab6ec44 works. Current master (b9593cda768f29da0464a55f2f04f5d20239808a) does not. I suspect !59107 (merged) is related; f924273d864f62ae8b3fab9384cd345e679da569 fails too.
Impact on GitLab.com (production)
Strangely, this works on production even though f924273d is already deployed. See #332913 (comment 594961062)
Cause
So what is happening:
- DB Load Balancing was moved to Core; this is all OK.
- We bumped to Rails 6.1, which includes a new after_fork hook to discard ALL Rails ActiveRecord::ConnectionAdapters::ConnectionPool objects.
  - Note, it's very cool how they use WeakMap to find all such objects.
- Puma forks new workers.
- Because ActiveRecord::ConnectionAdapters::ConnectionPool#discard! is called, our @pool object inside each LB host is broken. As https://github.com/rails/rails/blob/14bca259754904403b7007d450d2df3af6a36013/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L259-L264 notes: "Any further interaction with the pool (except #spec and #schema_cache) is undefined."
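To make the failure mode concrete, here is a minimal, Rails-free sketch of the sequence described above. DiscardablePool and CachedHost are hypothetical stand-ins, not the real Rails or GitLab classes, and discard! is modelled simply as dropping the pool's internal state; the real ConnectionPool internals may differ.

```ruby
# Hypothetical stand-in for a connection pool. Not the real Rails class.
class DiscardablePool
  def initialize
    @connections = { primary: :open }
  end

  # Simplified model of discard!: internal state is dropped after a fork.
  def discard!
    @connections = nil
  end

  def enable_query_cache!
    @connections[:primary] # NoMethodError on nil once the pool is discarded
    :cache_enabled
  end
end

# Hypothetical stand-in for a load balancer host that keeps a pool reference.
class CachedHost
  def initialize(pool)
    @pool = pool # captured once, before any worker is forked
  end

  def enable_query_cache!
    @pool.enable_query_cache!
  end
end

pool = DiscardablePool.new
host = CachedHost.new(pool)

# Before the fork, the host can use its pool.
before_fork = host.enable_query_cache!

# In this model, the after-fork hook discards every pool it can find.
pool.discard!

# The host still points at the discarded pool, so the next call blows up.
after_fork =
  begin
    host.enable_query_cache!
  rescue NoMethodError => e
    e
  end
```

In this sketch the host never learns that its pool was discarded, which matches the symptom in the stack trace: the first read after the fork raises NoMethodError on nil.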
Possible fix
Call Gitlab::Database::LoadBalancing.configure_proxy inside Gitlab::Cluster::LifecycleEvents.on_worker_start (which itself is called from Puma's on_worker_boot) so that we get a fresh set of hosts?
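A rough sketch of the idea, again without Rails or GitLab loaded. FakePool, FakeHost, FakeLoadBalancing and FakeLifecycleEvents are hypothetical stand-ins that only mimic the shape of the real modules; in particular, the real configure_proxy signature is not shown in this issue, so the hostnames argument here is an assumption.

```ruby
# Hypothetical stand-in for a connection pool that can be discarded.
class FakePool
  def initialize
    @connections = {}
  end

  def discard!
    @connections = nil
  end

  def usable?
    !@connections.nil?
  end
end

FakeHost = Struct.new(:name, :pool)

# Stand-in for the load balancing module; configure_proxy rebuilds all hosts.
module FakeLoadBalancing
  class << self
    attr_reader :hosts

    def configure_proxy(hostnames)
      @hosts = hostnames.map { |name| FakeHost.new(name, FakePool.new) }
    end
  end
end

# Stand-in for the lifecycle event registry that runs hooks in each worker.
module FakeLifecycleEvents
  @worker_start_hooks = []

  class << self
    def on_worker_start(&block)
      @worker_start_hooks << block
    end

    def run_worker_start_hooks
      @worker_start_hooks.each(&:call)
    end
  end
end

hostnames = %w[replica-1 replica-2]

# Boot in the master process: hosts and pools are created once.
FakeLoadBalancing.configure_proxy(hostnames)

# Proposed fix: rebuild the hosts whenever a worker starts.
FakeLifecycleEvents.on_worker_start do
  FakeLoadBalancing.configure_proxy(hostnames)
end

# Simulate a fork: existing pools are discarded, then worker hooks run.
stale_hosts = FakeLoadBalancing.hosts
stale_hosts.each { |h| h.pool.discard! }
FakeLifecycleEvents.run_worker_start_hooks

fresh_hosts = FakeLoadBalancing.hosts
```

After the hooks run, the worker holds new host objects with new pools, while the discarded ones are no longer referenced by the load balancer. Whether rebuilding the whole proxy is cheap enough to do on every worker start is a separate question this sketch does not answer.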