GitLab application load balancing is not resilient to network disconnects to replicas
According to our docs on load balancing:
https://docs.gitlab.com/ee/administration/database_load_balancing.html
Database load balancing improves the distribution of database workloads across multiple computing resources. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability and availability through redundancy.
This was observed in gitlab-com/gl-infra/delivery#1141 (closed), where a NetworkPolicy rule blocked outbound traffic to some of the database replicas in the discovery list.
As a result, every request was severely degraded by worker timeouts.
It looks like we cannot tolerate losing network connectivity to the replicas, which I think is a fairly serious issue. It is easy to reproduce with a valid primary database but an invalid replica:
database.yml:
production:
  ...
  host: "10.33.0.6"
  port: 5432
  ...
  load_balancing: {"hosts":["www.example.com"]}
  prepared_statements: false
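To confirm from the application host which entries in the load_balancing hosts list are actually reachable, a short probe with a bounded connect timeout can help. The following is a minimal sketch and not part of GitLab: the `partition_replicas` helper, the `TCP_PROBE` constant, the 2-second timeout, and the injectable `probe` argument are all assumptions made for illustration. Only `Socket.tcp` with `connect_timeout:` comes from Ruby's standard library.

```ruby
require "socket"

# Default probe (hypothetical name): try to open a TCP connection with a
# bounded connect timeout. Returns true if the connection opens, false on a
# timeout, a name resolution failure, or any other socket error.
TCP_PROBE = lambda do |host, port, timeout|
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Hypothetical helper: split a list of replica hosts into
# [reachable, unreachable] using the given probe.
def partition_replicas(hosts, port: 5432, timeout: 2, probe: TCP_PROBE)
  hosts.partition { |host| probe.call(host, port, timeout) }
end
```

Calling `partition_replicas(["www.example.com"])` from a Rails console or a plain `ruby` session returns the two lists. With the configuration above, the replica would be expected to land in the unreachable list, although that depends on the network the probe runs from.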
Observe that we pass the /-/readiness and /-/liveness checks:
Completed 200 OK in 3ms (Views: 0.3ms | ActiveRecord: 0.0ms | Elasticsearch: 0.0ms | Allocations: 289)
Started GET "/-/liveness" for 127.0.0.1 at 2020-08-19 08:26:22 +0000
Completed 200 OK in 1ms (Views: 0.2ms | ActiveRecord: 0.0ms | Elasticsearch: 0.0ms | Allocations: 210)
Started GET "/-/readiness" for 127.0.0.1 at 2020-08-19 08:26:25 +0000
but other requests will time out with:
Rack::Timeout::RequestTimeoutException (Request ran for longer than 60000ms):
ee/lib/gitlab/database/load_balancing/host.rb:10:in `connection'
ee/lib/gitlab/database/load_balancing/load_balancer.rb:32:in `read'
ee/lib/gitlab/database/load_balancing/connection_proxy.rb:69:in `read_using_load_balancer'
ee/lib/gitlab/database/load_balancing/connection_proxy.rb:42:in `select_all'
app/models/concerns/cacheable_attributes.rb:19:in `current_without_cache'
app/models/concerns/cacheable_attributes.rb:55:in `current'
lib/gitlab/current_settings.rb:48:in `uncached_application_settings'
lib/gitlab/current_settings.rb:30:in `ensure_application_settings!'
lib/gitlab/current_settings.rb:7:in `block in current_application_settings'
lib/gitlab/safe_request_store.rb:12:in `fetch'
lib/gitlab/current_settings.rb:7:in `current_application_settings'
config/initializers/rack_attack.rb:6:in `settings'
config/initializers/rack_attack.rb:58:in `block in <class:Attack>'
lib/gitlab/middleware/read_only/controller.rb:51:in `call'
lib/gitlab/middleware/read_only.rb:18:in `call'
lib/gitlab/middleware/same_site_cookies.rb:27:in `call'
lib/gitlab/middleware/basic_health_check.rb:25:in `call'
lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'
lib/gitlab/middleware/request_context.rb:23:in `call'
config/initializers/fix_local_cache_middleware.rb:9:in `call'
lib/gitlab/metrics/requests_rack_middleware.rb:60:in `call'
lib/gitlab/middleware/release_env.rb:12:in `call'
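The trace shows the request stalling inside `connection` in host.rb, called from `read` in load_balancer.rb, before any query runs. One way to tolerate this, sketched here under assumptions and not taken from GitLab's actual implementation, is to bound the time spent acquiring a replica connection, mark a host that exceeds the bound as offline, and fall back to the primary when no replica responds. `FakeHost`, `ResilientReader`, and the 0.5-second budget are hypothetical names and values used only for illustration.

```ruby
require "timeout"

# Hypothetical stand-in for a replica host. `connect` is any callable that
# returns a connection object, or blocks/raises when the host is unreachable.
class FakeHost
  attr_reader :name

  def initialize(name, connect)
    @name = name
    @connect = connect
    @online = true
  end

  def online?
    @online
  end

  def offline!
    @online = false
  end

  def connection
    @connect.call
  end
end

# Hypothetical reader: tries each online replica with a bounded wait and
# falls back to the primary when no replica yields a connection in time.
class ResilientReader
  def initialize(replicas, primary, connect_timeout: 0.5)
    @replicas = replicas
    @primary = primary
    @connect_timeout = connect_timeout
  end

  # Yields a connection to the block and returns the block's result.
  def read
    @replicas.select(&:online?).each do |host|
      conn = acquire(host)
      return yield(conn) if conn
    end
    yield(@primary.connection)
  end

  private

  # Returns a connection, or nil after marking the host offline.
  def acquire(host)
    Timeout.timeout(@connect_timeout) { host.connection }
  rescue Timeout::Error, IOError, SystemCallError
    host.offline!
    nil
  end
end
```

`Timeout.timeout` is used here only to keep the sketch self-contained. It interrupts the running thread, so a driver-level limit such as libpq's `connect_timeout` connection parameter is generally a safer way to bound connection attempts in real code.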