Timeouts in Kubernetes when enabling database load balancing in staging
On the webservice pods in the staging environment, Consul service discovery is configured for database load balancing:
# dig @consul-consul-dns.consul -p 53 db-replica.service.consul. ANY
;; ANSWER SECTION:
db-replica.service.consul. 0 IN A 10.224.29.104
db-replica.service.consul. 0 IN A 10.224.29.106
db-replica.service.consul. 0 IN A 10.224.29.102
db-replica.service.consul. 0 IN A 10.224.29.103
database.yml is configured like this:
production:
  adapter: postgresql
  encoding: unicode
  database: gitlabhq_production
  username: gitlab
  password: "***redacted***"
  host: "pgbouncer.int.gstg.gitlab.net"
  port: 6432
  pool: 1
  prepared_statements: false
  load_balancing:
    discover:
      nameserver: consul-consul-dns.consul
      record: "db-replica.service.consul."
      record_type: "SRV"
      port: 53
      use_tcp: true
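Note that the dig query above asked for ANY and got A records back, while database.yml sets record_type to "SRV". To see what the SRV lookup itself returns, something like the following could be run from a pod. This is a minimal sketch using Ruby's standard Resolv library, not the code path GitLab uses for discovery; srv_endpoints is a helper name made up for illustration, and CONSUL_DNS_IP is a hypothetical environment variable.

```ruby
require 'resolv'

# Returns [target, port] pairs for every SRV record behind `record`.
# `resolver` is anything that responds to #getresources, e.g. Resolv::DNS.
def srv_endpoints(resolver, record)
  resolver
    .getresources(record, Resolv::DNS::Resource::IN::SRV)
    .map { |srv| [srv.target.to_s, srv.port] }
end

# Example call from a pod. CONSUL_DNS_IP is a hypothetical environment
# variable holding the IP address of the consul-consul-dns service:
#
#   dns = Resolv::DNS.new(nameserver_port: [[ENV.fetch('CONSUL_DNS_IP'), 53]])
#   srv_endpoints(dns, 'db-replica.service.consul.').each do |host, port|
#     puts "#{host}:#{port}"
#   end
```

Resolv sends the query over UDP by default, so this does not exercise the use_tcp: true setting; it only shows which targets and ports the SRV record advertises.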
And from the pod I can reach the replicas:
$ psql -vvv --port=6432 --host=10.224.29.106 -U gitlab gitlabhq_production
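The psql command above checks a single replica. To rule out a network problem on the other three, a TCP check against each address could look like the sketch below. It only confirms that the port accepts connections within a timeout, not that authentication or queries succeed; tcp_reachable? is a helper name made up for illustration.

```ruby
require 'socket'

# True if a TCP connection to host:port opens within `timeout` seconds.
# Refused connections, unreachable hosts, and timeouts all return false.
def tcp_reachable?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Example call from a pod, using the addresses from the dig answer and
# the port from database.yml:
#
#   %w[10.224.29.104 10.224.29.106 10.224.29.102 10.224.29.103].each do |ip|
#     puts "#{ip}: #{tcp_reachable?(ip, 6432)}"
#   end
```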
In database_load_balancing.log, I am seeing the expected messages for hosts coming online:
{"severity":"INFO","time":"2020-08-18T14:50:11.759Z","correlation_id":null,"event":"host_online","message":"Host is online after replica status check","db_host":"10.224.29.103","db_port":6432}
{"severity":"INFO","time":"2020-08-18T14:51:12.862Z","correlation_id":null,"event":"host_online","message":"Host is online after replica status check","db_host":"10.224.29.106","db_port":6432}
{"severity":"INFO","time":"2020-08-18T14:54:15.535Z","correlation_id":null,"event":"host_online","message":"Host is online after replica status check","db_host":"10.224.29.102","db_port":6432}
{"severity":"INFO","time":"2020-08-18T14:54:48.813Z","correlation_id":null,"event":"host_online","message":"Host is online after replica status check","db_host":"10.224.29.102","db_port":6432}
{"severity":"INFO","time":"2020-08-18T14:54:48.851Z","correlation_id":null,"event":"host_online","message":"Host is online after replica status check","db_host":"10.224.29.104","db_port":6432}
But every HTTPS request hangs, and the following shows up only in production.log:
Rack::Timeout::RequestTimeoutException (Request ran for longer than 60000ms):
ee/lib/gitlab/database/load_balancing/host.rb:10:in `connection'
ee/lib/gitlab/database/load_balancing/host.rb:149:in `replication_lag_size'
ee/lib/gitlab/database/load_balancing/host.rb:127:in `data_is_recent_enough?'
ee/lib/gitlab/database/load_balancing/host.rb:106:in `replica_is_up_to_date?'
ee/lib/gitlab/database/load_balancing/host.rb:97:in `refresh_status'
ee/lib/gitlab/database/load_balancing/host.rb:72:in `online?'
ee/lib/gitlab/database/load_balancing/host_list.rb:62:in `block (2 levels) in next_host'
ee/lib/gitlab/database/load_balancing/host_list.rb:58:in `loop'
ee/lib/gitlab/database/load_balancing/host_list.rb:58:in `block in next_host'
ee/lib/gitlab/database/load_balancing/host_list.rb:53:in `synchronize'
ee/lib/gitlab/database/load_balancing/host_list.rb:53:in `next_host'
ee/lib/gitlab/database/load_balancing/host_list.rb:41:in `next'
ee/lib/gitlab/database/load_balancing/load_balancer.rb:93:in `host'
ee/lib/gitlab/database/load_balancing/load_balancer.rb:30:in `read'
ee/lib/gitlab/database/load_balancing/connection_proxy.rb:69:in `read_using_load_balancer'
ee/lib/gitlab/database/load_balancing/connection_proxy.rb:42:in `select_all'
app/models/concerns/cacheable_attributes.rb:19:in `current_without_cache'
app/models/concerns/cacheable_attributes.rb:55:in `current'
lib/gitlab/current_settings.rb:48:in `uncached_application_settings'
lib/gitlab/current_settings.rb:30:in `ensure_application_settings!'
lib/gitlab/current_settings.rb:7:in `block in current_application_settings'
lib/gitlab/safe_request_store.rb:12:in `fetch'
lib/gitlab/current_settings.rb:7:in `current_application_settings'
config/initializers/rack_attack.rb:6:in `settings'
config/initializers/rack_attack.rb:58:in `block in <class:Attack>'
lib/gitlab/middleware/read_only/controller.rb:51:in `call'
lib/gitlab/middleware/read_only.rb:18:in `call'
lib/gitlab/middleware/same_site_cookies.rb:27:in `call'
lib/gitlab/middleware/basic_health_check.rb:25:in `call'
lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'
lib/gitlab/middleware/request_context.rb:23:in `call'
config/initializers/fix_local_cache_middleware.rb:9:in `call'
lib/gitlab/metrics/requests_rack_middleware.rb:60:in `call'
lib/gitlab/middleware/release_env.rb:12:in `call'