Load Balancing in front of Omnibus instances
Summary
When a load balancer sits in front of multiple Omnibus machines/VMs, nginx accepts incoming connections too eagerly, even when the instance is too busy to process new requests. This is especially painful when load is not evenly distributed across the cluster (because some requests take much longer than others).
Steps to reproduce
Issue requests in this manner:
What is the current bug behavior?
In this setup, because nginx eagerly accepts connections, a request can be queued on a busy instance even when other instances have capacity to process it immediately. Ordinarily this isn't a problem; users only see that their requests are slow. In exceptional situations where GitLab takes more than maybe a couple of seconds (because of large repos or MRs), the frontmost load balancer decides the request has taken too long and times it out.
In the worst case, the queued request is a health check probe. If probes keep timing out for long enough, the instance is killed even though it is merely too busy.
What is the expected correct behavior?
Requests should be queued at the frontmost load balancer (ELB/HAProxy/whatever), so that it can make more intelligent decisions about which instance should handle a request and avoid instances that are overloaded.
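To make the difference concrete, here is a toy model of the two behaviours. It is only a sketch under simplifying assumptions, not a description of how nginx, ELB, or HAProxy actually schedule: each instance handles one request at a time, the load balancer in the eager case distributes round-robin, and the function names `eager_waits` and `central_waits` plus the request timings are made up for illustration.

```ruby
# Toy model: every instance processes one request at a time.
# Each request is an [arrival_time, service_time] pair (seconds),
# and the list is sorted by arrival time.

# Eager accept: requests are handed out round-robin and the chosen
# instance accepts immediately, queueing behind its own backlog.
def eager_waits(requests, instance_count)
  free_at = Array.new(instance_count, 0.0)
  requests.each_with_index.map do |(arrival, service), i|
    idx = i % instance_count
    start = [arrival, free_at[idx]].max
    free_at[idx] = start + service
    start - arrival # time spent waiting in the instance's queue
  end
end

# Central queue: requests wait at the load balancer and go, in arrival
# order, to whichever instance becomes free first.
def central_waits(requests, instance_count)
  free_at = Array.new(instance_count, 0.0)
  requests.map do |arrival, service|
    idx = free_at.each_with_index.min.last # index of earliest-free instance
    start = [arrival, free_at[idx]].max
    free_at[idx] = start + service
    start - arrival # time spent waiting at the load balancer
  end
end

# One slow request (10 s) followed by three quick ones (0.1 s each),
# e.g. a large MR followed by health check probes. Hypothetical numbers.
requests = [[0.0, 10.0], [0.0, 0.1], [1.0, 0.1], [2.0, 0.1]]

puts "eager accept, worst wait:  #{eager_waits(requests, 2).max}"
puts "central queue, worst wait: #{central_waits(requests, 2).max}"
```

With two instances, the third request in the eager model lands behind the 10-second request and waits 9 seconds, while the other instance sits idle. In the central-queue model nothing waits at all, which is the behaviour this issue asks for.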
Relevant logs
None.
Details of package version
We're using gitlab-ee 11.6.3.
Provide the package version installation details
||/ Name Version Architecture Description
+++-=================================-=====================-=====================-=======================================================================
un gitlab-ce (no description available)
ii gitlab-ee 11.6.3-ee.0 amd64 GitLab Enterprise Edition (including NGINX, Postgres, Redis)
Environment details
- Operating System: Ubuntu 16.04
- Installation Target, remove incorrect values:
- VM: AWS
- In theory this would also affect other kinds of installs.
- Installation Type, remove incorrect values:
- Should be regardless of installation type.
- Is there any other software running on the machine: Nothing exceptional.
- Is this a single or multiple node installation? Multiple nodes.
- Resources
- CPU: In our case, c5.xlarge (4 vCPUs) per node, with multiple nodes
- Memory total: 8 GB RAM per node
Configuration details
nginx configurations:
# nginx listens on plain HTTP only.
nginx['listen_port'] = 80
nginx['listen_https'] = false
# Headers nginx sets on the requests it proxies.
nginx['proxy_set_headers'] = {
  'Host' => '$http_host_with_default',
  'X-Forwarded-For' => '$proxy_add_x_forwarded_for',
  'X-Forwarded-Proto' => '$http_x_forwarded_proto',
  'X-Forwarded-Ssl' => 'on',
  'Upgrade' => '$http_upgrade',
  'Connection' => '$connection_upgrade'
}
nginx['real_ip_trusted_addresses'] = ['ip-range', 'ip-range'] # our VPC subnet ranges
nginx['real_ip_recursive'] = 'on'
# Redirect requests forwarded with X-Forwarded-Proto: http to HTTPS.
nginx['custom_gitlab_server_config'] = <<-NGINX
  if ($http_x_forwarded_proto = 'http') {
    return 301 https://$host$request_uri;
  }
NGINX