Rack_Attack blocks access to the health check endpoints

Summary

GitLab 11.2.3-ee (aadca994104)
GitLab Shell 8.1.1
GitLab Workhorse v5.1.0
GitLab API v4
Ruby 2.4.4p296
Rails 4.2.10

GitLab EE Omnibus on AWS with external PostgreSQL and Redis.
Request path: user -> AWS ELB -> GitLab nginx -> GitLab app.

The AWS ELB runs health checks against the /-/readiness endpoint. ELB health check configuration:
Timeout 3 seconds
Interval 5 seconds
Unhealthy threshold 3
Healthy threshold 3
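
For a sense of the load these settings put on the endpoint, the arithmetic can be sketched as follows. The interval and unhealthy threshold are taken from the configuration above. The number of ELB nodes is a hypothetical value chosen for illustration, since it is not stated in this report, and the helper names are made up for this sketch. The result is approximate and ignores the 3-second timeout.

```ruby
# Values copied from the ELB health check configuration in this report.
INTERVAL_SECONDS    = 5
UNHEALTHY_THRESHOLD = 3

# Approximate duration of consecutive failures before the ELB marks an
# instance unhealthy (ignores the per-request timeout).
def seconds_until_unhealthy(interval, threshold)
  interval * threshold
end

# Health check requests per minute that one instance receives.
# elb_nodes is an assumption: each ELB node is assumed to probe independently.
def checks_per_minute(interval, elb_nodes)
  (60 / interval) * elb_nodes
end

puts seconds_until_unhealthy(INTERVAL_SECONDS, UNHEALTHY_THRESHOLD) # 15
puts checks_per_minute(INTERVAL_SECONDS, 2) # 24, with a hypothetical 2 ELB nodes
```

If every health check reaches the application with the same client address, all of these requests may be counted against a single throttle key.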

Occasionally, ELB health check requests are throttled and an HTTP 429 status code is returned to the ELB. As a result, the ELB takes all instances out of service at once.

At the same time, gitlab-ctl status shows that all components are running, and the logs contain no errors, only Rack Attack entries:

==> /var/log/gitlab/gitlab-rails/production.log <==
Rack_Attack: throttle 127.0.0.1 GET /-/readiness

==> /var/log/gitlab/gitlab-workhorse/current <==
[2018/09/17:15:25:06 +0000] "GET /-/readiness HTTP/1.1" 429 12 "" "ELB-HealthChecker/1.0" 0.009
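
To gauge how often this happens, the throttled health checks can be counted from the workhorse log. The following is a minimal sketch that assumes every access line has the same shape as the sample above. The regular expression and the helper names are made up for this illustration and are not part of GitLab.

```ruby
# Matches the request, status, size, referer, user agent and duration fields
# of an access log line shaped like the gitlab-workhorse sample in this report.
ACCESS_LINE = /"(?<method>\w+) (?<path>\S+) HTTP\/[\d.]+" (?<status>\d{3}) (?<bytes>\d+) "(?<referer>[^"]*)" "(?<agent>[^"]*)" (?<seconds>[\d.]+)/

# Returns a hash of parsed fields, or nil if the line does not match.
def parse_access_line(line)
  m = ACCESS_LINE.match(line)
  return nil unless m

  { path: m[:path], status: m[:status].to_i, agent: m[:agent] }
end

# Counts throttled (429) responses served to the ELB health checker.
def throttled_health_checks(lines)
  lines.count do |line|
    entry = parse_access_line(line)
    entry && entry[:status] == 429 && entry[:agent].start_with?("ELB-HealthChecker")
  end
end

sample = '[2018/09/17:15:25:06 +0000] "GET /-/readiness HTTP/1.1" 429 12 "" "ELB-HealthChecker/1.0" 0.009'
puts throttled_health_checks([sample]) # 1
```

On an instance, the same helper could be fed the live log, for example with throttled_health_checks(File.foreach("/var/log/gitlab/gitlab-workhorse/current")).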

curl on the instance itself:

curl http://localhost/-/readiness
Retry later
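
Since the problem is intermittent, a single curl call may not catch it. One way to sample the endpoint repeatedly is sketched below. It uses only the Ruby standard library and the localhost URL from the curl command above. The helper name tally_statuses and the sample size are assumptions made for this illustration.

```ruby
require "net/http"
require "uri"

# Calls fetch_status `count` times and tallies the returned status codes.
# fetch_status is any callable that returns an HTTP status code as an Integer.
def tally_statuses(count, fetch_status)
  tally = Hash.new(0)
  count.times { tally[fetch_status.call] += 1 }
  tally
end

# Performs one GET against the local readiness endpoint from this report.
READINESS_URI = URI("http://localhost/-/readiness")
FETCH_READINESS = -> { Net::HTTP.get_response(READINESS_URI).code.to_i }

# Example (run on the GitLab instance itself):
#   p tally_statuses(30, FETCH_READINESS)
```

A tally that contains 429 entries alongside 200 entries would match the intermittent behavior described here. Note that the sampling requests themselves add to the request count seen by the throttle.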

We do not have any configuration entries related to rack_attack in the /etc/gitlab/gitlab.rb configuration file. According to the documentation at https://docs.gitlab.com/ee/security/rack_attack.html:

Note: Starting with 11.2, Rack Attack is disabled by default.

Could you please confirm that Rack_Attack is disabled by default? Could you please provide configuration parameters to exclude/whitelist ELB from being blocked?

Thank you in advance!

Steps to reproduce

N/A

Configuration used

external_url 'gitlab.example.com'
registry_external_url 'registry.example.com'

nginx['listen_port'] = 80
nginx['listen_https'] = false
nginx['proxy_set_headers'] = {
    "X-Forwarded-Proto" => "https",
    "X-Forwarded-Ssl" => "on"
}
nginx['real_ip_trusted_addresses'] = [ '${ELB_IPs}' ]
nginx['real_ip_header'] = 'X-Forwarded-For'
nginx['real_ip_recursive'] = 'on'

registry_nginx['listen_port'] = 80
registry_nginx['listen_https'] = false
registry_nginx['proxy_set_headers'] = {
    'X-Forwarded-Proto' => 'https',
    'X-Forwarded-Ssl' => 'on'
}
registry_nginx['real_ip_trusted_addresses'] = [ '${ELB_IPs}' ]
registry_nginx['real_ip_header'] = 'X-Forwarded-For'
registry_nginx['real_ip_recursive'] = 'on'

high_availability['mountpoint'] = ['${MOUNT_POINTS}']

postgresql['enable'] = false
redis['enable'] = false

gitlab_rails['db_adapter'] = "postgresql"
gitlab_rails['auto_migrate'] = false

gitlab_rails['redis_host'] = ""
gitlab_rails['redis_port'] = ""

gitlab_rails['monitoring_whitelist'] = ['127.0.0.0/8', '${ELB_IPs}']

gitlab_rails['smtp_enable'] = true

gitlab_rails['gitlab_default_can_create_group'] = false
gitlab_rails['rake_cache_clear'] = false

unicorn['worker_processes'] = 4
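
The monitoring_whitelist entry in the configuration above already covers the loopback address that appears in the throttle log line. A minimal check of that match is sketched below, assuming the whitelist is evaluated as plain CIDR containment. How GitLab actually evaluates it is not shown in this report, and the helper name whitelisted? is made up for this illustration.

```ruby
require "ipaddr"

# True if the address falls inside any of the given CIDR ranges.
def whitelisted?(address, cidrs)
  ip = IPAddr.new(address)
  cidrs.any? { |cidr| IPAddr.new(cidr).include?(ip) }
end

# Only the literal entry from monitoring_whitelist is used here; the
# ${ELB_IPs} placeholder is left out because its value is not in this report.
MONITORING_WHITELIST = ["127.0.0.0/8"].freeze

puts whitelisted?("127.0.0.1", MONITORING_WHITELIST) # true
puts whitelisted?("10.0.0.5", MONITORING_WHITELIST)  # false
```

If that assumption holds, the 429 responses are unlikely to come from the monitoring whitelist, which suggests the throttle is applied by a separate layer.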

Current behavior

Occasionally, ELB health check requests are throttled and an HTTP 429 status code is returned to the ELB. As a result, the ELB takes all instances out of service at the same time.

At the same time, gitlab-ctl status shows that all components are running, and the logs contain no errors, only Rack Attack entries.

Expected behavior

Normal operation: no health check requests are throttled or blocked.

Versions

  • Platform:
    • Cloud: AWS

Relevant logs

==> /var/log/gitlab/gitlab-rails/production.log <==
Rack_Attack: throttle 127.0.0.1 GET /-/readiness

==> /var/log/gitlab/gitlab-workhorse/current <==
[2018/09/17:15:25:06 +0000] "GET /-/readiness HTTP/1.1" 429 12 "" "ELB-HealthChecker/1.0" 0.009