
Turn off keepalive on haproxy

Production Change - Criticality 2 (C2)

Change Objective: Disable keepalive on haproxy to mitigate issues with CloudFlare connections mixing up source IPs
Change Type: ConfigurationChange
Services Impacted: HAProxy
Change Team Members: @cmiskell @T4cC0re
Change Criticality: C2
Change Reviewer: @dawsmith
Tested in staging: Yes: #1863 (comment 313537860) and TBD
Dry-run output: N/A
Due Date: 2020-03-30 01:15 UTC (14:15 local time for @cmiskell)
Time tracking: 30 minutes

Rationale

We set the src IP address in our front-end haproxy from the CF-Connecting-IP header supplied by CloudFlare, and use it for rate-limiting, for logging, and as the X-Forwarded-For value passed to the backend servers. However, connections from CloudFlare to our haproxy use keepalive, and per https://github.com/haproxy/haproxy/issues/90 haproxy only sets src once per connection, not per request. Later requests on a kept-alive connection, which may come from different end users, can therefore be attributed to the wrong client IP.
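
To make the misattribution concrete, here is a minimal bash/awk model. It is a sketch under an assumption made for illustration: that on a kept-alive connection src keeps the value it was given for the connection's first request instead of following CF-Connecting-IP on every request. The input format (one line per request, `<connection_id> <cf_connecting_ip>`), the function names, and the documentation-range IPs are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical model of src attribution on kept-alive connections.
# Input: one line per request, in arrival order: "<connection_id> <cf_connecting_ip>".
# Output: "<connection_id> <real_client_ip> <attributed_src>".

# Per-connection (assumed behaviour): src is taken from the first request
# seen on each connection and reused for every later request on it.
attribute_per_connection() {
  awk '{ if (!($1 in src)) src[$1] = $2; print $1, $2, src[$1] }'
}

# Per-request (desired behaviour): src always follows the header.
attribute_per_request() {
  awk '{ print $1, $2, $2 }'
}

# Count requests whose attributed src differs from the real client IP.
count_misattributed() {
  awk '$2 != $3 { n++ } END { print n + 0 }'
}
```

Feeding it the requests `c1 198.51.100.7`, `c1 203.0.113.9`, `c2 192.0.2.44`, `c1 203.0.113.9` through `attribute_per_connection | count_misattributed` prints 2, while `attribute_per_request` gives 0. With keepalive disabled, every connection carries a single request, so the two models agree.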

Simply passing additional headers (e.g. copying CF-Connecting-IP into another header) would only fix the IP passed to the backend. Rate-limiting would still be affected, because it uses src_http_req_rate, which is keyed on src, and the haproxy logs would still contain the wrong IP.
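
The rate-limiting half of that argument can be sketched the same way. The hypothetical helper below tallies requests per key from lines of the form `<src> <cf_connecting_ip>`: keying on field 1 stands in for a src-keyed counter such as src_http_req_rate, and keying on field 2 for a counter keyed on the header. Copying the header into another header does not touch src, so the src-keyed tally still lumps different clients together. This illustrates the counting only, not haproxy's stick-table implementation.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: count requests per key.
# Input lines: "<src> <cf_connecting_ip>".
# tally_by_field 1 models a src-keyed counter (like src_http_req_rate);
# tally_by_field 2 models a counter keyed on the CF-Connecting-IP header.
tally_by_field() {
  local field=$1
  # Count occurrences of the chosen field, then sort for stable output.
  awk -v f="$field" '{ count[$f]++ } END { for (k in count) print k, count[k] }' | sort
}
```

For three requests that all carry the stale src 198.51.100.7 but two of which actually came from 203.0.113.9, `tally_by_field 1` charges all three to 198.51.100.7, while `tally_by_field 2` splits them 1 and 2.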

There may be better ways to solve this (e.g. rate-limiting based on the header rather than on src), but they are trickier to implement than simply disabling keepalive.

Preconditions

  1. The gitlab-haproxy cookbook version 1.1.6 is available in the gprd environment (https://ops.gitlab.net/gitlab-cookbooks/chef-repo/-/merge_requests/3001)
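
A minimal sketch for checking the version half of this precondition, assuming you already have the cookbook version that gprd resolves to (how you obtain it, e.g. from the chef-repo environment file, is left out). `version_at_least` is a hypothetical helper and relies on `sort -V` (GNU coreutils) for version ordering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2.
# Uses version sort (sort -V), so 1.1.10 correctly ranks above 1.1.6.
version_at_least() {
  local have=$1 want=$2
  # If the smaller of the two versions is "want", then have >= want.
  [ "$(printf '%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

For this change, `version_at_least "$gprd_version" 1.1.6` should succeed, where `gprd_version` is a hypothetical variable holding the version you looked up.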

Detailed steps for the change

  1. Apply the chef-repo MR to set gitlab-haproxy.close_client_connections to true.
  2. Run chef on the haproxy nodes at a small concurrency: knife ssh 'roles:gprd-base-lb-fe' -C4 "sudo chef-client"
  3. Validate that the change has worked:
    1. Generate a steady stream of requests identifiable by the cmiskell user agent: while true; do curl -s -A cmiskell -I https://gitlab.com/cmiskell/playground | grep -i cf-ray; sleep 2; done
    2. Check https://log.gprd.gitlab.net/goto/c1a50f4a15f26d2caac86b99a84913ca: the remote_ip for any given client should stabilize to a single IP (currently it varies).
  4. Validate that the change has not affected large git operations from a remote client:
    1. git push a huge repository (the Linux kernel) with a 600 MB Linux ISO added to it.
    2. git pull that repository after it has been pushed.
    3. Create a new branch in the UI.
    4. git fetch that branch.
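  5. Monitor https://dashboards.gitlab.net/d/mnbqU9Smz/fleet-overview?orgId=1, particularly CPU usage, since without keepalive CloudFlare has to open a new connection (including a new TLS handshake) for every request.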
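
The checks in step 3 can be partly scripted. The sketch below assumes you have saved the curl header output, and exported the remote_ip values for the test client (requests with user agent cmiskell) from the log search, one per line; the function names and that export step are hypothetical. `extract_cf_ray` pulls the CF-Ray values out of response headers, matching the header name case-insensitively and stripping the trailing CR of HTTP/1.1 header lines. `single_remote_ip` succeeds only when exactly one distinct IP remains, which is the expected state after the change.

```shell
#!/usr/bin/env bash
# Hypothetical validation helpers for step 3.

# Print the value of every CF-Ray header found in curl -I output on stdin.
# Header names are case-insensitive; HTTP/1.1 lines end in \r, so strip it.
extract_cf_ray() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "cf-ray" { print $2 }'
}

# Succeed if the remote_ip values on stdin (one per line) contain exactly
# one distinct, non-empty IP; print the distinct count either way.
single_remote_ip() {
  local distinct
  distinct=$(grep -v '^$' | sort -u | wc -l | tr -d ' ')
  echo "distinct remote_ip values: $distinct"
  [ "$distinct" -eq 1 ]
}
```

Both helpers read saved output rather than a live stream, which avoids surprises from pipe buffering in `tr` and `awk`.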

Rollback steps

  1. Revert the chef-repo MR, returning gitlab-haproxy.close_client_connections to its default of false.
  2. Run chef on the haproxy nodes at a small concurrency: knife ssh 'roles:gprd-base-lb-fe' -C4 "sudo chef-client"

Changes checklist

  • Detailed steps and rollback steps have been filled in prior to commencing work
  • Person on-call has been informed prior to change being rolled out
Edited by Hendrik Meyer (xLabber)