Turn off keepalive on haproxy
Production Change - Criticality 2

Change Objective | Disable keepalive on haproxy to mitigate issues with CloudFlare connections mixing up source IPs |
---|---|
Change Type | ConfigurationChange |
Services Impacted | ServiceHAProxy |
Change Team Members | @cmiskell @T4cC0re |
Change Criticality | C2 |
Change Reviewer | @dawsmith |
Tested in staging | Yes: #1863 (comment 313537860) and TBD |
Dry-run output | N/A |
Due Date | 2020-03-30 01:15 UTC (14:15 for @cmiskell) |
Time tracking | 30 minutes |
Rationale
Our front-end haproxy sets the src IP address from the CF-Connecting-IP header supplied by CloudFlare, and uses that address for rate limiting, for logging, and for passing to the backend servers in X-Forwarded-For. However, connections from CloudFlare to our haproxy use keepalive, and per https://github.com/haproxy/haproxy/issues/90 haproxy sets src only once per connection rather than once per request. When a single CloudFlare connection carries requests from different clients, those requests are attributed to the wrong IP.
Passing an additional header (e.g. copying CF-Connecting-IP to another header) would fix only the IP passed to the backend. Rate limiting would still be wrong, because it uses src_http_req_rate, and the haproxy logs would still record the wrong IP.
There may be better fixes (e.g. rate limiting based on the header), but they are trickier to implement than simply disabling keepalive.
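To make the failure mode concrete, here is a toy model in shell; it is not haproxy code. It assumes that, with per-connection src, every request on a keepalive connection is attributed to the IP seen on that connection's first request. The function names, connection IDs, and IP addresses are illustrative only.

```shell
#!/usr/bin/env bash
# Toy model of the attribution problem; this is NOT haproxy code.
# Input: one line per request, "<connection-id> <CF-Connecting-IP>", in arrival order.
# Output: "<connection-id> <real-ip> <attributed-ip>".

# Per-connection src (assumed bug behaviour): every request on a connection
# is attributed to the IP seen on that connection's first request.
attribute_per_connection() {
  awk '{ if (!($1 in src)) src[$1] = $2; print $1, $2, src[$1] }'
}

# Per-request src (desired behaviour): each request keeps its own IP.
attribute_per_request() {
  awk '{ print $1, $2, $2 }'
}

# Keepalive disabled: give every request its own (synthetic) connection id.
one_request_per_connection() {
  awk '{ print "r" NR, $2 }'
}

# Count requests whose attributed IP differs from the real client IP.
count_misattributed() {
  awk '$2 != $3 { n++ } END { print n + 0 }'
}

# Illustrative traffic: two clients share CloudFlare connection c1.
requests='c1 198.51.100.7
c1 203.0.113.9
c2 192.0.2.44
c1 203.0.113.9'

printf '%s\n' "$requests" | attribute_per_connection | count_misattributed    # 2
printf '%s\n' "$requests" | attribute_per_request | count_misattributed       # 0
printf '%s\n' "$requests" | one_request_per_connection | attribute_per_connection | count_misattributed  # 0
```

The last line is the point of this change: with one request per connection, per-connection src and per-request src give the same answer, so the haproxy behaviour no longer misattributes requests.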
Preconditions
- The gitlab-haproxy cookbook version 1.1.6 is available in the gprd environment (https://ops.gitlab.net/gitlab-cookbooks/chef-repo/-/merge_requests/3001)
Detailed steps for the change
- Apply the chef-repo MR to set gitlab-haproxy.close_client_connections to true.
- Run chef on the haproxy nodes at a small concurrency: knife ssh 'roles:gprd-base-lb-fe' -C4 "sudo chef-client"
- Validate that the change has worked:
  - Generate identifiable traffic (user agent cmiskell), printing the cf-ray header of each response: while true; do curl -A cmiskell -I https://gitlab.com/cmiskell/playground | grep cf-ray; sleep 2; done
  - Check https://log.gprd.gitlab.net/goto/c1a50f4a15f26d2caac86b99a84913ca. The remote_ip for any given client should stabilize to a single IP (currently it varies).
- Validate that the change has not affected large git operations from a remote client:
  - git push of a huge repo (the Linux kernel) with a 600M Linux ISO added to it
  - git pull of that repo after it has been pushed
  - Create a new branch in the UI
  - git fetch that branch
- Monitor https://dashboards.gitlab.net/d/mnbqU9Smz/fleet-overview?orgId=1, particularly CPU, since closing client connections means CloudFlare has to open more new connections.
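The log checks in the validation steps above can be partly scripted, roughly as below. This is a sketch under assumptions: the function names are hypothetical, and the exported log format ("<client> <remote_ip>" per line) is assumed rather than taken from our Kibana setup.

```shell
#!/usr/bin/env bash
# Print the CF-Ray ID from raw response headers, e.g. the output of `curl -I`.
# Header names are case-insensitive and HTTP lines end in CRLF, hence
# tolower() and stripping the trailing \r.
extract_cf_ray() {
  awk 'tolower($1) == "cf-ray:" { sub(/\r$/, "", $2); print $2 }'
}

# Input: log lines exported in a hypothetical "<client> <remote_ip>" format,
# where <client> is a single token such as the curl user agent (cmiskell).
# Output: "<client> <number of distinct remote_ip values>", sorted by client.
# After the change, every client should report exactly 1.
distinct_ips_per_client() {
  sort -u | awk '{ n[$1]++ } END { for (c in n) print c, n[c] }' | sort
}
```

For example, curl -s -A cmiskell -I https://gitlab.com/cmiskell/playground | extract_cf_ray prints the ray ID of that one request, which can then be searched for in the logs.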
Rollback steps
- Revert the chef-repo MR, returning gitlab-haproxy.close_client_connections to its default of false.
- Run chef on the haproxy nodes at a small concurrency: knife ssh 'roles:gprd-base-lb-fe' -C4 "sudo chef-client"
Changes checklist
- Detailed steps and rollback steps have been filled in before commencing work
- The person on call has been informed before the change is rolled out