Create connection-based rate limiting on Pages HAProxy
Summary
Several incidents have involved connection spikes that caused Pages apdex alerts. This issue tracks the investigation and implementation of Pages rate limiting on HAProxy based on TCP connections. The goal is to prevent a large spike of traffic from overwhelming the Pages application. Even though rate limits are hit at the application, these spikes are large enough to saturate the service.
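As a starting point for the PoC, connection-based limiting in HAProxy is commonly expressed with a stick table and `tcp-request connection` rules. The fragment below is a minimal sketch and not the cookbook change itself. The frontend name, bind address, table size, 10s period, and threshold of 100 are all hypothetical, and keying the limit on source IP is an assumption that the investigation would need to confirm.

```haproxy
# Hypothetical sketch: names and numbers are illustrative assumptions,
# not values taken from the Pages HAProxy cookbook.
frontend pages_https
    mode tcp
    bind :443

    # Keep a per-source-IP counter of the connection rate over a 10s period.
    stick-table type ip size 100k expire 1m store conn_rate(10s)

    # Track each incoming connection's source address in the table above.
    tcp-request connection track-sc0 src

    # Reject new TCP connections from a source that exceeds 100 connections
    # in the tracked period, before they reach the Pages application.
    tcp-request connection reject if { sc0_conn_rate gt 100 }
```

Because the rejection happens at the connection stage, an offending source is dropped before any TLS or HTTP processing, which is the property that matters when the spike itself is what saturates the service. Suitable thresholds would have to come from testing on PreProd and Staging.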
Related Incident(s)
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6985
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6936
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6799
- etc.
Desired Outcome/Acceptance Criteria
- PoC for rate limits on PreProd
- HAProxy cookbook changes to add rate limiting based on TCP connections
- Test rate limits on Staging
- Enable rate limits on Production
Associated Services
Corrective Action Issue Checklist
- Link the incident(s) this corrective action arose out of
- Give context for what problem this corrective action is trying to prevent from re-occurring
- Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
- Assign a priority (this will default to 'priority::4')
Edited by John Jarvis