2020-02-28: haproxy is saturating cpu on fe-[17,24]
Summary
The HAProxy nodes fe-[17,24] are experiencing higher CPU load than any other nodes.
It appears that the api_rate_limit ACL on these nodes is handling far more traffic than on any other node or endpoint.
More information will be added as we investigate the issue.
Timeline
All times UTC.
2020-02-28
- 14:34 - Ben asks for extra eyeballs on his investigation into higher CPU load on two fe haproxy nodes.
- ~15:00 - It is discovered that the high CPU saturation is due to rate limiting of the software clients deployed by the faceitBlacklist project. https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/9284
Draining and reboot process
- fe-01-lb-gprd.c.gitlab-production.internal
- fe-02-lb-gprd.c.gitlab-production.internal
- fe-03-lb-gprd.c.gitlab-production.internal
- fe-04-lb-gprd.c.gitlab-production.internal
- fe-05-lb-gprd.c.gitlab-production.internal
- fe-06-lb-gprd.c.gitlab-production.internal
- fe-07-lb-gprd.c.gitlab-production.internal
- fe-08-lb-gprd.c.gitlab-production.internal
- fe-09-lb-gprd.c.gitlab-production.internal
- fe-10-lb-gprd.c.gitlab-production.internal
- fe-11-lb-gprd.c.gitlab-production.internal
- fe-12-lb-gprd.c.gitlab-production.internal
- fe-13-lb-gprd.c.gitlab-production.internal
- fe-14-lb-gprd.c.gitlab-production.internal
- fe-15-lb-gprd.c.gitlab-production.internal
- fe-16-lb-gprd.c.gitlab-production.internal
- fe-17-lb-gprd.c.gitlab-production.internal
- fe-18-lb-gprd.c.gitlab-production.internal
- fe-19-lb-gprd.c.gitlab-production.internal
- fe-20-lb-gprd.c.gitlab-production.internal
- fe-21-lb-gprd.c.gitlab-production.internal
- fe-22-lb-gprd.c.gitlab-production.internal
- fe-23-lb-gprd.c.gitlab-production.internal
- fe-24-lb-gprd.c.gitlab-production.internal
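The node list above follows a regular naming pattern, so a rolling drain-and-reboot could be driven from a small script rather than by hand. The following is only a sketch under assumptions: the issue does not record the actual drain or reboot commands, so `drain` and `reboot` are hypothetical callbacks supplied by the caller, and `fe_hosts` and `rolling_restart` are illustrative names. Run as-is, it only prints what it would do.

```python
from typing import Callable, Iterable, List

# Domain suffix copied from the host list above.
DOMAIN = "c.gitlab-production.internal"


def fe_hosts(first: int = 1, last: int = 24) -> List[str]:
    """Build the fe-NN-lb-gprd hostnames for the inclusive range first..last."""
    return [f"fe-{n:02d}-lb-gprd.{DOMAIN}" for n in range(first, last + 1)]


def rolling_restart(
    hosts: Iterable[str],
    drain: Callable[[str], None],
    reboot: Callable[[str], None],
) -> List[str]:
    """Drain, then reboot, one host at a time.

    `drain` and `reboot` are hypothetical callbacks; the real procedure
    used during the incident is not recorded in this issue.
    Returns the hosts in the order they were processed.
    """
    done: List[str] = []
    for host in hosts:
        drain(host)   # take the node out of rotation first
        reboot(host)  # only then restart it
        done.append(host)
    return done


if __name__ == "__main__":
    # Dry run: print the intended actions instead of touching any node.
    rolling_restart(
        fe_hosts(),
        drain=lambda h: print(f"would drain  {h}"),
        reboot=lambda h: print(f"would reboot {h}"),
    )
```

Processing hosts strictly one at a time keeps the rest of the fleet serving traffic while a node is out of rotation; the callbacks can be swapped for real commands once the procedure is agreed.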
Resources
- If the Situation Zoom room was used, the recording will be automatically uploaded to the Incident room Google Drive folder (private).
Edited by Alejandro Rodríguez