Degraded performance on GitLab.com
Currently seeing a significant site slowdown and an increase in errors
RCA issue: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7157
Summary
- As of Jul 1 14:01:38 UTC 2019, we are investigating a slowdown on GitLab.com that affects all web requests serviced by the Rails front end. The slowdown is in redis-cache, in the lru cluster. There were no recent application or configuration changes prior to the degradation.
Current Redis topology
- redis primary - redis-03
- redis cache - redis-cache-02
Remediation options
- deploy https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/14500 as hot patch, eliminating a heavy call to redis on every request
- turn off junit config, uploading junit artifacts (https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7146)
- upgrade kernel to hopefully improve sys cpu usage (https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7145)
- scale up the hardware (unclear whether this would improve things)
Metrics controller looks very bad, but this might just be a symptom:
Increased rate of 5xx errors:
Edited by Henri Philipps