Degraded performance on GitLab.com

We are currently seeing a significant site slowdown and an increase in errors.

RCA issue: https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7157

Summary

  • As of Jul 1 14:01:38 UTC 2019 we are investigating a slowdown on GitLab.com that affects all web requests serviced by the Rails front end. The slowdown is in redis-cache, in the LRU cluster. There were no recent application or configuration changes prior to the degradation.

Current Redis topology

  • redis primary - redis-03
  • redis cache - redis-cache-02

Remediation options

  • deploy https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/14500 as a hot patch, eliminating a heavy call to Redis on every request
  • turn off the junit config to stop uploading junit artifacts (https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7146)
  • upgrade the kernel, which may reduce sys CPU usage (https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/7145)
  • scale up the hardware (unclear whether that would improve things)
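
To illustrate why removing a per-request call to Redis can relieve a saturated cache node, here is a minimal sketch of one common approach: keeping a value in process memory for a short time so that most requests never reach Redis. This is not the change in merge request 14500, whose contents are not described here. The names ShortLivedCache, CountingBackend and load_settings are hypothetical, and the backend is a stand-in that only counts round trips rather than a real Redis client.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ShortLivedCache(Generic[T]):
    """Hold one fetched value in process memory until its TTL expires."""

    def __init__(
        self,
        fetch: Callable[[], T],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._value: Optional[T] = None
        self._expires_at = float("-inf")  # forces a fetch on first use

    def get(self) -> T:
        now = self._clock()
        if now >= self._expires_at:
            # Only this branch touches the backend.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


class CountingBackend:
    """Hypothetical stand-in for a Redis client; counts round trips."""

    def __init__(self) -> None:
        self.calls = 0

    def load_settings(self) -> dict:
        self.calls += 1
        return {"feature_enabled": True}


if __name__ == "__main__":
    backend = CountingBackend()
    cache = ShortLivedCache(backend.load_settings, ttl_seconds=60.0)
    for _ in range(1000):  # simulate 1000 web requests
        cache.get()
    print(f"backend round trips: {backend.calls}")
```

The trade-off is staleness: a value may be up to ttl_seconds out of date, so this pattern only suits data that changes rarely.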


Screen_Shot_2019-07-01_at_10.52.32_AM

Metrics controller looks very bad, but this might just be a symptom:

Screen_Shot_2019-07-01_at_10.52.53_AM

Increased rate of 5xx errors:

Screen_Shot_2019-07-01_at_10.53.34_AM

Edited Jul 03, 2019 by Henri Philipps