Investigate the spikiness of redis-cache latency apdex

Spun off from https://gitlab.com/gitlab-com/gl-infra/production/issues/1722

Between 2020-03-03 10:15 UTC and 2020-03-05 19:30 UTC we observed uniform spikes in the primary redis-cache latency apdex, often crossing the degradation SLO:

redis-cache-spikes

These latency spikes are usually accompanied by regular spikes in client connections, though the two could be unrelated:

client-connections

(graph zoomed in to properly show the spikes)
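One way to check whether the two signals are actually related, rather than judging from the graphs alone, is to correlate them at a range of time offsets. The following is only a minimal sketch: it assumes both series have been exported as equally spaced samples of the same length, and the function names (`pearson`, `lagged_correlation`) are hypothetical helpers, not part of any existing tooling.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / sqrt(var_x * var_y)


def lagged_correlation(latency, connections, max_lag):
    """Return {lag: r} for lags in [-max_lag, max_lag].

    A positive lag pairs latency[t + lag] with connections[t], i.e. the
    connection spike comes first. A negative lag means latency leads.
    Assumes both inputs have the same length and sampling interval.
    """
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a = latency[lag:]
            b = connections[:len(connections) - lag]
        else:
            a = latency[:lag]
            b = connections[-lag:]
        result[lag] = pearson(a, b)
    return result


if __name__ == "__main__":
    # Synthetic illustration only, not production data.
    connections = [0, 0, 5, 0, 0, 0, 5, 0, 0, 0, 5, 0]
    latency = [0, 0, 0, 5, 0, 0, 0, 5, 0, 0, 0, 5]
    scores = lagged_correlation(latency, connections, max_lag=2)
    best = max(scores, key=scores.get)
    print(best, scores[best])
```

A strong peak at a small positive lag would suggest that connection spikes precede the latency spikes; a flat profile would support the "unrelated" reading. Because both signals appear periodic, `max_lag` should stay below the spike period, otherwise several lags will score equally well.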

Some network-level investigation has been done in https://gitlab.com/gitlab-com/gl-infra/production/-/issues/1722#note_298560627, but it is still inconclusive.


Summary of findings (added by @igorwwwwwwwwwwwwwwwwwwww):

Actions taken:

Actions pending:

Long-term:

  • Scale out redis &80
Edited by Heinrich Lee Yu