Investigate large RPS spike in redis-cluster-shared-state
Problem
@fshabir noticed a large spike in RPS for ServiceRedisClusterSharedState starting from 2025/01/13, accompanied by an increase in CPU saturation from ~40% to ~65% (capacity warning).
| RPS | CPU |
|---|---|
Zooming in, the spike coincided with a deployment: https://gitlab.com/gitlab-org/security/gitlab/-/compare/b008d75be91...92ddd23a9b9#e33ffd7a3f2e3a205c3b45e35d76fb8e173874d7
The spikes were coming from get, set, del and eval operations:
At first glance, these four operations look like a typical exclusive lease cycle:
- Obtain the lease with set: https://gitlab.com/gitlab-org/gitlab/blob/4503011417fc5a48d5d0562e95e5874ad8c81f86/lib/gitlab/exclusive_lease.rb#L131-133
- Do stuff
- Cancel the lease with eval, get and del: https://gitlab.com/gitlab-org/gitlab/blob/4503011417fc5a48d5d0562e95e5874ad8c81f86/lib/gitlab/exclusive_lease.rb#L67-69
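
To make the command pattern concrete, here is a minimal sketch of such a lease cycle. It does not reproduce the code in exclusive_lease.rb: `FakeRedis`, `set_nx`, `eval_compare_and_delete` and `with_lease` are hypothetical names, the store is in-memory and only counts commands, and the lease TTL is left out. It assumes the cancel step is a compare-and-delete script whose inner get and del are counted as separate commands.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for Redis that only counts commands.
class FakeRedis
  attr_reader :calls

  def initialize
    @store = {}
    @calls = Hash.new(0)
  end

  # Mimics SET with NX: succeeds only if the key is absent (TTL omitted).
  def set_nx(key, value)
    @calls[:set] += 1
    return false if @store.key?(key)

    @store[key] = value
    true
  end

  def get(key)
    @calls[:get] += 1
    @store[key]
  end

  def del(key)
    @calls[:del] += 1
    @store.delete(key) ? 1 : 0
  end

  # Stands in for EVAL of a compare-and-delete script. The script itself
  # runs GET and DEL, so one cancel is counted as three commands here.
  def eval_compare_and_delete(key, token)
    @calls[:eval] += 1
    get(key) == token ? del(key) : 0
  end
end

# Runs the block while holding the lease; returns false if it is taken.
def with_lease(redis, key)
  token = SecureRandom.uuid
  return false unless redis.set_nx(key, token)

  begin
    yield
  ensure
    # Release only if we still own the lease.
    redis.eval_compare_and_delete(key, token)
  end
  true
end

redis = FakeRedis.new
with_lease(redis, 'lease:example') { :do_stuff }
p redis.calls
```

Under these assumptions, one lease cycle costs one each of set, eval, get and del, regardless of whether the work inside the block changes anything.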
By type, the increase came mostly from api:
Root cause
Save last used IP address to personal access to... (gitlab-org/gitlab!161076 - merged) changed how a PAT is saved: it always obtains an exclusive lease from Redis SharedState, even when no update is needed.
Solution
Avoid obtaining lease if no update required (gitlab-org/gitlab!193010 - merged) tries to optimize this by checking whether an update is needed before obtaining the exclusive lease.
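
The idea can be sketched as follows. This is an illustration only, not the code from the merge request: `Token`, `LeaseCounter`, `update_needed?`, `record_usage` and the 600-second freshness window are all hypothetical, and the real conditions for "update needed" may differ.

```ruby
# Hypothetical token record; the fields are assumptions for illustration.
Token = Struct.new(:last_used_at, :last_used_ip)

# Hypothetical lease helper that only counts how often a lease is requested.
class LeaseCounter
  attr_reader :obtained

  def initialize
    @obtained = 0
  end

  def with_lease
    @obtained += 1 # stands in for the set / eval / get / del round trips
    yield
  end
end

# Assumed freshness window; the real threshold is not stated in this issue.
FRESHNESS_SECONDS = 600

def update_needed?(token, ip, now)
  token.last_used_at.nil? ||
    now - token.last_used_at > FRESHNESS_SECONDS ||
    token.last_used_ip != ip
end

def record_usage(token, ip, now, leases)
  # Cheap local check first: skip Redis entirely when nothing would change.
  return false unless update_needed?(token, ip, now)

  leases.with_lease do
    token.last_used_at = now
    token.last_used_ip = ip
  end
  true
end

leases = LeaseCounter.new
token = Token.new(nil, nil)
start = Time.at(0)
# 100 API requests from the same IP, one second apart.
100.times { |i| record_usage(token, '10.0.0.1', start + i, leases) }
puts leases.obtained
```

In this sketch only the first of the 100 requests takes the lease. Without the early return, every request would, which matches the per-request load described under Root cause.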




