Investigate large RPS spike in redis-cluster-shared-state

Problem

@fshabir noticed a large spike in RPS for ServiceRedisClusterSharedState starting from 2025/01/13, accompanied by an increase in CPU saturation from ~40% to ~65% (capacity warning).

RPS

CPU

Zooming in, the spike coincided with a deployment: https://gitlab.com/gitlab-org/security/gitlab/-/compare/b008d75be91...92ddd23a9b9#e33ffd7a3f2e3a205c3b45e35d76fb8e173874d7

The spikes came from get, set, del and eval operations.

At first glance, these four operations look like a typical exclusive lease cycle:

1. Obtain the lease with set: https://gitlab.com/gitlab-org/gitlab/blob/4503011417fc5a48d5d0562e95e5874ad8c81f86/lib/gitlab/exclusive_lease.rb#L131-133
2. Do stuff
3. Cancel the lease with eval, get and del: https://gitlab.com/gitlab-org/gitlab/blob/4503011417fc5a48d5d0562e95e5874ad8c81f86/lib/gitlab/exclusive_lease.rb#L67-69
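
To make the mapping from one lease cycle to the four commands concrete, here is a minimal sketch. It is not the code in exclusive_lease.rb: `FakeRedis`, `set_nx`, `eval_cancel` and `with_lease` are hypothetical names, the store is an in-memory stand-in that only counts commands, lease TTLs are left out, and it assumes the cancel script issues get and del from inside eval.

```ruby
require 'securerandom'

# In-memory stand-in for Redis that counts commands (hypothetical, illustration only).
class FakeRedis
  attr_reader :calls

  def initialize
    @data = {}
    @calls = Hash.new(0)
  end

  # Mimics SET key value NX: writes only when the key is absent.
  def set_nx(key, value)
    @calls[:set] += 1
    return false if @data.key?(key)

    @data[key] = value
    true
  end

  def get(key)
    @calls[:get] += 1
    @data[key]
  end

  def del(key)
    @calls[:del] += 1
    @data.delete(key) ? 1 : 0
  end

  # Mimics EVAL of a compare-and-delete script; the script itself runs GET and DEL.
  def eval_cancel(key, uuid)
    @calls[:eval] += 1
    get(key) == uuid ? del(key) : 0
  end
end

# One full lease cycle: obtain, do the work, cancel.
def with_lease(redis, key)
  uuid = SecureRandom.uuid
  return false unless redis.set_nx(key, uuid) # step 1: obtain

  begin
    yield                                     # step 2: do stuff
  ensure
    redis.eval_cancel(key, uuid)              # step 3: cancel
  end
  true
end

redis = FakeRedis.new
with_lease(redis, 'lease:example') { :work }
p redis.calls
```

Under these assumptions a single successful cycle records one set, one eval, one get and one del, which matches the four operations that spiked together.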

By type, the increase came mostly from api.

Root cause

Save last used IP address to personal access to... (gitlab-org/gitlab!161076 - merged) changed the way a PAT is saved: it now always obtains an exclusive lease from Redis SharedState, even when no update is needed.

Solution

Avoid obtaining lease if no update required (gitlab-org/gitlab!193010 - merged) tries to optimize this by checking whether an update is needed before obtaining the exclusive lease.
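
The difference between the two orderings can be sketched as follows. This is an illustration under stated assumptions, not the code from either merge request: `Token`, `LeaseCounter`, `update_needed?`, `STALE_AFTER`, the lease key and both `touch_*` methods are hypothetical, and the staleness rule (timestamp older than a threshold or a changed IP) is invented for the example.

```ruby
# Hypothetical token record; field names are illustrative.
Token = Struct.new(:id, :last_used_at, :last_used_ip)

# Counts lease acquisitions in place of real Redis calls (illustrative stand-in).
class LeaseCounter
  attr_reader :obtained

  def initialize
    @obtained = 0
  end

  def with_lease(_key)
    @obtained += 1
    yield
  end
end

STALE_AFTER = 600 # seconds; arbitrary threshold chosen for this sketch

# Assumed rule: write only if never used, stale, or seen from a new IP.
def update_needed?(token, ip, now)
  token.last_used_at.nil? ||
    now - token.last_used_at > STALE_AFTER ||
    token.last_used_ip != ip
end

# Before: the lease is taken on every request and the check runs inside it.
def touch_always_lease(token, ip, now, leases)
  leases.with_lease("pat:last_used:#{token.id}") do
    next unless update_needed?(token, ip, now)

    token.last_used_at = now
    token.last_used_ip = ip
  end
end

# After: check first, and take the lease only when a write will happen.
def touch_check_first(token, ip, now, leases)
  return unless update_needed?(token, ip, now)

  leases.with_lease("pat:last_used:#{token.id}") do
    token.last_used_at = now
    token.last_used_ip = ip
  end
end

now = Time.now
[:touch_always_lease, :touch_check_first].each do |strategy|
  token = Token.new(1, nil, nil)
  leases = LeaseCounter.new
  # 100 requests, one second apart, all from the same IP.
  100.times { |i| send(strategy, token, '203.0.113.7', now + i, leases) }
  puts "#{strategy}: #{leases.obtained} leases"
end
```

In this simulation the first ordering takes 100 leases and the second takes 1, since only the first request actually needs to write. Each avoided lease also avoids its set, eval, get and del round trip to Redis SharedState.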

Edited Jun 17, 2025 by Marco Gregorius