Moving exclusive lease keys from Redis persistent to a Redis Cluster
This issue originates from gitlab-com/gl-infra/scalability#2452 (closed). ExclusiveLease
keys consume substantial resources on Redis persistent, whose primary CPU is forecast to saturate.
This issue discusses possible approaches for performing the migration on SaaS and for producing a release for self-managed (SM) users.
The conventional way to migrate Redis workloads, via MultiStore, is not feasible because the two stores can end up in mismatched states if 2 processes call try_obtain simultaneously.
An example of differing resulting states:
proc A and proc B call setnx on multistore
# due to network routing, the order in which Redis receives the calls could be interleaved if the calls are made simultaneously
proc A multistore calls SETNX on key XYZ in sharedstate -- succeeds
proc B multistore calls SETNX on key XYZ in sharedstate -- fails (not error)
proc B multistore calls SETNX on key XYZ in cache -- succeeds
proc A multistore calls SETNX on key XYZ in cache -- fails (not error)
the result is that proc A holds the key in sharedstate while proc B holds it in cache
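The trace above can be replayed with a minimal Ruby sketch. `FakeStore` and `replay_interleaving` are hypothetical names introduced only for illustration: the store is an in-memory stand-in for a Redis instance that models nothing beyond SETNX semantics, and it is not part of MultiStore or GitLab.

```ruby
# Hypothetical in-memory stand-in for a Redis instance (assumption, not a real client).
class FakeStore
  def initialize
    @data = {}
  end

  # Sets the key only if it is absent; returns true on success, false otherwise.
  def setnx(key, value)
    return false if @data.key?(key)

    @data[key] = value
    true
  end

  def get(key)
    @data[key]
  end
end

# Replays the four calls in the interleaved order shown in the trace.
# Each process writes its own UUID so the final owner of each store is visible.
def replay_interleaving
  shared_state = FakeStore.new
  cache = FakeStore.new

  outcomes = [
    shared_state.setnx('XYZ', 'uuid-a'), # proc A on sharedstate
    shared_state.setnx('XYZ', 'uuid-b'), # proc B on sharedstate
    cache.setnx('XYZ', 'uuid-b'),        # proc B on cache
    cache.setnx('XYZ', 'uuid-a')         # proc A on cache
  ]

  { outcomes: outcomes, shared_state: shared_state.get('XYZ'), cache: cache.get('XYZ') }
end

result = replay_interleaving
puts "SETNX outcomes: #{result[:outcomes].inspect}"
puts "sharedstate owner: #{result[:shared_state]}, cache owner: #{result[:cache]}"
```

Neither call raises an error, so each process sees one success and one failure, and the two stores disagree about who holds the lease.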
Proposed approach
One idea is to run the 2nd setnx in ClusterSharedState only after the first setnx succeeds (controlled by a feature flag). If multiple processes try_obtain the exclusive lease for a particular key, only 1 of the processes will set the key on the first setnx and proceed to set the key in Cache.
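A minimal sketch of the gated second write, assuming an in-memory stand-in for both stores. `FakeStore`, `SketchLease`, and the `dual_write_enabled` argument (standing in for the feature flag) are hypothetical names, not the actual Gitlab::ExclusiveLease implementation.

```ruby
require 'securerandom'

# Hypothetical in-memory stand-in for a Redis instance (assumption, not a real client).
class FakeStore
  def initialize
    @data = {}
  end

  # Sets the key only if it is absent; returns true on success, false otherwise.
  def setnx(key, value)
    return false if @data.key?(key)

    @data[key] = value
    true
  end

  def get(key)
    @data[key]
  end
end

# Hypothetical lease object: the second SETNX runs only if the first one succeeded.
class SketchLease
  attr_reader :uuid

  def initialize(key, primary:, secondary:, dual_write_enabled: true)
    @key = key
    @primary = primary
    @secondary = secondary
    @dual_write_enabled = dual_write_enabled # stands in for the feature flag
    @uuid = SecureRandom.uuid
  end

  # Returns the UUID if the lease was obtained, false otherwise.
  def try_obtain
    return false unless @primary.setnx(@key, @uuid)

    # Only the winner of the first SETNX reaches this point.
    @secondary.setnx(@key, @uuid) if @dual_write_enabled
    @uuid
  end
end

shared_state = FakeStore.new
cluster_shared_state = FakeStore.new
lease_a = SketchLease.new('XYZ', primary: shared_state, secondary: cluster_shared_state)
lease_b = SketchLease.new('XYZ', primary: shared_state, secondary: cluster_shared_state)

puts "A obtained: #{lease_a.try_obtain != false}"
puts "B obtained: #{lease_b.try_obtain != false}"
puts "stores agree: #{shared_state.get('XYZ') == cluster_shared_state.get('XYZ')}"
```

Because the loser returns before touching the second store, the interleaving that split ownership across the two stores cannot occur in this sketch.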
For the other Gitlab::ExclusiveLease write operations, we could safely take a dual-write approach. These operations are:
- renew: runs a Lua script to reset the TTL if the UUID matches
- cancel: runs a Lua script to delete the key if the UUID matches
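The dual-write for renew and cancel can be sketched as follows. This is an illustration under stated assumptions: `FakeStore`, `expire_if_match`, `delete_if_match`, `dual_renew`, and `dual_cancel` are hypothetical names, and the two guarded methods only model the effect of the Lua scripts in memory rather than reproducing them.

```ruby
# Hypothetical in-memory stand-in for a Redis instance (assumption, not a real client).
class FakeStore
  Entry = Struct.new(:value, :ttl)

  def initialize
    @data = {}
  end

  def set(key, value, ttl)
    @data[key] = Entry.new(value, ttl)
  end

  def ttl(key)
    @data[key]&.ttl
  end

  def exists?(key)
    @data.key?(key)
  end

  # Models the renew script: reset the TTL only if the stored UUID matches.
  def expire_if_match(key, uuid, ttl)
    entry = @data[key]
    return false unless entry && entry.value == uuid

    entry.ttl = ttl
    true
  end

  # Models the cancel script: delete the key only if the stored UUID matches.
  def delete_if_match(key, uuid)
    entry = @data[key]
    return false unless entry && entry.value == uuid

    @data.delete(key)
    true
  end
end

# Runs the guarded renew on every store; reports the first (primary) store's result.
def dual_renew(stores, key, uuid, ttl)
  stores.map { |store| store.expire_if_match(key, uuid, ttl) }.first
end

# Runs the guarded cancel on every store; reports the first (primary) store's result.
def dual_cancel(stores, key, uuid)
  stores.map { |store| store.delete_if_match(key, uuid) }.first
end

shared_state = FakeStore.new
cluster_shared_state = FakeStore.new
stores = [shared_state, cluster_shared_state]
stores.each { |store| store.set('XYZ', 'uuid-a', 60) }

puts "renew by non-holder: #{dual_renew(stores, 'XYZ', 'uuid-b', 120)}"
puts "renew by holder: #{dual_renew(stores, 'XYZ', 'uuid-a', 120)}"
puts "cancel by holder: #{dual_cancel(stores, 'XYZ', 'uuid-a')}"
```

One reading of why dual-write is safe here: each operation acts only when the stored UUID matches, so a call from a process that does not hold the lease is a no-op in both stores.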
The read operations can continue to use SharedState:
- exist: checks if the key exists
- ttl: reads the TTL
- get_uuid: gets the UUID of a key