Redis CROSSSLOT validator
Description
We might need to understand better how to shard Redis on gitlab.com in order to devise our long-term Redis sharding strategy.
It is very important to understand all the Redis operations that involve multiple keys. Not having this knowledge is a major blocker for us if we would like to move towards Redis sharding.
In a sharded environment, Redis will answer with an error ERR CROSSSLOT Keys in request don't hash to the same slot if the keys we want to operate on do not reside on the same slot / shard.
We need to understand better how we can design our sharding techniques so that this is not a blocker.
Proposal
Design a CROSSSLOT validator: an additional layer of abstraction in front of our Redis facade code that checks whether the keys used in a multi-key operation all resolve to a single shard.
Step 1: monkeypatch Redis Client calls to intercept all Redis calls
This follows the approach used in LabKit-Ruby to monkeypatch the Redis Client to intercept all Redis calls for distributed tracing.
The LabKit implementation is here: https://gitlab.com/gitlab-org/labkit-ruby/-/blob/master/lib/labkit/tracing/redis/redis_interceptor.rb. The injection occurs here: https://gitlab.com/gitlab-org/labkit-ruby/-/blob/master/lib/labkit/tracing/redis.rb.
The CROSSSLOT validator implementation would follow similar boilerplate, but instead of emitting tracing spans, it would validate the Redis command.
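A minimal sketch of the interception pattern, assuming the client exposes a single `call(command)` entry point. `FakeRedisClient`, `CrossSlotInterceptor` and `validate_command` are hypothetical names used only for illustration; a real implementation would inject the module into the actual Redis client class, and `Module#prepend` is just one way to do that.

```ruby
# Stand-in for the real Redis client class (hypothetical, for illustration only).
class FakeRedisClient
  def call(command)
    "reply to #{command.first}"
  end
end

# Hypothetical interceptor: runs a validation hook, then delegates to the
# original implementation through super.
module CrossSlotInterceptor
  # Commands observed so far; only here to make the sketch observable.
  def self.seen
    @seen ||= []
  end

  # Placeholder hook: it only records the command. The real validation
  # logic (steps 2 and 3) would go here.
  def self.validate_command(command)
    seen << command
  end

  def call(command)
    CrossSlotInterceptor.validate_command(command)
    super
  end
end

# Prepending places the module's #call ahead of the class's own #call.
FakeRedisClient.prepend(CrossSlotInterceptor)

FakeRedisClient.new.call([:mget, "a", "b"])
```

Because the hook runs before `super`, a validator that raises would prevent the command from reaching Redis at all, which is the behaviour wanted in development and CI.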
Step 2: determine whether a call involves multiple keys
Multiple approaches could be used, but this would probably be easiest to implement with an allowlist of known multikey commands in Redis (e.g. MGET). When a command is not multikey, the validator skips the call.
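The lookup could be sketched as follows. The command list is deliberately partial and illustrative, and `MULTIKEY_COMMANDS` and `multikey_keys` are hypothetical names. The sketch assumes a command arrives as an array whose first element is the command name.

```ruby
# Extractor for commands where every argument is a key.
ALL_ARGS = ->(args) { args }

# Partial, illustrative list of multikey commands, each mapped to a lambda
# that picks the keys out of the command arguments.
MULTIKEY_COMMANDS = {
  "mget"   => ALL_ARGS,
  "del"    => ALL_ARGS,
  "exists" => ALL_ARGS,
  "unlink" => ALL_ARGS,
  "sunion" => ALL_ARGS,
  "sinter" => ALL_ARGS,
  "sdiff"  => ALL_ARGS,
  "rename" => ALL_ARGS,
  # MSET key value [key value ...]: keys sit at even positions.
  "mset"   => ->(args) { args.each_slice(2).map(&:first) }
}.freeze

# Returns the keys of a multikey call, or nil when the validator should
# skip the call (unknown command, or fewer than two keys).
def multikey_keys(command)
  name, *args = command
  extractor = MULTIKEY_COMMANDS[name.to_s.downcase]
  return nil unless extractor

  keys = extractor.call(args.flatten.map(&:to_s))
  keys.size > 1 ? keys : nil
end

multikey_keys([:mget, "a", "b"])               # => ["a", "b"]
multikey_keys([:mset, "k1", "v1", "k2", "v2"]) # => ["k1", "k2"]
multikey_keys([:get, "a"])                     # => nil
```

A call such as `DEL` with a single key is also skipped here, since one key cannot span two slots.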
Step 3: for any multikey calls, ensure that all keys are for the same slot
Using the algorithm published in the Redis documentation, the slot for each key is calculated:
    def HASH_SLOT(key)
      s = key.index "{"
      if s
        e = key.index "}",s+1
        if e && e != s+1
          key = key[s+1..e-1]
        end
      end
      crc16(key) % 16384
    end
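The method above relies on a `crc16` helper that is not shown. A self-contained sketch follows, assuming the CRC16 variant described in the Redis Cluster specification (XMODEM: polynomial 0x1021, initial value 0). `hash_slot` restates the method above under a lowercase name so the block runs on its own. The bit-by-bit loop is chosen for readability; a table-driven version would be faster.

```ruby
# CRC16, XMODEM variant: polynomial 0x1021, initial value 0, MSB first.
def crc16(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF # keep the register at 16 bits
    end
  end
  crc
end

# Hash only the content of the first non-empty {...} hash tag, if any;
# otherwise hash the whole key.
def hash_slot(key)
  s = key.index("{")
  if s
    e = key.index("}", s + 1)
    key = key[(s + 1)..(e - 1)] if e && e != s + 1
  end
  crc16(key) % 16384
end

hash_slot("foo")                                                       # => 12182
hash_slot("{user1000}.following") == hash_slot("{user1000}.followers") # => true
```

The second call shows why hash tags matter for the validator: keys that share a tag always land in the same slot, so a multikey command over them is safe.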
If the command contains keys which resolve to different hash slots, then validation fails.
The form of this failure warrants more discussion. I propose that initially,
- In development and CI, an exception is raised, with pointers to developer documentation on how to fix the problem
- In production, a log is written
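The two failure modes could be sketched as below. `CrossSlotError` and `report_cross_slot` are hypothetical names, the environment is passed in as a plain string rather than read from Rails, and the slots are assumed to have been computed already for the keys of one command.

```ruby
require "logger"

# Hypothetical error class raised outside production.
class CrossSlotError < StandardError; end

# slots: the hash slots of all keys in a single command.
# Returns true when all keys share a slot. On a violation it logs and
# returns false in production, and raises everywhere else.
def report_cross_slot(command_name, slots, environment:, logger:)
  distinct = slots.uniq
  return true if distinct.size <= 1

  message = "Redis command #{command_name} touches keys in multiple " \
            "hash slots: #{distinct.sort.inspect}"

  if environment == "production"
    logger.warn(message)
    false
  else
    raise CrossSlotError, "#{message}. See the developer documentation for how to fix this."
  end
end

logger = Logger.new($stdout)
report_cross_slot("mget", [866, 866], environment: "development", logger: logger)  # => true
report_cross_slot("mget", [866, 12182], environment: "production", logger: logger) # logs a warning, => false
```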
Step 4: Allow exemptions inside a yield block
Similarly to the approach used by the Gitaly n+1 detector's exemption block, Gitlab::GitalyClient.allow_n_plus_1_calls, we could allow technical debt to accrue through an exemption block, storing the state in a thread-local or similar storage mechanism.
The implementation of the Gitaly version is here: https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/gitaly_client.rb#L309
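A minimal sketch of such an exemption block, assuming state is kept in `Thread.current[]` (which Ruby scopes to the current fiber). The module and method names are hypothetical and are not taken from the Gitaly implementation.

```ruby
module CrossSlotValidator
  # Hypothetical storage key for the exemption depth.
  KEY = :cross_slot_exemption_depth

  # Disables validation for the duration of the block. A depth counter is
  # used so that nested exemption blocks behave correctly.
  def self.allow_cross_slot_commands
    Thread.current[KEY] = Thread.current[KEY].to_i + 1
    yield
  ensure
    # Always restore the previous depth, even if the block raises.
    Thread.current[KEY] -= 1
  end

  # True while inside at least one exemption block.
  def self.exempt?
    Thread.current[KEY].to_i > 0
  end
end

CrossSlotValidator.exempt? # => false
CrossSlotValidator.allow_cross_slot_commands do
  CrossSlotValidator.exempt? # => true
end
CrossSlotValidator.exempt? # => false
```

The validator would check `exempt?` before raising or logging, so known violations can be wrapped and tracked until they are fixed.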
Step 5: Add exemption blocks for all known violations
Add exemption blocks for all known violators. For third-party code, such as Rails session management, we should consider adding another Redis connection which is entirely exempt from the validator.
This code would likely not be migrated to Redis Cluster.