Redis Cluster validator
Workflow
This issue is in workflow-infra::Ready because it is ready for work to start.
As we work on this, more questions may come up, in particular whether the validator should be enabled or disabled by default. We should nonetheless be able to make a start.
Description
We need to better understand how Redis could be sharded on gitlab.com in order to devise our long-term Redis sharding strategy.
It is very important to identify every Redis operation that involves multiple keys. Lacking this knowledge is a major blocker if we would like to move towards Redis sharding.
In a sharded setup (Redis Cluster), Redis answers with the error "CROSSSLOT Keys in request don't hash to the same slot" if the keys of a single command do not all hash to the same slot. Keys in different slots are rejected even when those slots happen to live on the same shard.
We need to understand how to design our sharding approach so that this does not become a blocker.
Proposal
Design a CROSSSLOT validator: an additional layer in front of our Redis facade code that checks whether the keys used by each multi-key operation resolve to a single hash slot.
Step 1: monkeypatch the Redis client to intercept all Redis calls
This follows the approach used in LabKit-Ruby to monkeypatch the Redis Client to intercept all Redis calls for distributed tracing.
The LabKit implementation is here: https://gitlab.com/gitlab-org/labkit-ruby/-/blob/master/lib/labkit/tracing/redis/redis_interceptor.rb. The injection occurs here: https://gitlab.com/gitlab-org/labkit-ruby/-/blob/master/lib/labkit/tracing/redis.rb.
The CROSSSLOT validator would follow similar boilerplate, but instead of emitting tracing spans it would validate each Redis command.
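A minimal sketch of the interception pattern, assuming the validator is prepended to the client class in the same way LabKit applies its tracing interceptor to the real redis-rb client. FakeRedisClient, CrossSlotInterceptor, and validate_command are hypothetical stand-ins so the example runs without the redis gem; check the LabKit source for the exact client methods it wraps.

```ruby
# Hypothetical stand-in for a Redis client, so the sketch runs without the redis gem.
class FakeRedisClient
  attr_reader :sent

  def initialize
    @sent = []
  end

  # Commands arrive as arrays, e.g. [:mget, "a", "b"].
  def call(command)
    @sent << command
    "OK"
  end
end

# Hypothetical interceptor: runs a validation hook, then defers to the original #call.
module CrossSlotInterceptor
  attr_reader :validated

  def call(command)
    validate_command(command)
    super
  end

  private

  # Placeholder for the checks from Steps 2 and 3; here it only records the command name.
  def validate_command(command)
    (@validated ||= []) << command.first
  end
end

# Module#prepend puts the interceptor ahead of the class in the method lookup chain,
# so every #call goes through CrossSlotInterceptor#call first.
FakeRedisClient.prepend(CrossSlotInterceptor)

client = FakeRedisClient.new
client.call([:mget, "a", "b"])
```

Prepending rather than aliasing methods keeps the original implementation reachable through super, which is what makes this style of interception composable with other interceptors such as the tracing one.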
Step 2: determine whether a call involves multiple keys
Several approaches are possible, but the easiest is probably an allowlist of known multi-key Redis commands (e.g. MGET). When a command is not multi-key, the validator skips the call.
An alternative is to check whether the positions of the first and last keys in the argument list, as reported by the COMMAND command (https://redis.io/commands/command), are the same. This information is exposed to Ruby through Redis::Cluster::CommandLoader.
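A minimal sketch of the first approach, a list of known multi-key commands. The MULTI_KEY_COMMANDS table is a hypothetical and deliberately incomplete excerpt: each entry maps a command name to a lambda that extracts its keys, because commands such as MSET interleave keys and values. Any command missing from the table, or called with a single key, is skipped.

```ruby
# Hypothetical, incomplete allowlist: command name => lambda returning the key arguments.
MULTI_KEY_COMMANDS = {
  "mget"        => ->(args) { args },                            # MGET key [key ...]
  "del"         => ->(args) { args },                            # DEL key [key ...]
  "mset"        => ->(args) { args.each_slice(2).map(&:first) }, # MSET key value [key value ...]
  "rename"      => ->(args) { args.first(2) },                   # RENAME key newkey
  "sunionstore" => ->(args) { args }                             # SUNIONSTORE destination key [key ...]
}.freeze

# Returns the keys of a multi-key call, or nil when the validator should skip it.
def multi_key_keys(command)
  name, *args = command
  extractor = MULTI_KEY_COMMANDS[name.to_s.downcase]
  return nil unless extractor

  keys = extractor.call(args.map(&:to_s))
  keys.size > 1 ? keys : nil
end
```

The trade-off is maintenance: the list must be kept in sync with the commands our code actually uses, whereas the COMMAND-based alternative derives key positions from the server.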
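A sketch of the key-position approach. For each command, the COMMAND reply gives the position of the first key, the position of the last key (negative values count from the end of the argument list) and the step between keys. The KEY_SPECS values below are copied by hand for a few commands as an assumption; in practice they would be loaded from the server, for example through Redis::Cluster::CommandLoader. Commands with movable keys, such as EVAL, report 0 for these positions and would need separate handling.

```ruby
# Hypothetical excerpt of COMMAND output: name => [first_key, last_key, step].
# Positions index into the full command array, where index 0 is the command name.
KEY_SPECS = {
  "get"    => [1, 1, 1],
  "mget"   => [1, -1, 1],
  "mset"   => [1, -1, 2],
  "rename" => [1, 2, 1]
}.freeze

# Returns the key arguments of a command using COMMAND-style key positions.
# Unknown commands raise KeyError, which keeps gaps in KEY_SPECS visible.
def keys_from_spec(command)
  first, last, step = KEY_SPECS.fetch(command.first.to_s.downcase)
  return [] if first.zero? # movable-key commands (e.g. EVAL) need special handling

  last += command.size if last.negative? # -1 means the last argument
  (first..last).step(step).map { |i| command[i].to_s }
end

# A command can only involve several keys when its first and last key positions differ.
def multi_key?(command)
  first, last, = KEY_SPECS.fetch(command.first.to_s.downcase)
  first != last
end
```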
Step 3: for any multi-key call, ensure that all keys map to the same slot
The slot for each key is calculated using the algorithm published in the Redis documentation:
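def HASH_SLOT(key)
  # If the key contains a non-empty hash tag "{...}", only the tag is hashed
  s = key.index("{")
  if s
    e = key.index("}", s + 1)
    if e && e != s + 1
      key = key[s + 1..e - 1]
    end
  end
  # crc16 is the CRC16 (XMODEM) checksum named in the Redis Cluster specification; not defined here
  crc16(key) % 16384
end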
In redis-rb, this logic is split across two methods in two classes:
- Redis::Cluster::Command#extract_first_key gets the key to use for hashing
- Redis::Cluster::KeySlotConverter.convert converts a key into a hash slot
If the command contains keys which resolve to different hash slots, then validation fails.
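The HASH_SLOT pseudocode above relies on a crc16 function that is not shown. Below is a self-contained Ruby sketch, assuming the CRC16 variant named in the Redis Cluster specification (XMODEM: polynomial 0x1021, initial value 0), whose checksum of "123456789" is 0x31C3. The hash_slot and cross_slot? names are hypothetical; cross_slot? expresses the validation rule in the sentence above.

```ruby
SLOT_COUNT = 16_384

# Bitwise CRC16-XMODEM: polynomial 0x1021, initial value 0, no reflection.
def crc16(bytes)
  crc = 0
  bytes.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF
    end
  end
  crc
end

# Hashes only the hash tag when the key contains a non-empty "{...}" section.
def hash_slot(key)
  s = key.index("{")
  if s
    e = key.index("}", s + 1)
    key = key[(s + 1)...e] if e && e != s + 1
  end
  crc16(key) % SLOT_COUNT
end

# True when the keys do not all map to one slot, i.e. Redis Cluster would reply CROSSSLOT.
def cross_slot?(keys)
  keys.map { |k| hash_slot(k.to_s) }.uniq.size > 1
end
```

For example, hash_slot("a") is 15495 and hash_slot("b") is 3300, so MGET a b would fail validation, while "{user1000}.following" and "{user1000}.followers" share a slot because only "user1000" is hashed.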
The form of this failure warrants more discussion. Initially, I propose that:
- In development and CI, an exception is raised, with pointers to developer documentation on how to fix the problem
- In production, a log is written
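A sketch of the proposed failure policy, assuming the environment name is passed in explicitly (for example from RAILS_ENV) and that CI runs with the "test" environment. CrossSlotError, DOCS_HINT and report_cross_slot_violation are hypothetical names; the real message would link to whatever developer documentation we write.

```ruby
require "logger"

# Hypothetical error type raised in development and CI.
class CrossSlotError < StandardError; end

# Hypothetical pointer; replace with a link to the real developer documentation.
DOCS_HINT = "see the developer documentation on Redis hash tags"

# Raises in development/test and logs a warning elsewhere (e.g. production).
# slots_by_key maps each key in the command to its hash slot.
def report_cross_slot_violation(command_name, slots_by_key, env:, logger:)
  return nil if slots_by_key.values.uniq.size <= 1

  details = slots_by_key.map { |key, slot| "#{key}=#{slot}" }.join(", ")
  message = "#{command_name.to_s.upcase} uses keys in different hash slots (#{details}); #{DOCS_HINT}"
  raise CrossSlotError, message if %w[development test].include?(env)

  logger.warn(message)
  nil
end
```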
Step 4: Allow exemptions inside a yield block
Similarly to the Gitaly N+1 detector's exemption block, Gitlab::GitalyClient.allow_n_plus_1_calls, we could allow known technical debt to accrue inside an exemption block, storing the exemption state in a thread-local variable or a similar storage mechanism.
The implementation of the Gitaly version is here: https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/gitaly_client.rb#L309
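A sketch of a thread-local exemption block in the style described above. CrossSlotValidator, allow_cross_slot_commands and exempt? are hypothetical names, and this is not a copy of how Gitlab::GitalyClient stores its state. A depth counter rather than a boolean keeps nested blocks working, and ensure restores the state even if the block raises. Note that Thread.current[] is strictly fiber-local.

```ruby
# Hypothetical module illustrating a thread-local exemption block.
module CrossSlotValidator
  KEY = :cross_slot_exemption_depth

  # Commands issued inside the block are not validated on the current thread.
  def self.allow_cross_slot_commands
    Thread.current[KEY] = depth + 1
    yield
  ensure
    Thread.current[KEY] = depth - 1
  end

  # The interceptor would skip validation while this returns true.
  def self.exempt?
    depth.positive?
  end

  def self.depth
    Thread.current[KEY] || 0
  end
end
```

Inside the interceptor from Step 1, the check would be a guard such as returning super early when CrossSlotValidator.exempt? is true.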
Step 5: Add exemption blocks for all known violations
Add exemption blocks for all known violations. For third-party code, such as Rails session management, we should consider adding another Redis connection which is entirely exempt from the validator.
This connection would likely not be migrated to Redis Cluster.
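One way to sketch an entirely exempt connection, assuming the validator is prepended to the client class as in Step 1: each client carries a flag set at construction, and the interceptor skips validation when it is set. ExemptableClient, ValidatingInterceptor and the skip_cross_slot_validation option are hypothetical, and the counter stands in for running the Step 2 and Step 3 checks.

```ruby
# Hypothetical client stand-in with a per-connection opt-out flag.
class ExemptableClient
  attr_reader :checked

  def initialize(skip_cross_slot_validation: false)
    @skip_cross_slot_validation = skip_cross_slot_validation
    @checked = 0
  end

  def skip_cross_slot_validation?
    @skip_cross_slot_validation
  end

  def call(command)
    command
  end
end

# Hypothetical interceptor: validates only on connections that have not opted out.
module ValidatingInterceptor
  def call(command)
    @checked += 1 unless skip_cross_slot_validation? # stands in for the real checks
    super
  end
end

ExemptableClient.prepend(ValidatingInterceptor)

# A shared-state connection is validated; a dedicated session-store connection is exempt.
shared = ExemptableClient.new
sessions = ExemptableClient.new(skip_cross_slot_validation: true)
shared.call([:mget, "a", "b"])
sessions.call([:mget, "a", "b"])
```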