Redis CROSSSLOT validator

Description

We might need to understand better how to shard Redis on gitlab.com in order to devise our long-term Redis sharding strategy.

It is very important to understand all the operations on Redis keys that might involve multiple keys. Not having this knowledge is a major blocker for us if we would like to move towards Redis sharding.

In a sharded environment, Redis will answer with the error ERR CROSSSLOT Keys in request don't hash to the same slot if the keys we want to operate on do not reside in the same slot / shard.

We need to better understand how we can design our sharding techniques in a way that this is not a blocker.

Proposal

Design a CROSSSLOT validator: an additional layer of abstraction in front of our Redis facade code that checks whether the key scheme used to perform an operation on multiple keys resolves to a single shard.

Step 1: monkeypatch Redis Client calls to intercept all Redis calls

This follows the approach used in LabKit-Ruby to monkeypatch the Redis Client to intercept all Redis calls for distributed tracing.

The LabKit implementation is here: https://gitlab.com/gitlab-org/labkit-ruby/-/blob/master/lib/labkit/tracing/redis/redis_interceptor.rb. The injection occurs here: https://gitlab.com/gitlab-org/labkit-ruby/-/blob/master/lib/labkit/tracing/redis.rb.

The CROSSSLOT validator implementation would follow similar boilerplate, but instead of emitting tracing spans, it would validate the Redis command.
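
As a rough illustration of the interception pattern, here is a minimal sketch using Module#prepend. FakeRedisClient and CrossSlotInterceptor are hypothetical names invented for this example; the real target class and method names would depend on the Redis client in use and are not confirmed here.

```ruby
# Hypothetical stand-in for the real Redis client class. In practice the
# interceptor would be prepended to whichever class dispatches commands.
class FakeRedisClient
  def call(command)
    "OK (#{command.first})"
  end
end

# Hypothetical interceptor. Prepending it places its #call ahead of the
# original in the method lookup chain, so every command passes through it.
module CrossSlotInterceptor
  # Records intercepted commands, purely so the sketch is observable.
  def self.seen
    @seen ||= []
  end

  def call(command)
    CrossSlotInterceptor.seen << command # validation would happen here
    super                                # then delegate to the original #call
  end
end

FakeRedisClient.prepend(CrossSlotInterceptor)

client = FakeRedisClient.new
result = client.call([:mget, "a", "b"])
```

The original return value is passed through untouched, so callers should not notice the extra layer unless validation fails.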

Step 2: determine whether a call involves multiple keys

Multiple approaches could be used, but this would probably be easiest to implement with a whitelist of known multikey commands in Redis (e.g. MGET). When a command is not multikey, the validator skips the call.
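
A whitelist along these lines might look like the following sketch. The table is deliberately incomplete, and MULTI_KEY_COMMANDS and multi_key_command_keys are hypothetical names for illustration only. Each entry maps a command to a lambda that picks the keys out of its arguments, since not every argument is a key (MSET interleaves keys and values, for example).

```ruby
# Hypothetical, incomplete whitelist of multikey commands.
MULTI_KEY_COMMANDS = {
  "mget"   => ->(args) { args },                            # MGET key [key ...]
  "del"    => ->(args) { args },                            # DEL key [key ...]
  "sunion" => ->(args) { args },                            # SUNION key [key ...]
  "rename" => ->(args) { args.first(2) },                   # RENAME key newkey
  "mset"   => ->(args) { args.each_slice(2).map(&:first) }  # MSET key value [key value ...]
}.freeze

# Returns the keys of a whitelisted command that touches more than one key,
# or nil when the validator should skip the call.
def multi_key_command_keys(command)
  name, *args = command
  extractor = MULTI_KEY_COMMANDS[name.to_s.downcase]
  return nil unless extractor

  keys = extractor.call(args).map(&:to_s)
  keys.size > 1 ? keys : nil
end
```

Commands that are on the list but are called with a single key (such as DEL with one argument) are skipped as well, since they cannot span slots.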

Step 3: for any multikey calls, ensure that all keys are for the same slot

Using the algorithm published in the Redis documentation, the slot for each key is calculated:

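# Computes the Redis Cluster hash slot for a key.
# crc16 is assumed to be defined elsewhere (the CRC16 variant named in the Redis Cluster specification).
def HASH_SLOT(key)
    # If the key contains a non-empty {hash tag}, only the tag is hashed.
    s = key.index "{"
    if s
        e = key.index "}",s+1
        if e && e != s+1
            key = key[s+1..e-1]
        end
    end
    crc16(key) % 16384
end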

If the command contains keys which resolve to different hash slots, then validation fails.
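
To make the check concrete, here is a self-contained sketch that pairs the slot calculation with a bitwise CRC16 (XMODEM variant: polynomial 0x1021, initial value 0, which is the variant Redis Cluster specifies). The names crc16, hash_slot and same_slot? are illustrative, and a production version would more likely use a table-driven CRC or an existing library.

```ruby
# Bitwise CRC16/XMODEM: polynomial 0x1021, initial value 0x0000, no reflection.
def crc16(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF
    end
  end
  crc
end

# Same logic as the published algorithm: hash only the {hash tag} when a
# non-empty one is present, otherwise hash the whole key.
def hash_slot(key)
  s = key.index("{")
  if s
    e = key.index("}", s + 1)
    key = key[(s + 1)...e] if e && e != s + 1
  end
  crc16(key) % 16384
end

# Validation passes only when every key maps to the same slot.
def same_slot?(keys)
  keys.map { |key| hash_slot(key) }.uniq.size <= 1
end

tagged   = same_slot?(["{user1000}.following", "{user1000}.followers"])
untagged = same_slot?(%w[foo bar])
```

Keys sharing a hash tag such as {user1000} land in one slot, which is the usual way to make a multikey command safe on a cluster.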

The form of this failure warrants more discussion. I propose that, initially:

In development and CI, an exception is raised, with pointers to developer documentation on how to fix the problem.
In production, a log entry is written.
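
The two behaviours could be sketched roughly as below. CrossSlotError, report_cross_slot_violation and the environment strings are assumptions made for this example; the documentation pointer is left as a placeholder because no such page is named in this issue.

```ruby
require "logger"
require "stringio"

# Hypothetical error class raised in development and CI.
class CrossSlotError < StandardError; end

# Hypothetical reporting hook: raise loudly where developers will see it,
# and only log in production so that live traffic is not interrupted.
def report_cross_slot_violation(command, environment:, logger: Logger.new($stderr))
  message = "Redis command #{command.inspect} uses keys in different hash slots"

  if %w[development test].include?(environment)
    raise CrossSlotError, "#{message}. See the developer documentation (placeholder)."
  else
    logger.warn(message)
  end
end

# Usage: in production the violation is only logged.
log = StringIO.new
report_cross_slot_violation([:mget, "a", "b"], environment: "production", logger: Logger.new(log))
```

Keeping the decision in one method would make it straightforward to tighten the production behaviour later, once known violations have been addressed.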

Step 4: Allow exemptions inside a yield block

Similarly to the approach used by the Gitaly n+1 detector's exemption block, Gitlab::GitalyClient.allow_n_plus_1_calls, we could allow technical debt to accrue through an exemption block, storing the state in a thread-local or similar storage mechanism.

The implementation of the Gitaly version is here: https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/gitaly_client.rb#L309
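
A minimal sketch of such an exemption block follows. It is not the Gitaly implementation; CrossSlotValidator, allow_cross_slot_commands and exempt? are hypothetical names. A depth counter is used instead of a boolean so that nested blocks behave sensibly.

```ruby
module CrossSlotValidator
  # Hypothetical storage key. Note that Thread.current[] is fiber-local in
  # Ruby; Thread#thread_variable_get/set would give true thread-local state.
  THREAD_KEY = :cross_slot_exempt_depth

  # Runs the block with validation suspended for the current thread.
  def self.allow_cross_slot_commands
    Thread.current[THREAD_KEY] = Thread.current[THREAD_KEY].to_i + 1
    yield
  ensure
    # Always restore the previous depth, even if the block raises.
    Thread.current[THREAD_KEY] -= 1
  end

  # The validator would consult this before checking a command.
  def self.exempt?
    Thread.current[THREAD_KEY].to_i > 0
  end
end

inside = CrossSlotValidator.allow_cross_slot_commands { CrossSlotValidator.exempt? }
```

Each known violation would then be wrapped in allow_cross_slot_commands, which keeps the remaining debt easy to find with a text search.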

Step 5: Add exemption blocks for all known violations

Add exemption blocks for all known violators. For third-party code, such as Rails session management, we should consider adding another Redis connection which is entirely exempt from the validator.

This would likely not be migrated to Redis Cluster.

Edited Feb 17, 2020 by Andrew Newdigate