Redis Cluster validator

Workflow

This issue is in workflow-infraReady because it is ready for work to start.

As we work on this, we might discover more questions: in particular, whether we start with this on or off by default. But we should be able to make a start.

Description

We might need to understand better how to shard Redis on gitlab.com in order to devise our long-term Redis sharding strategy.

It is very important to understand every Redis operation that might involve multiple keys. Not having this knowledge is a major blocker for us if we would like to move towards Redis sharding.

In a sharded environment, Redis answers with the error "ERR CROSSSLOT Keys in request don't hash to the same slot" when the keys we want to operate on do not reside on the same slot / shard.

We need to understand better how we can design our sharding techniques so that this is not a blocker.

Proposal

Design a CROSSSLOT validator: an additional layer of abstraction in front of our Redis facade code that checks whether the key scheme used for an operation on multiple keys resolves to a single shard.

Step 1: monkeypatch Redis Client calls to intercept all Redis calls

This follows the approach used in LabKit-Ruby to monkeypatch the Redis Client to intercept all Redis calls for distributed tracing.

The LabKit implementation is here: https://gitlab.com/gitlab-org/labkit-ruby/-/blob/master/lib/labkit/tracing/redis/redis_interceptor.rb. The injection occurs here: https://gitlab.com/gitlab-org/labkit-ruby/-/blob/master/lib/labkit/tracing/redis.rb.

The CROSSSLOT validator implementation would follow similar boilerplate, but instead of emitting tracing spans, it would validate the Redis command.
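A minimal sketch of the interception pattern, assuming a client class that funnels every command through a single call method. FakeRedisClient, CrossSlotInterceptor, and the INTERCEPTED array are hypothetical names used only for illustration; a real implementation would prepend onto the redis-rb client in the same way LabKit does.

```ruby
# Hypothetical stand-in for a Redis client: every command goes through #call.
class FakeRedisClient
  def call(command)
    "reply to #{command.first}"
  end
end

# Hypothetical interceptor. Prepending places this module ahead of the class
# in the method lookup chain, so #call here runs first and `super` reaches
# the original implementation.
module CrossSlotInterceptor
  # Records the commands seen; a real validator would inspect them instead.
  INTERCEPTED = []

  def call(command)
    INTERCEPTED << command
    super
  end
end

FakeRedisClient.prepend(CrossSlotInterceptor)

client = FakeRedisClient.new
reply = client.call([:mget, "a", "b"])
```

Because the interceptor calls super unchanged, callers still receive the client's normal reply; only the validation hook is added.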

Step 2: determine whether a call involves multiple keys

Multiple approaches could be used, but this would probably be easiest to implement with a whitelist of known multikey commands in Redis (e.g. MGET). When a command is not multikey, the validator skips the call.
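A sketch of the whitelist check, assuming commands arrive as arrays such as [:mget, "a", "b"]. The MULTI_KEY_COMMANDS list and the multi_key_command? helper are hypothetical, and the list is deliberately incomplete.

```ruby
require "set"

# Hypothetical, incomplete list of commands that accept several keys.
MULTI_KEY_COMMANDS = Set.new(
  %w[del exists mget mset rename sdiff sinter sunion unlink]
).freeze

# The command name may be a symbol or a string in any case.
def multi_key_command?(command)
  MULTI_KEY_COMMANDS.include?(command.first.to_s.downcase)
end
```

A whitelist is simple to review, but it has to be kept in sync with the commands we actually use.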

An alternative is to check whether the positions of the first and last keys in the argument list are the same, as reported by the COMMAND command: https://redis.io/commands/command. This is exposed to Ruby through Redis::Cluster::CommandLoader.
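A sketch of the key-position approach. The KEY_SPECS table is hand-copied from the first key, last key, and step values that COMMAND reports for a few commands, purely for illustration; a real validator would load these from the server. extract_keys and multi_key? are hypothetical helpers.

```ruby
# First key position, last key position (negative values count from the end
# of the argument list), and step between keys.
KEY_SPECS = {
  "get"    => { first: 1, last: 1,  step: 1 },
  "rename" => { first: 1, last: 2,  step: 1 },
  "mget"   => { first: 1, last: -1, step: 1 },
  "mset"   => { first: 1, last: -1, step: 2 }
}.freeze

# Returns the keys of a command given as [name, *args], or [] if the
# command is not in the table.
def extract_keys(command)
  spec = KEY_SPECS[command.first.to_s.downcase]
  return [] unless spec

  last = spec[:last].negative? ? command.length + spec[:last] : spec[:last]
  (spec[:first]..last).step(spec[:step]).map { |i| command[i] }
end

# A call needs validating only when it names more than one key.
def multi_key?(command)
  extract_keys(command).length > 1
end
```

Unlike the whitelist, this also yields the keys themselves, which the next step needs.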

Step 3: for any multikey calls, ensure that all keys are for the same slot

Using the algorithm published in the Redis documentation, the slot for each key is calculated:

def HASH_SLOT(key)
    s = key.index "{"
    if s
        e = key.index "}",s+1
        if e && e != s+1
            key = key[s+1..e-1]
        end
    end
    crc16(key) % 16384
end

redis-rb implements this in two methods on two classes:

Redis::Cluster::Command#extract_first_key extracts the key used for hashing.
Redis::Cluster::KeySlotConverter.convert converts a key into a hash slot.

If the command contains keys which resolve to different hash slots, then validation fails.
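A self-contained sketch of the slot calculation and the same-slot check, for cases where the redis-rb classes are not available. It assumes the CRC16 variant named in the Redis Cluster specification (XMODEM: polynomial 0x1021, initial value 0). key_slot and cross_slot? are hypothetical helper names.

```ruby
# CRC16 (XMODEM variant), computed bit by bit over the bytes of the string.
def crc16(data)
  crc = 0
  data.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      crc = (crc & 0x8000).zero? ? (crc << 1) : ((crc << 1) ^ 0x1021)
      crc &= 0xFFFF
    end
  end
  crc
end

# Slot for a key, honouring hash tags: if the key contains a non-empty {...}
# section, only the text between the first "{" and the next "}" is hashed.
def key_slot(key)
  s = key.index("{")
  if s
    e = key.index("}", s + 1)
    key = key[(s + 1)...e] if e && e != s + 1
  end
  crc16(key) % 16384
end

# True when the keys map to more than one slot, i.e. validation should fail.
def cross_slot?(keys)
  keys.map { |k| key_slot(k) }.uniq.length > 1
end
```

Keys that share a hash tag, such as "{user1}:a" and "{user1}:b", always land in the same slot, which is the usual fix for a violation.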

The form of this failure warrants more discussion. I propose that initially:

In development and CI, an exception is raised, with pointers to developer documentation on how to fix the problem
In production, a log is written
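A sketch of the proposed split between environments. CrossSlotError and report_cross_slot are hypothetical names, and env and logger are passed in explicitly so the example does not depend on Rails.

```ruby
require "logger"

# Hypothetical error class; the message would point to developer documentation.
class CrossSlotError < StandardError; end

# Raises outside production so the problem is caught in development and CI;
# in production it only writes a log entry.
def report_cross_slot(command, env:, logger:)
  message = "Redis command #{command.first} uses keys in different hash slots"

  if env == "production"
    logger.warn(message)
  else
    raise CrossSlotError, message
  end
end
```

Whether the production path should log, increment a metric, or both is part of the discussion still to be had.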

Step 4: Allow exemptions inside a yield block

Similarly to the approach used by the Gitaly n+1 detector's exemption block, Gitlab::GitalyClient.allow_n_plus_1_calls, we could allow technical debt to accrue through an exemption block, storing the state in a thread-local or similar storage mechanism.

The implementation of the Gitaly version is here: https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/gitaly_client.rb#L309
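A sketch of what such an exemption block could look like. The CrossSlotValidator module, its method names, and the FLAG key are hypothetical and not taken from the Gitaly implementation.

```ruby
module CrossSlotValidator
  # Hypothetical key for the flag. Note that Thread.current[] storage is
  # strictly fiber-local in Ruby.
  FLAG = :cross_slot_allowed

  # Runs the block with validation suspended, restoring the previous state
  # afterwards even if the block raises. Nested calls are safe.
  def self.allow_cross_slot_commands
    previous = Thread.current[FLAG]
    Thread.current[FLAG] = true
    yield
  ensure
    Thread.current[FLAG] = previous
  end

  # The interceptor would consult this before validating a command.
  def self.validation_enabled?
    !Thread.current[FLAG]
  end
end
```

Restoring the previous value, rather than resetting to false, keeps an outer exemption in force when an inner block finishes.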

Step 5: Add exemption blocks for all known violations

Wrap all known violators in exemption blocks. For third-party code, such as Rails session management, we should consider adding another Redis connection which is entirely exempt from the validator.

This would likely not be migrated to Redis Cluster.

Edited May 06, 2020 by Sean McGivern