Redis Cluster: Zone-aware data placement and rebalancing
This was brought up by @schin1.
Background
We are looking to migrate some workloads to Redis Cluster in order to scale horizontally. See #1945 (closed).
We need to evaluate whether Redis Cluster meets all of our requirements. One of these requirements is high availability, in particular surviving a zonal outage.
Status quo with Sentinel
With Redis Sentinel, we create Redis hosts in distinct zones. Each replica contains a full copy of the entire key space, so in case of a zonal failure we can fail over to another zone.
Redis Cluster
With Redis Cluster, the situation becomes more complicated. While we can provision nodes in distinct zones, the assignment of shards (hash slots, and the replicas attached to each primary) to those nodes is a separate step. Unless we take special care, a shard's primary and all of its replicas may end up on nodes within the same zone.
This could result in unavailability or even data loss when faced with a zonal failure.
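As a first step, we could at least detect this situation on an existing cluster. The sketch below parses the output of CLUSTER NODES and reports every shard whose primary and replicas all live in a single zone. Redis OSS has no notion of zones, so the host-to-zone mapping (zone_of_host) is an assumption: it would have to come from our own inventory. The function names are hypothetical, and the node IDs in the sample are shortened for readability (real ones are 40-character hex strings).

```python
from collections import defaultdict


def parse_cluster_nodes(output: str) -> list[dict]:
    """Parse CLUSTER NODES text into one dict per node."""
    nodes = []
    for line in output.strip().splitlines():
        # <id> <ip:port@cport[,hostname]> <flags> <master> <ping> <pong> <epoch> <link> <slots...>
        fields = line.split()
        node_id, address, flags, master = fields[0], fields[1], fields[2], fields[3]
        host = address.split("@")[0].rsplit(":", 1)[0]
        nodes.append({
            "id": node_id,
            "host": host,
            "flags": set(flags.split(",")),
            "master": None if master == "-" else master,
        })
    return nodes


def single_zone_shards(output: str, zone_of_host: dict[str, str]) -> dict[str, str]:
    """Return {primary_id: zone} for every shard whose copies all sit in one zone.

    zone_of_host is hypothetical: Redis itself does not report zones.
    """
    zones = defaultdict(set)  # primary id -> zones holding a copy of that shard
    for node in parse_cluster_nodes(output):
        if "master" in node["flags"]:
            zones[node["id"]].add(zone_of_host[node["host"]])
        elif "slave" in node["flags"] and node["master"]:
            zones[node["master"]].add(zone_of_host[node["host"]])
    return {pid: next(iter(zs)) for pid, zs in zones.items() if len(zs) == 1}


SAMPLE = """\
a1 10.0.1.10:6379@16379 myself,master - 0 0 1 connected 0-8191
b1 10.0.2.10:6379@16379 master - 0 0 2 connected 8192-16383
a2 10.0.2.11:6379@16379 slave a1 0 0 1 connected
b2 10.0.2.12:6379@16379 slave b1 0 0 2 connected
"""
ZONES = {"10.0.1.10": "zone-a", "10.0.2.10": "zone-b",
         "10.0.2.11": "zone-b", "10.0.2.12": "zone-b"}

print(single_zone_shards(SAMPLE, ZONES))  # {'b1': 'zone-b'}
```

In the sample, shard a1 survives the loss of either zone. Shard b1 does not, because both its primary and its only replica are in zone-b.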
There is support for zone awareness in Redis Enterprise, but this does not appear to be present in the OSS release. A similar question was recently asked on the Redis issue tracker.
Analysis
This is still very preliminary, but it appears that the shard-to-node assignment is managed externally via redis-cli (or similar tools).
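For context on what a "shard" is here: Redis Cluster maps every key to one of 16384 hash slots using CRC16 (XMODEM variant) modulo 16384. If the key contains a non-empty {hash tag}, only the tag is hashed. The cluster specification fixes this key-to-slot mapping. Only the slot-to-node mapping is managed by tooling such as redis-cli. A minimal Python sketch of the key-to-slot side follows. The server exposes the same computation as CLUSTER KEYSLOT. This is also where locating the slot of a hot key would start.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: bytes) -> int:
    """Map a key to one of the 16384 hash slots, honouring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag counts; "{}" falls back to hashing the whole key.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


# Keys sharing a hash tag always land in the same slot (and thus on the same shard):
print(key_hash_slot(b"{user1000}.following") == key_hash_slot(b"{user1000}.followers"))  # True
```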
redis-cli in particular implements the cluster commands create, reshard, rebalance, add-node and del-node, amongst others (run redis-cli --cluster help for a full list). These commands implement the logic for changing the assignments.
This suggests that we may be able to implement our own shard assigner / rebalancer that respects our data placement needs. It may also give us more flexibility to isolate hot keys if needed. For inspiration: the previous implementation of this logic lived in redis-trib.rb, though it has since been replaced by the C-based implementation in redis-cli.
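Such an assigner could start from something like the following greedy sketch. Primaries and spare nodes are tagged with zone labels (hypothetical, from our own inventory). The sketch attaches replicas to each primary so that every copy of a shard sits in a different zone. It is a heuristic: it can fail in cases where a valid placement exists, and a bipartite matching or max-flow formulation would be needed for a guarantee. Applying the result would still go through existing primitives, for example CLUSTER REPLICATE on each replica.

```python
from collections import defaultdict


def assign_replicas(primaries: dict[str, str], spares: dict[str, str],
                    replicas_per_primary: int) -> dict[str, list[str]]:
    """Greedily map each primary to replicas in distinct zones, none in the primary's zone.

    primaries / spares: node id -> zone label (hypothetical inventory data).
    Raises ValueError when the greedy choice cannot place a replica.
    """
    pool = defaultdict(list)  # zone -> unassigned spare node ids
    for node, zone in sorted(spares.items()):
        pool[zone].append(node)

    assignment = {}
    for primary, primary_zone in sorted(primaries.items()):
        used = {primary_zone}  # zones already holding a copy of this shard
        chosen = []
        for _ in range(replicas_per_primary):
            candidates = [z for z in pool if z not in used and pool[z]]
            if not candidates:
                raise ValueError(f"cannot place a replica of {primary} outside zones {sorted(used)}")
            # Prefer the zone with the most spares left; tie-break on the zone name.
            zone = max(candidates, key=lambda z: (len(pool[z]), z))
            chosen.append(pool[zone].pop(0))
            used.add(zone)
        assignment[primary] = chosen
    return assignment


print(assign_replicas({"p1": "a", "p2": "b", "p3": "c"},
                      {"r1": "a", "r2": "b", "r3": "c"}, 1))
# {'p1': ['r3'], 'p2': ['r1'], 'p3': ['r2']}
```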
Alternatively, we may consider contributing such a feature to upstream redis-cli.
Recommendation
TBD. More analysis and discussion needed.