Set up metrics, monitoring, dashboards, alerts

  • It is not yet clear how much of the existing tooling we can reuse. Certainly some, but not all.
  • We may need to extend redis_exporter if it does not already support collecting cluster-specific metrics.
  • We will definitely need dashboard additions. For example, we will want new dashboard sections that visualize cluster-wide state, such as graphs showing metrics for all N masters. A dashboard for zooming in on the metrics of a single shard's nodes (1 master and its replicas) may also be useful, as may a graph of the number of healthy/failed replicas following each master (especially if we allow the cluster to assign a variable number of replicas to each master).
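As a rough illustration of the "healthy/failed replicas per master" graph, the sketch below derives those counts from the text output of the Redis `CLUSTER NODES` command. It is not how redis_exporter works and is not an agreed design. The function name `count_replicas_per_master`, the sample node IDs and addresses, and the choice to count both `fail` and `fail?` (suspected failure) as failed are all assumptions made for this example.

```python
def count_replicas_per_master(cluster_nodes_output: str) -> dict:
    """Return {master_id: {"healthy": n, "failed": m}} from CLUSTER NODES text.

    Hypothetical helper for illustration only.
    """
    counts = {}
    replicas = []
    for line in cluster_nodes_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip lines that do not look like a node entry
        node_id = fields[0]
        flags = set(fields[2].split(","))  # e.g. {"myself", "master"}
        master_id = fields[3]              # "-" for masters
        if "master" in flags:
            # Register every master so masters with zero replicas still appear.
            counts.setdefault(node_id, {"healthy": 0, "failed": 0})
        elif "slave" in flags:
            replicas.append((master_id, flags))

    for master_id, flags in replicas:
        bucket = counts.setdefault(master_id, {"healthy": 0, "failed": 0})
        # Assumption: treat both confirmed ("fail") and suspected ("fail?")
        # failures as failed.
        state = "failed" if flags & {"fail", "fail?"} else "healthy"
        bucket[state] += 1
    return counts


# Made-up sample in the CLUSTER NODES line format:
# <id> <ip:port@cport> <flags> <master> <ping-sent> <pong-recv> <epoch> <link-state> [slots]
SAMPLE = """\
aaa 10.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-8191
bbb 10.0.0.2:6379@16379 master - 0 1 2 connected 8192-16383
ccc 10.0.0.3:6379@16379 slave aaa 0 1 1 connected
ddd 10.0.0.4:6379@16379 slave,fail aaa 0 1 1 disconnected
eee 10.0.0.5:6379@16379 slave bbb 0 1 2 connected
"""

if __name__ == "__main__":
    for master, state in count_replicas_per_master(SAMPLE).items():
        print(master, state)
```

In practice these counts would more likely be exported as per-master gauge metrics and graphed on the dashboard than computed ad hoc like this.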

See dashboard for redis-cluster-ratelimiting

Tasks

  1. Add redis-cluster-ratelimiting service into runbooks
  2. Add cluster-specific recording rules (if needed)
  3. Create standard Redis dashboard with cluster-specific information
  4. Enable alert rules

Edited by Sylvester Chin