Redis Cache Sentinel node_schedstat_waiting

Recently, there has been a noticeable uptick in the node_schedstat_waiting saturation resource for the Redis Cache fleet. At around 9%, it is very high compared to other fleets (the main Redis fleet sits at 0.03%, which is a much healthier number).

https://gitlab-com.gitlab.io/gl-infra/tamland/redis.html#redis-cache-service-node_schedstat_waiting-resource-saturation

[Screenshot: CleanShot_2021-11-23_at_20.44.47]


Short-term metric:

[Screenshot: CleanShot_2021-11-23_at_20.48.41]

https://dashboards.gitlab.net/d/alerts-sat_node_schedstat_waiting/alerts-node_schedstat_waiting-saturation-detail?from=now-30d&to=now&var-PROMETHEUS_DS=Global&var-environment=gprd&var-type=redis-cache&var-stage=main&var-component=node_schedstat_waiting&orgId=1


Interestingly, the problem appears to be the redis-sentinel nodes. What are they doing that is causing so much waiting?
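
To cross-check the dashboard number directly on a sentinel node, one option is to sample /proc/schedstat twice and compute how much run-queue waiting time accrued in between. The sketch below is only a rough approximation: it assumes the saturation metric is derived from the rate of node_exporter's node_schedstat_waiting_seconds_total, which in turn comes from the eighth counter on each cpuN line of /proc/schedstat (time tasks spent waiting to run, in nanoseconds). The function names and the 10-second default interval are hypothetical, not taken from the runbooks.

```python
import time
from typing import Dict, Tuple

# cpu name -> (running_ns, waiting_ns, timeslices)
CpuStats = Dict[str, Tuple[int, int, int]]


def parse_schedstat(text: str) -> CpuStats:
    """Parse the per-CPU lines of /proc/schedstat.

    Assumes the layout where each "cpuN" line carries nine counters and
    the last three are running time (ns), waiting time (ns) and the
    number of timeslices run.
    """
    stats: CpuStats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip "version", "timestamp" and "domainN" lines.
        if len(fields) >= 10 and fields[0].startswith("cpu"):
            running_ns, waiting_ns, timeslices = (int(v) for v in fields[7:10])
            stats[fields[0]] = (running_ns, waiting_ns, timeslices)
    return stats


def waiting_fraction(before: CpuStats, after: CpuStats, interval_seconds: float) -> float:
    """Average over CPUs of waiting seconds accrued per wall-clock second."""
    rates = []
    for cpu, (_, wait_after, _) in after.items():
        if cpu not in before:
            continue  # CPU appeared between samples; ignore it
        wait_before = before[cpu][1]
        rates.append((wait_after - wait_before) / 1e9 / interval_seconds)
    return sum(rates) / len(rates) if rates else 0.0


def sample(interval_seconds: float = 10.0, path: str = "/proc/schedstat") -> float:
    """Take two snapshots interval_seconds apart (Linux only)."""
    with open(path) as f:
        before = parse_schedstat(f.read())
    time.sleep(interval_seconds)
    with open(path) as f:
        after = parse_schedstat(f.read())
    return waiting_fraction(before, after, interval_seconds)
```

Calling sample() on a redis-sentinel node and on a regular redis-cache node would give two numbers to compare. Because the counter sums waiting time over all runnable tasks, the value can in principle exceed 1 on a heavily contended CPU, so it is best read as a relative signal between hosts rather than a strict percentage.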