Unusual connected client burst in redis-cache
Extract from #751 (comment 499154363).
While looking into the connected clients metrics, I noticed a pattern: the number of connected clients in redis-cache
bursts abnormally often. During a burst, the number of connected clients increases by 30-40%.
During these bursts, I observed the following:
- The latency of Redis calls from the client perspective peaks, possibly up to 300%.
- Items start to show up in the slowlog.
- However, the number of operations remains stable, nearly unchanged.
- redis-sidekiq and redis (redis-persistent) are not affected.
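
To make the 30-40% figure above easier to check against exported metrics, here is a minimal sketch of flagging bursts in a series of connected-client samples. The function name `find_bursts`, the sample format, and the choice of the median as the baseline are all assumptions made for illustration; they are not taken from the actual dashboards or tooling.

```python
from statistics import median


def find_bursts(samples, threshold=0.30):
    """Flag samples whose connected-client count exceeds the baseline.

    samples:   list of (timestamp, connected_clients) tuples (hypothetical format)
    threshold: relative increase over the baseline that counts as a burst
    The baseline is the median of all samples, which is an assumption.
    """
    if not samples:
        return []
    baseline = median(value for _, value in samples)
    if baseline <= 0:
        return []  # a relative increase is undefined without a positive baseline
    bursts = []
    for timestamp, value in samples:
        increase = (value - baseline) / baseline
        if increase >= threshold:
            bursts.append((timestamp, value, round(increase, 2)))
    return bursts
```

For example, with a baseline of about 1000 clients, a sample of 1350 is reported as a 0.35 relative increase, while samples within a few percent of the baseline are ignored.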
Debugging
Update 2021-02-04:
- The events usually (but not 100% consistently) occur at minute 0 and minute 30. I guess they are the result of some cronjobs running somewhere.
- All calls to redis-cache made in the Rails monolith must go through the Gitlab::Redis::Cache wrapper, which employs a Redis pool. This means the number of connections is under control, up to (5 + number of threads) per process. So there could be a cronjob worker that creates its own Redis connection or Redis pool, but that is highly unlikely.
- Therefore, there may be a cronjob, likely a housekeeping task, running somewhere in our system.
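
The two points above can be sketched in a few lines: the per-process connection bound from the pool sizing, and a check of whether burst timestamps cluster around minute 0 and minute 30. This is only an illustration under stated assumptions. The names `max_connections_per_process` and `minute_histogram`, the 5-minute bucket width, and the input format are hypothetical, and the constant 5 is taken from the pool description above rather than from the Gitlab::Redis::Cache source.

```python
from collections import Counter
from datetime import datetime

# From the description above: pool size is 5 + number of threads per process.
POOL_BASE_SIZE = 5


def max_connections_per_process(threads):
    """Upper bound on pooled Redis connections for one process."""
    return POOL_BASE_SIZE + threads


def minute_histogram(burst_times, bucket_minutes=5):
    """Count bursts per minute-of-hour bucket.

    burst_times: iterable of datetime objects marking when a burst was seen
    Returns a dict mapping the bucket start minute to the number of bursts,
    ordered by bucket start minute.
    """
    counts = Counter()
    for moment in burst_times:
        bucket = (moment.minute // bucket_minutes) * bucket_minutes
        counts[bucket] += 1
    return dict(sorted(counts.items()))
```

If the cronjob hypothesis holds, most of the counts should land in the buckets starting at 0 and 30, with only a few stragglers elsewhere.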