Do not run Redis keywatcher for non-Runner WH workloads

While monitoring CPU and memory during the Action Cable rollout, I noticed that a significant portion of CPU time was spent observing Redis keys in keywatcher. I am unfamiliar with this code, but it appears to exist to support GitLab Runner, which is not relevant in every context in which Workhorse runs. For instance, the websockets fleet does not appear to need it: the number of keywatchers is 0 in all deployments except those of type api: https://thanos-query.ops.gitlab.net/graph?g0.range_input=1h&g0.max_source_resolution=0s&g0.expr=sum%20by%20(type)%20(gitlab_workhorse_keywatcher_keywatchers)&g0.tab=0

However, because these WH instances are connected to the same Redis cluster, which does have keyspace activity, all notifications published to these channels are still picked up by them and consume CPU and memory.

We should not run the keywatcher on Workhorses that do not service clients interested in these notifications.

Here is a CPU profile I pulled from the profiler for the websockets fleet, which seems to indicate that 50% of average CPU time was spent processing Redis notifications that no one is subscribed to:

Screenshot_from_2021-01-26_09-52-56

https://console.cloud.google.com/profiler;timespan=1h/workhorse-websockets;type=CPU/cpu;filter=hideStacks:pollProfilerService?project=gitlab-production&authuser=0

We should consider not entering the Process function when there are no subscribers.
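
To illustrate the idea, here is a minimal sketch in Go of an early return when nobody is watching any key. It is not the actual Workhorse keywatcher code: `watcherRegistry`, `subscribe`, and `notify` are hypothetical names made up for this example, and the real change would need to fit into however keywatcher tracks its subscribers. Note that this only avoids the per-notification work; the instance would still receive the messages from Redis unless keywatcher is not started at all.

```go
package main

import (
	"fmt"
	"sync"
)

// watcherRegistry is a hypothetical stand-in for the keywatcher's
// subscriber table. It is NOT the real Workhorse type.
type watcherRegistry struct {
	mu       sync.Mutex
	watchers map[string][]chan string // key -> channels waiting on that key
	handled  int                      // notifications that took the full path
	skipped  int                      // notifications dropped by the early return
}

func newWatcherRegistry() *watcherRegistry {
	return &watcherRegistry{watchers: make(map[string][]chan string)}
}

// subscribe registers interest in a key and returns a channel that
// receives the next value published for that key.
func (r *watcherRegistry) subscribe(key string) <-chan string {
	ch := make(chan string, 1)
	r.mu.Lock()
	defer r.mu.Unlock()
	r.watchers[key] = append(r.watchers[key], ch)
	return ch
}

// notify handles one keyspace notification. It returns false when the
// notification was skipped because there are no subscribers at all.
func (r *watcherRegistry) notify(key, value string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()

	// Early return: with zero subscribers there is nothing to deliver,
	// so skip any parsing or lookup work for this message.
	if len(r.watchers) == 0 {
		r.skipped++
		return false
	}

	r.handled++
	for _, ch := range r.watchers[key] {
		select {
		case ch <- value:
		default: // never block the notification loop on a slow reader
		}
	}
	delete(r.watchers, key) // watchers are one-shot in this sketch
	return true
}

func main() {
	r := newWatcherRegistry()

	r.notify("runner:build_queue:1", "42") // skipped: nobody is watching

	ch := r.subscribe("runner:build_queue:1")
	r.notify("runner:build_queue:1", "43") // delivered
	fmt.Println("received:", <-ch)

	fmt.Println("handled:", r.handled, "skipped:", r.skipped)
}
```

On a fleet such as websockets, where the keywatcher count is always 0, every notification would take the early return in a design like this.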

Edited by Matthias Käppler