Investigate slow DELs in redis
@smcgivern and @andrewn were looking into slow DEL calls against redis-cache (sample), that look like this:
json.command: del, elasticsearch_enabled_cache:project
json.exec_time: 227,276 (us)
This is surprising, because as per production#1766 (closed) and gitlab-org/omnibus-gitlab!4008 (merged), we have lazyfree-lazy-server-del
enabled, which should implicitly turn all DELs into UNLINKs.
I confirmed that we do in fact have it set in production:
PRODUCTION PRIMARY-REDIS iwiedler@redis-cache-01-db-gprd.c.gitlab-production.internal:~$ REDISCLI_AUTH="$(sudo grep ^requirepass /var/opt/gitlab/redis/redis.conf|cut -d\" -f2)" /opt/gitlab/embedded/bin/redis-cli config get lazyfree-lazy-server-del
1) "lazyfree-lazy-server-del"
2) "yes"
We also confirmed the results via funclatency
. This shows some invocations taking 8-15 ms.
$ sudo bpftrace -l 'uprobe:/var/lib/snapd/hostfs/opt/gitlab/embedded/bin/redis-server'
$ sudo /usr/share/bcc/tools/funclatency -m -d 10 '/opt/gitlab/embedded/bin/redis-server:delCommand'
Tracing 1 functions for "/opt/gitlab/embedded/bin/redis-server:delCommand"... Hit Ctrl-C to end.
Function = delCommand [30715]
msecs : count distribution
0 -> 1 : 41403 |****************************************|
2 -> 3 : 0 | |
4 -> 7 : 1 | |
8 -> 15 : 1 | |
Detaching...
However, @jacobvosmaer-gitlab discovered what looks like a bug in redis, which is making that option be effectively ignored:
DEL calls delGenericCommand(c,0) https://github.com/redis/redis/blob/5.0.7/src/db.c#L486-L488
and that calls dbSyncDelete https://github.com/redis/redis/blob/5.0.7/src/db.c#L473-L474
so it doesn't care about the lazy config setting at all
We do explicitly issue UNLINK commands for the SetCache: gitlab-org/gitlab!27116 (merged).
We should be able to do the same thing for ElasticsearchEnabledCache
in the mean time, while we work with redis upstream to get this fixed.
cc @smcgivern @andrewn @jacobvosmaer-gitlab @stanhu @robotmay_gitlab