CommitsByMessage takes a long time for large repos
Summary
A customer recently contacted Support to report very slow gRPC requests when viewing commits. It appears that CommitsByMessage, which is initiated when a search parameter is provided to Projects::CommitsController, runs git log with the --grep and --regexp-ignore-case options.
In the customer's case, these gRPC requests take at least 20 seconds, with times as high as 27 seconds in the logs we've analysed. The repo in question is 435 GB with over 2 million commits, 14k branches, and 9k tags.
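To help reproduce the timing outside of Gitaly, here is a minimal sketch that runs a comparable search directly against a repository and measures wall-clock time. The flags are taken from the process listing in this report; the function names, the repository path, the query, and the revision are hypothetical placeholders, and the real Gitaly invocation also passes several -c configuration overrides that are omitted here.

```python
import subprocess
import time


def build_log_args(repo_path, query, revision, limit=40):
    """Build a git log argv with the same search flags seen in this report.

    All parameter values are supplied by the caller; nothing here is
    specific to the customer's repository.
    """
    return [
        "git", "--git-dir", repo_path,
        "log", "--pretty=%H",
        f"--grep={query}",
        "--regexp-ignore-case",
        f"--max-count={limit}",
        "--end-of-options",
        revision,
    ]


def time_log_search(repo_path, query, revision, limit=40):
    """Run the search and return (elapsed_seconds, matching_commit_ids)."""
    args = build_log_args(repo_path, query, revision, limit)
    start = time.perf_counter()
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    elapsed = time.perf_counter() - start
    return elapsed, result.stdout.split()
```

For example, `time_log_search("/path/to/repo.git", "some search term", "main")` could be run with a common term and then with a term that matches few or no commits, to compare the two timings on a large repository.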
Impact
While searching in Elasticsearch, I noticed that this also affects larger repos on GitLab.com. I've redacted the project names in the screenshot below, which shows the top projects by CommitsByMessage gRPC time in the last 24 hours:
- Top 1: 1.15 million commits
- Top 2: 1.19 million commits
- Top 3: 1.07 million commits
- Top 4: 1.77 million commits
- Top 5: 1.06 million commits
From the gitlabsos log provided by the customer who reported this issue, we could see that the git log command on one of their Gitaly nodes was consuming significant CPU and time:
USER PID %CPU %MEM VSZ RSS STAT STARTED TIME WCHAN COMMAND
git 675210 96.0 3.3 4584232 2079824 R 14:02:53 00:00:16 - /var/opt/gitlab/gitaly/run/gitaly-2440486/git-exec-2448979987.d/git --git-dir /var/local/gitaly-cluster/repositories/@hashed/xx/xx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.git -c gc.auto=0 -c core.autocrlf=input -c core.useReplaceRefs=false -c commitGraph.generationVersion=1 -c core.fsync=objects,derived-metadata,reference -c core.fsyncMethod=fsync log --pretty=%H --grep=<<redacted>> --regexp-ignore-case --max-count=40 --end-of-options <<redacted>>
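One possible reading of the command above is that --max-count=40 only limits how many matching commits are printed, so a search term that matches few commits still forces a walk through most of the history. The following is a toy model of that behaviour, not Git's implementation: the function name and the in-memory list of messages are hypothetical, and Python's regex syntax stands in for Git's pattern matching.

```python
import re


def commits_walked(messages, pattern, max_count=40):
    """Return (matches_found, commits_examined) for a newest-first walk.

    The walk stops early only once max_count matches have been found;
    otherwise every message is examined.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    found = 0
    examined = 0
    for message in messages:
        examined += 1
        if regex.search(message):
            found += 1
            if found == max_count:
                break  # limit reached, so the walk can stop early
    return found, examined


# Hypothetical history: 100,000 generic messages and one distinctive one.
history = [f"commit {i}" for i in range(100_000)]
history[10] = "Fix login bug"

common_term = commits_walked(history, "commit")  # stops after 40 matches
rare_term = commits_walked(history, "login")     # examines every message
```

In this model a common term stops after a few dozen commits, while a rare term examines the whole list, which would be consistent with long-running searches on repos with over a million commits.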