CommitsByMessage takes a long time for large repos

Summary

A customer recently contacted Support to report very slow gRPC requests when viewing commits. It appears that CommitsByMessage, which is called when a search parameter is passed to Projects::CommitsController, runs git log with the --grep and --regexp-ignore-case options.

In the customer's case, these gRPC requests take at least 20 seconds, and as long as 27 seconds in the logs we've analysed. The repository in question is 435 GB, with over 2 million commits, 14k branches, and 9k tags.

Impact

While searching in Elasticsearch, I noticed that this also affects larger repositories on GitLab.com. The screenshot below (project names redacted) shows the top projects by median CommitsByMessage gRPC time over the last 24 hours:

[Screenshot: elasticsearch_commitsbymessage_median_grpc_time_by_top_projects]

  • Top 1: 1.15 million commits
  • Top 2: 1.19 million commits
  • Top 3: 1.07 million commits
  • Top 4: 1.77 million commits
  • Top 5: 1.06 million commits

The gitlabsos logs from the customer who reported this issue show that the git log command on one of their Gitaly nodes was consuming significant CPU (96%) and time:

USER         PID %CPU %MEM    VSZ   RSS STAT  STARTED     TIME WCHAN                    COMMAND
git       675210 96.0  3.3 4584232 2079824 R 14:02:53 00:00:16 -                        /var/opt/gitlab/gitaly/run/gitaly-2440486/git-exec-2448979987.d/git --git-dir /var/local/gitaly-cluster/repositories/@hashed/xx/xx/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.git -c gc.auto=0 -c core.autocrlf=input -c core.useReplaceRefs=false -c commitGraph.generationVersion=1 -c core.fsync=objects,derived-metadata,reference -c core.fsyncMethod=fsync log --pretty=%H --grep=<<redacted>> --regexp-ignore-case --max-count=40 --end-of-options <<redacted>>
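
To measure this outside Gitaly, the following is a minimal sketch that rebuilds the same git log invocation from the flags visible in the process listing above and times it. The function names, the repository path, the search pattern, and the revision are all placeholders (the real values are redacted in the output), and the Gitaly-specific -c configuration options are omitted, so timings may differ from what Gitaly itself observes.

```python
import subprocess
import time


def build_log_argv(git_dir, pattern, revision, limit=40):
    """Build an argv that mirrors the log flags seen in the ps output.

    git_dir, pattern, and revision are caller-supplied placeholders;
    the Gitaly-specific -c options are intentionally left out.
    """
    return [
        "git", "--git-dir", git_dir,
        "log",
        "--pretty=%H",
        f"--grep={pattern}",
        "--regexp-ignore-case",
        f"--max-count={limit}",
        "--end-of-options", revision,
    ]


def time_log(git_dir, pattern, revision, limit=40):
    """Run the command and return (elapsed_seconds, matching_commit_ids)."""
    argv = build_log_argv(git_dir, pattern, revision, limit)
    start = time.monotonic()
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    elapsed = time.monotonic() - start
    return elapsed, result.stdout.split()
```

For example, time_log("/path/to/repo.git", "some text", "main") would return the wall-clock time and up to 40 matching commit IDs. Comparing a common search term against a rare one may help show whether the cost comes from git log having to walk most of the history before it finds 40 matches.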