Speed up pull request imports in GitHub importer
Several customers have attempted to import large repositories (e.g. > 1 GB, with 20,000 pull requests) over NFS. We've seen these imports take more than 24 hours, with most of the time spent importing pull requests.
We can benchmark the import with our githubhq/githubhq project, since it has 3,000 merge requests. That import takes about 1-2 hours on a local machine.
With https://gitlab.com/gitlab-com/support/toolbox/json_stats, a sample of the Gitaly log taken while the import is running looks like:
$ sh gitaly_stats.sh ~/Documents/gitaly.txt
Gitaly Method Stats
METHOD             COUNT  PERC99  PERC95  STDDEV     MAX   MIN
RefExists            107   15.28   11.87    2.90   18.09  1.86
FindCommit            69   23.77   13.98    3.69   23.77  3.00
FindMergeBase         65   61.48   13.03   10.80   61.48  2.60
WriteRef              57   15.54    7.66    2.11   15.54  2.53
CommitDiff            33  288.46   33.72   48.40  288.46  3.59
CommitsBetween        32   29.81   27.55    6.89   29.81  4.15
CalculateChecksum      2  308.37  308.37  151.74  308.37  4.88
RepositoryExists       2    0.12    0.12    0.00    0.12  0.12
Check                  1    0.07    0.07    0.00    0.07  0.07
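For anyone who wants to reproduce a table like this without the toolbox script, here is a rough Python sketch of the same kind of per-method aggregation. It is not the json_stats implementation. It assumes the Gitaly log is JSON with one entry per line, and that the method name and call duration live in `grpc.method` and `grpc.time_ms`; those key names are assumptions, so adjust them to whatever your log actually contains. The percentile uses the nearest-rank method and the standard deviation is the population form, both chosen because they are consistent with the CalculateChecksum row above (two calls at 4.88 and 308.37 give a deviation of about 151.7), not because the script is known to compute them that way.

```python
import json
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def method_stats(lines, method_key="grpc.method", duration_key="grpc.time_ms"):
    """Group call durations by method and summarize each group.

    The default key names are assumptions about the log format.
    """
    durations = defaultdict(list)
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        method = entry.get(method_key)
        duration = entry.get(duration_key)
        if method is None or duration is None:
            continue  # skip entries that are not timed calls
        durations[method].append(float(duration))

    stats = {}
    for method, values in durations.items():
        values.sort()
        mean = sum(values) / len(values)
        # Population variance (divide by N, not N - 1).
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        stats[method] = {
            "count": len(values),
            "perc99": percentile(values, 99),
            "perc95": percentile(values, 95),
            "stddev": math.sqrt(variance),
            "max": values[-1],
            "min": values[0],
        }
    return stats
```

Passing an open file works because the function only iterates over lines, e.g. `with open("gitaly.txt") as f: stats = method_stats(f)`. Sorting the result by `count` in descending order gives the same row order as the table.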
Edited by Stan Hu