GitHub import process is extremely slow
Overview
Importing from GitHub is currently extremely slow for large repositories. Observed on v8.16.4.
Reproduce
- Import https://github.com/gitlabhq/gitlabhq
- 45,568 commits
- x pull requests
- Running the import takes well over 30 minutes (this does vary based on connection speed and processing power)
There's no progress shown to the user during this time.
Debugging
I monkey-patched client.rb to print the last API response, including the status code. It's clear we're not hitting the GitHub rate limit, yet there are long gaps between requests. The bottleneck may be on the GitLab side, when we create the merge requests and issues.
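For anyone who wants to repeat the measurement, here is a rough sketch of that kind of patch. It is not the actual patch applied to client.rb: `FakeClient`, `Response`, `RequestLogging`, and the `request` method are hypothetical stand-ins so the snippet runs on its own, and the `x-ratelimit-remaining` header is assumed to be where the remaining quota is read from. The idea is to prepend a module that records the status code, the remaining rate limit, and the gap since the previous request.

```ruby
# Hypothetical stand-ins: the real client.rb and its response object are not shown here.
Response = Struct.new(:status, :headers)

class FakeClient
  attr_reader :last_response

  # Pretends to call the API and remembers the response, as a real client would.
  def request(path)
    @last_response = Response.new(200, { "x-ratelimit-remaining" => "4999" })
    path
  end
end

# Prepended module: wraps #request and logs status, remaining quota,
# and the time elapsed since the previous request started.
module RequestLogging
  def request(path)
    now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    gap = @last_request_at ? now - @last_request_at : 0.0
    @last_request_at = now

    result = super

    res = last_response
    entry = {
      path: path,
      status: res.status,
      remaining: res.headers["x-ratelimit-remaining"],
      gap: gap
    }
    request_log << entry
    warn format("%s status=%d remaining=%s gap=%.3fs",
                entry[:path], entry[:status], entry[:remaining], entry[:gap])
    result
  end

  def request_log
    @request_log ||= []
  end
end

FakeClient.prepend(RequestLogging)
```

If the status stays at 200 and the remaining quota stays high while the gap is large, the time is being spent between requests rather than waiting on GitHub.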
Since cycle analytics was introduced, we see a large number of updates to the merge_request_metrics table when importing:
SQL (0.4ms) UPDATE "merge_request_metrics" SET "merged_at" = $1, "updated_at" = $2
WHERE "merge_request_metrics"."id" = $3 [["merged_at", "2017-02-14 17:28:58.179914"],
["updated_at", "2017-02-14 17:28:58.180620"], ["id", 564]]
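To put a number on "a large number of updates", one option is to tally write statements per table from the Rails log captured during an import. The sketch below is only an illustration: `tally_writes` and `SAMPLE_LOG` are names made up here, the regular expression assumes the log format shown in the entry above, and the sample contains just that single entry.

```ruby
# Matches the start of a write statement and captures the verb and the quoted table name.
WRITE_STATEMENT = /\b(UPDATE|INSERT INTO|DELETE FROM)\s+"(\w+)"/

# Returns a hash such as { "UPDATE merge_request_metrics" => 1 }.
def tally_writes(log_text)
  counts = Hash.new(0)
  log_text.scan(WRITE_STATEMENT) { |verb, table| counts["#{verb} #{table}"] += 1 }
  counts
end

# The log entry quoted in this issue; a real run would read the development log instead.
SAMPLE_LOG = <<~'LOG'
  SQL (0.4ms) UPDATE "merge_request_metrics" SET "merged_at" = $1, "updated_at" = $2
  WHERE "merge_request_metrics"."id" = $3 [["merged_at", "2017-02-14 17:28:58.179914"],
  ["updated_at", "2017-02-14 17:28:58.180620"], ["id", 564]]
LOG

tally_writes(SAMPLE_LOG).each do |statement, count|
  puts "#{statement}: #{count}"
end
```

Comparing the tally for merge_request_metrics against the number of imported pull requests would show whether each merge request triggers more than one update.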