Geo: Use `git clone` for first sync instead of `git fetch`
Description
`git clone` is at least two orders of magnitude faster than `git fetch` when importing a big repository with lots of keep-around refs. See https://gitlab.com/gitlab-org/gitlab-ee/issues/5181
Initial sync in current Geo does the following:
- creates a new empty bare repository
- defines custom config data
- adds the primary as a remote (`geo`)
- runs `/opt/gitlab/embedded/bin/git fetch geo --quiet --prune --force --tags`
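
The steps above can be sketched as a sequence of git invocations. This is only an illustration, not the actual Geo code: the helper name `current_initial_sync_commands`, the repository path and the primary URL are hypothetical, and the custom config step is left out because this issue does not say which keys are set.

```ruby
# Hypothetical sketch of the current initial sync, expressed as argv arrays.
# Only the git binary path and the fetch flags come from the description above.
GIT_BIN = '/opt/gitlab/embedded/bin/git'

def current_initial_sync_commands(repo_path, primary_url)
  [
    # 1. Create a new empty bare repository.
    [GIT_BIN, 'init', '--bare', repo_path],
    # 2. (Custom config step omitted: the keys are not listed in this issue.)
    # 3. Add the primary as a remote named "geo".
    [GIT_BIN, '-C', repo_path, 'remote', 'add', 'geo', primary_url],
    # 4. Fetch everything from the primary.
    [GIT_BIN, '-C', repo_path, 'fetch', 'geo', '--quiet', '--prune', '--force', '--tags']
  ]
end

# Print the commands instead of running them.
current_initial_sync_commands('/tmp/example.git', 'https://primary.example.com/group/project.git')
  .each { |argv| puts argv.join(' ') }
```

Because the repository starts empty, the fetch in the last step is what ends up writing every ref individually.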
This has two side effects. Because we don't use clone, every ref is first stored unpacked. Fetching the missing objects then takes much longer, as some operations require checking files on disk, and it looks like we may have an algorithm with polynomial complexity when refs aren't packed.
Proposal
The initial Geo sync will have to use `git clone --mirror`. We need to investigate whether we can still use the following with clone:
```ruby
# Fetch the repository, using a JWT header for authentication
authorization = ::Gitlab::Geo::RepoSyncRequest.new.authorization
header = { "http.#{url}.extraHeader" => "Authorization: #{authorization}" }
```
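
One way the same header might be carried over to a clone is to pass it as a one-off `-c` setting on the `git` command line. The sketch below only builds the argument list; `mirror_clone_command`, the example URL, the token value and the target path are all hypothetical, and whether this fits the existing Geo code is exactly what needs investigating.

```ruby
# Hypothetical helper, not the actual Geo implementation: builds the argv for
# a mirror clone that sends the JWT Authorization header.
def mirror_clone_command(url, authorization, target_path, git_bin: 'git')
  # Same config key as in the snippet above, scoped to the remote URL.
  header_key = "http.#{url}.extraHeader"

  [
    git_bin,
    # A top-level `-c` applies to this invocation only.
    '-c', "#{header_key}=Authorization: #{authorization}",
    'clone', '--mirror', '--quiet',
    url, target_path
  ]
end

# Print the command instead of running it.
puts mirror_clone_command(
  'https://primary.example.com/group/project.git',
  'example-token',
  '/tmp/project.git'
).join(' ')
```

Passing the header with a top-level `-c` keeps it out of the cloned repository's config file, which seems preferable for a short-lived JWT.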
Also, it has been a long time since I wrote the first synchronization code, so we need to review the whole repository empty/not-empty state machine and how it behaves when Geo clones instead of creating a new repo first.
The goal here is to be able to import gitlab-ce in about 3 minutes using `git clone`.