Migration to Gitaly Cluster through API fails

In GitLab 14.5, migrating to Gitaly Cluster by following the documentation fails.
https://docs.gitlab.com/ee/administration/gitaly/index.html#migrating-to-gitaly-cluster

Initially, a single Gitaly node was deployed. After deploying three Gitaly Cluster nodes, I ran the migration through the project_repository_storage_moves API. However, a large number of errors like the ones below were logged.
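For reference, a minimal sketch of how such a move can be scheduled for one project. It assumes the per-project endpoint `POST /api/v4/projects/:id/repository_storage_moves` with a `destination_storage_name` parameter. The host name, token, and project ID are placeholders, and `default-praefect` is taken from the `virtual_storage` field in the logs. The function only builds the request; sending it is left to the caller.

```python
import json
import urllib.request


def build_move_request(base_url, token, project_id, destination):
    """Build (but do not send) a request that schedules a repository storage move.

    base_url, token and project_id are placeholders supplied by the caller.
    """
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/repository_storage_moves"
    body = json.dumps({"destination_storage_name": destination}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
    )


# Hypothetical values; replace with your own instance, token and project ID.
request = build_move_request("https://gitlab.example.com", "<token>", 42, "default-praefect")
# To actually schedule the move: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before touching a production instance.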

Example:

time="2021-12-14T07:45:42.687Z" level=error msg="finished streaming call with code Internal" correlation_id=01FPVRH0ANTTYJH1VJHSDTGTCV error="rpc error: code = Internal desc = voting on locked file: preimage vote: transaction was aborted" grpc.code=Internal grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-sidekiq grpc.meta.deadline_type=unknown grpc.meta.method_type=bidi_stream grpc.method=ReplicateRepository grpc.request.deadline="2021-12-14T13:45:42.158" grpc.request.fullMethod=/gitaly.RepositoryService/ReplicateRepository grpc.service=gitaly.RepositoryService grpc.start_time="2021-12-14T07:45:42.158" grpc.time_ms=528.446 peer.address="xx.xx.xx.xx:44668" pid=17 relative_path=@hashed/5a/48/5a48eed290f62c93553855c36c964e1ef16603d23dcce371a1b2ce9a3857d0e1.git remote_ip=xx.xx.xx.xx sentry.skip="{}" span.kind=server system=grpc username=xxxxxxxxx virtual_storage=default-praefect
time="2021-12-14T07:45:48.851Z" level=error msg="VoteTransaction: failure" component=transactions.Manager correlation_id=01FPVRH0ANTTYJH1VJHSDTGTCV error="node already cast a vote: \"gitlab-gitaly-default-praefect-0\"" grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-sidekiq grpc.meta.deadline_type=unknown grpc.meta.method_type=unary grpc.method=VoteTransaction grpc.request.deadline="2021-12-14T07:50:48.850" grpc.request.fullMethod=/gitaly.RefTransaction/VoteTransaction grpc.request.repo="<nil>" grpc.service=gitaly.RefTransaction grpc.start_time="2021-12-14T07:45:48.851" peer.address="xx.xx.xx.xx:8075" pid=17 remote_ip=10.244.3.40 span.kind=server system=grpc transaction.hash=11289439b2f75957fa163559baa7d3aec83601ef transaction.id=1302 transaction.voter=gitlab-gitaly-default-praefect-0 username=xx.xx.xx.xx
time="2021-12-14T07:45:48.851Z" level=error msg="finished unary call with code Internal" correlation_id=01FPVRH0ANTTYJH1VJHSDTGTCV error="node already cast a vote: \"gitlab-gitaly-default-praefect-0\"" grpc.code=Internal grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-sidekiq grpc.meta.deadline_type=unknown grpc.meta.method_type=unary grpc.method=VoteTransaction grpc.request.deadline="2021-12-14T07:50:48.850" grpc.request.fullMethod=/gitaly.RefTransaction/VoteTransaction grpc.request.repo="<nil>" grpc.service=gitaly.RefTransaction grpc.start_time="2021-12-14T07:45:48.851" grpc.time_ms=0.282 peer.address="xx.xx.xx.xx:8075" pid=17 remote_ip=xx.xx.xx.xx span.kind=server system=grpc username=xxxxxxxxx
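All three entries share the same correlation_id, so grouping errors by that field shows which failures belong to one move. Below is a rough sketch of such a grouping for key=value log lines like the ones above. It relies on Python's `shlex` to handle quoted values and escaped quotes; the function names are made up for illustration and are not part of any GitLab tooling.

```python
import shlex


def parse_logfmt(line):
    """Parse one key=value log line into a dict (quoted values are unquoted)."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens without '='
            fields[key] = value
    return fields


def summarize_errors(lines):
    """Map correlation_id -> list of (grpc.method, error) for error-level entries."""
    grouped = {}
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") != "error":
            continue
        entry = (fields.get("grpc.method", ""), fields.get("error", ""))
        grouped.setdefault(fields.get("correlation_id", ""), []).append(entry)
    return grouped
```

Feeding the Gitaly and Praefect logs through `summarize_errors` gives one list of failing RPCs per correlation ID, for example `ReplicateRepository` followed by `VoteTransaction`.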

Does the project_repository_storage_moves API work properly with Gitaly Cluster?

Workaround

  1. Make sure the repository does not exist on the Gitaly nodes or in the Praefect database.
  2. Shut down all but one Gitaly server.
  3. Perform the repository move again.
  4. Once the move has completed, bring the remaining Gitaly servers back up. Replication will then kick in and sync all Gitaly servers.
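Before bringing the other servers back up in step 4, it helps to confirm that the move has really finished. The sketch below classifies the JSON body returned when querying a single storage move. It assumes the response carries a `state` field, that `finished` means success, and that `failed` and `cleanup failed` mean failure. Treat that state list as an assumption and check it against the API documentation for your version. `move_outcome` is a hypothetical helper name.

```python
import json

# Assumed state names; verify against the API documentation for your GitLab version.
SUCCESS_STATES = {"finished"}
FAILURE_STATES = {"failed", "cleanup failed"}


def move_outcome(response_body):
    """Classify a storage move API response as 'done', 'failed' or 'pending'."""
    state = json.loads(response_body).get("state", "")
    if state in SUCCESS_STATES:
        return "done"
    if state in FAILURE_STATES:
        return "failed"
    return "pending"  # any other state is treated as still in progress
```

Only restart the remaining Gitaly servers once this reports `done`; on `failed`, repeat the workaround from step 1.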
Edited by Gerardo Gutierrez