Migration to Gitaly Cluster through API fails
In GitLab 14.5, migrating to Gitaly Cluster by following the documentation fails.
https://docs.gitlab.com/ee/administration/gitaly/index.html#migrating-to-gitaly-cluster
Initially, a single Gitaly node was deployed. After deploying 3 Gitaly Cluster nodes, I ran the migration through the project_repository_storage_moves API. However, a large number of errors like the ones below were logged.
Example:
time="2021-12-14T07:45:42.687Z" level=error msg="finished streaming call with code Internal" correlation_id=01FPVRH0ANTTYJH1VJHSDTGTCV error="rpc error: code = Internal desc = voting on locked file: preimage vote: transaction was aborted" grpc.code=Internal grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-sidekiq grpc.meta.deadline_type=unknown grpc.meta.method_type=bidi_stream grpc.method=ReplicateRepository grpc.request.deadline="2021-12-14T13:45:42.158" grpc.request.fullMethod=/gitaly.RepositoryService/ReplicateRepository grpc.service=gitaly.RepositoryService grpc.start_time="2021-12-14T07:45:42.158" grpc.time_ms=528.446 peer.address="xx.xx.xx.xx:44668" pid=17 relative_path=@hashed/5a/48/5a48eed290f62c93553855c36c964e1ef16603d23dcce371a1b2ce9a3857d0e1.git remote_ip=xx.xx.xx.xx sentry.skip="{}" span.kind=server system=grpc username=xxxxxxxxx virtual_storage=default-praefect
time="2021-12-14T07:45:48.851Z" level=error msg="VoteTransaction: failure" component=transactions.Manager correlation_id=01FPVRH0ANTTYJH1VJHSDTGTCV error="node already cast a vote: \"gitlab-gitaly-default-praefect-0\"" grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-sidekiq grpc.meta.deadline_type=unknown grpc.meta.method_type=unary grpc.method=VoteTransaction grpc.request.deadline="2021-12-14T07:50:48.850" grpc.request.fullMethod=/gitaly.RefTransaction/VoteTransaction grpc.request.repo="<nil>" grpc.service=gitaly.RefTransaction grpc.start_time="2021-12-14T07:45:48.851" peer.address="xx.xx.xx.xx:8075" pid=17 remote_ip=10.244.3.40 span.kind=server system=grpc transaction.hash=11289439b2f75957fa163559baa7d3aec83601ef transaction.id=1302 transaction.voter=gitlab-gitaly-default-praefect-0 username=xx.xx.xx.xx
time="2021-12-14T07:45:48.851Z" level=error msg="finished unary call with code Internal" correlation_id=01FPVRH0ANTTYJH1VJHSDTGTCV error="node already cast a vote: \"gitlab-gitaly-default-praefect-0\"" grpc.code=Internal grpc.meta.auth_version=v2 grpc.meta.client_name=gitlab-sidekiq grpc.meta.deadline_type=unknown grpc.meta.method_type=unary grpc.method=VoteTransaction grpc.request.deadline="2021-12-14T07:50:48.850" grpc.request.fullMethod=/gitaly.RefTransaction/VoteTransaction grpc.request.repo="<nil>" grpc.service=gitaly.RefTransaction grpc.start_time="2021-12-14T07:45:48.851" grpc.time_ms=0.282 peer.address="xx.xx.xx.xx:8075" pid=17 remote_ip=xx.xx.xx.xx span.kind=server system=grpc username=xxxxxxxxx
Does project_repository_storage_moves work properly?
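For reference, a move of the kind described above can be scheduled for a single project roughly as follows. This is a minimal sketch and not the exact call used in this report: the host `gitlab.example.com`, the token placeholder, and project ID `42` are hypothetical, and `default-praefect` is taken from the `virtual_storage` field in the logs. The request is only built here, not sent.

```python
import json
import urllib.request


def build_move_request(base_url, token, project_id, destination):
    """Build (but do not send) a POST that schedules a repository storage move."""
    url = f"{base_url.rstrip('/')}/api/v4/projects/{project_id}/repository_storage_moves"
    body = json.dumps({"destination_storage_name": destination}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"PRIVATE-TOKEN": token, "Content-Type": "application/json"},
    )


# Hypothetical values; replace with your own instance, admin token, and project.
req = build_move_request(
    "https://gitlab.example.com", "<admin-token>", 42, "default-praefect"
)
# To actually schedule the move: urllib.request.urlopen(req)
```

Scheduling moves one project at a time, rather than in bulk, makes it easier to match a failing move to its `correlation_id` in the Gitaly and Praefect logs.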
Workaround
- Make sure the repository does not exist on any Gitaly node or in the Praefect database.
- Shut down all but one Gitaly server.
- Perform the repository move again.
- Once the move is completed, bring the remaining Gitaly servers back up. Replication then kicks in to sync all Gitaly servers.
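Before bringing the other Gitaly servers back up, it can help to confirm that the move really finished. The sketch below groups move records by their `state` field, assuming the records have the shape returned by `GET /projects/:id/repository_storage_moves` and that the state names match those listed in the GitLab API documentation (`finished`, `failed`, `cleanup failed`, and so on). The sample records are made up for illustration.

```python
def summarize_moves(moves):
    """Group storage move records into done / failed / pending by their state."""
    failed_states = {"failed", "cleanup failed"}
    summary = {"done": [], "failed": [], "pending": []}
    for move in moves:
        state = move.get("state")
        if state == "finished":
            summary["done"].append(move["id"])
        elif state in failed_states:
            summary["failed"].append(move["id"])
        else:
            # initial, scheduled, started, replicated, or anything unexpected
            summary["pending"].append(move["id"])
    return summary


# Made-up sample records, not real API output.
sample = [
    {"id": 1, "state": "finished", "destination_storage_name": "default-praefect"},
    {"id": 2, "state": "failed", "destination_storage_name": "default-praefect"},
    {"id": 3, "state": "started", "destination_storage_name": "default-praefect"},
]
result = summarize_moves(sample)
```

Only once every move of interest is in the `done` group would the remaining servers be started, so that replication copies a complete repository.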
Edited by Gerardo Gutierrez