Geo: Elevated error rate during Direct Transfer
Problem
Using Direct Transfer into a primary Geo site results in an unnecessarily elevated error rate during the transfer. These appear to be transient errors. This alerts Dedicated SREs unnecessarily.
Steps to reproduce
- Use Direct Transfer to import a group with projects
Errors on the secondary:
==> log/geo.log <==
{"severity":"ERROR","time":"2024-05-21T02:34:51.727Z","correlation_id":"e7d9962acd19ef5b7035fbe53ae46c0b","class":"Geo::FrameworkRepositorySyncService","gitlab_host":"gdk2.test","message":"Error syncing repository","error":"13:creating repository: cloning repository: exit status 128."}
==> log/application_json.log <==
{"severity":"ERROR","time":"2024-05-21T02:34:52.011Z","meta.feature_category":"geo_replication","correlation_id":"f5ff2915570c0aa9f5468a03f500ee02","meta.caller_id":"Geo::EventWorker","message":"Error updating root ref for repository testmkozono/for-gitlab-ce-issue-49015.wiki (26): 5:no remote HEAD found.."}
And in Gitaly logs:
2024-05-21_02:49:04.30942 praefect-gitaly-0 : {"command.count":1,"command.cpu_time_ms":25,"command.inblock":0,"command.majflt":202,"command.maxrss":8749056,"command.minflt":1646,"command.oublock":0,"command.real_time_ms":175,"command.spawn_token_fork_ms":2,"command.spawn_token_wait_ms":0,"command.system_time_ms":15,"command.user_time_ms":10,"component":"gitaly.UnaryServerInterceptor","correlation_id":"c8ac9af61aa27fc15b448153af1bbd7d","error":"creating repository: cloning repository: exit status 128","error_metadata":{"resolved_address":"","stderr":"remote: A repository for this project does not exist yet.\nfatal: repository 'https://gdk.test:3443/testmkozono/test-snippets.git/' not found\n"},"feature_flags":"gpg_signing mailmap_options use_resizable_semaphore_in_concurrency_limiter","grpc.code":"Internal","grpc.meta.auth_version":"v2","grpc.meta.client_name":"gitlab-sidekiq","grpc.meta.method_operation":"mutator","grpc.meta.method_scope":"repository","grpc.meta.method_type":"unary","grpc.method":"CreateRepositoryFromURL","grpc.request.deadline":"2024-05-20T22:49:04.119","grpc.request.fullMethod":"/gitaly.RepositoryService/CreateRepositoryFromURL","grpc.request.glProjectPath":"testmkozono/test-snippets","grpc.request.glRepository":"project-25","grpc.request.payload_bytes":460,"grpc.request.repoPath":"@cluster/repositories/23/97/190","grpc.request.repoStorage":"praefect-internal-0","grpc.response.payload_bytes":0,"grpc.service":"gitaly.RepositoryService","grpc.start_time":"2024-05-20T16:49:04.119","grpc.time_ms":189.609,"level":"error","msg":"finished unary call with code Internal","pid":24206,"span.kind":"server","system":"grpc","time":"2024-05-21T02:49:04.309Z"}
The primary site has some errors too:
==> log/geo.log <==
{"severity":"ERROR","time":"2024-05-21T02:35:06.093Z","correlation_id":"446a67b53cfacd256a2c51e4a9541644","class":"Geo::ProjectRepositoryReplicator","gitlab_host":"gdk.test","message":"Repository cannot be checksummed because it does not exist","error":"Repository does not exist","replicable_name":"project_repository","replicable_id":25}
{"severity":"ERROR","time":"2024-05-21T02:35:06.096Z","correlation_id":"446a67b53cfacd256a2c51e4a9541644","class":"Project","gitlab_host":"gdk.test","message":"Error during verification","error":"Repository does not exist"}
I noticed these errors also occur with projects that have an empty repo (e.g. projects used as issue trackers). These also fail verification because there is no repo (as opposed to having an empty repo):
=> #<Geo::ProjectState:0x000000014c53e150
id: 25,
verification_started_at: Tue, 21 May 2024 02:59:02.896763000 UTC +00:00,
verification_retry_at: Tue, 21 May 2024 03:23:41.957406000 UTC +00:00,
verified_at: Tue, 21 May 2024 02:59:02.957452000 UTC +00:00,
project_id: 25,
verification_state: 3,
verification_retry_count: 6,
verification_checksum: nil,
verification_failure: "Error during verification: Repository does not exist">
If you Create a new project without a repo, GitLab creates an empty repo. Gitaly treats it as a special case and returns a checksum of all zeroes:
=> #<Geo::ProjectState:0x0000000166ef65e8
id: 46,
verification_started_at: Tue, 21 May 2024 03:03:05.658383000 UTC +00:00,
verification_retry_at: nil,
verified_at: Tue, 21 May 2024 03:03:05.760700000 UTC +00:00,
project_id: 46,
verification_state: 2,
verification_retry_count: 0,
verification_checksum: "0000000000000000000000000000000000000000",
verification_failure: nil>
Edited by Michael Kozono