Geo LFS redirect from secondary to primary may not work mid-session
This is a summary of what we believe may be happening in https://gitlab.zendesk.com/agent/tickets/318550:
- User pulls against secondary over SSH.
- Git data is transferred from the secondary.
- The user's Git LFS client requests multiple LFS objects.
- Some LFS objects are transferred.
- The secondary notices that the project repository has been updated but not yet synced (https://gitlab.com/gitlab-org/gitlab/-/blob/07e830e68d9ed6faf10c7579c925b0f5d261f083/ee/app/controllers/ee/repositories/git_http_client_controller.rb#L192).
- The next LFS object request is redirected to the primary and the transfer fails, because the LFS client can't find credentials for the primary (sketched below).
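For illustration, the redirect roughly looks like this from the client's side. The hostnames, Geo node ID, and status code here are assumptions (the `push_from_secondary` prefix matches the URL in the trace further down):

```
> POST https://gitlab.secondary.example.org/group/project.git/info/lfs/objects/batch
< HTTP/1.1 302 Found
< Location: https://gitlab.primary.example.org/-/push_from_secondary/5/group/project.git/info/lfs/objects/batch
```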
In the SSH case, the client executes a `git lfs-authenticate` call, which is handled by `gitlab-shell`. `gitlab-shell` makes an internal API call to generate an `LfsToken`: https://gitlab.com/gitlab-org/gitlab/-/blob/5555bb77d5c5fa4069df36f1e47403afe9b45190/lib/api/internal/base.rb#L188-190
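As a hedged sketch of that flow (hostnames and values are illustrative), git-lfs runs roughly `ssh git@gitlab.secondary.example.org git-lfs-authenticate group/project.git download`, and `gitlab-shell` replies with a JSON payload whose `href` and `Authorization` header are bound to the secondary:

```json
{
  "href": "https://gitlab.secondary.example.org/group/project.git/info/lfs",
  "header": {
    "Authorization": "Basic <base64 of username:LfsToken>"
  },
  "expires_in": 7200
}
```

Because both the endpoint and the token are scoped to the secondary's hostname, a mid-session redirect to the primary leaves the client with nothing it can present there.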
In the HTTPS case, the `LfsToken` is generated here: https://gitlab.com/gitlab-org/gitlab/-/blob/bc38125ebbfe5008240a01d098d9a419bf72ac38/app/controllers/repositories/lfs_api_controller.rb#L147
In the SSH case, we see errors such as:
```
[2022-08-22T05:36:03.096Z] 05:36:03.051717 trace git-lfs: creds: git credential fill ("https", "gitlab.primary.example.org", "")
[2022-08-22T05:36:03.096Z] 05:36:03.053318 git.c:439 trace: built-in: git credential fill
[2022-08-22T05:36:03.096Z] fatal: could not read Username for 'gitlab.primary.example.org': No such device or address
[2022-08-22T05:36:03.096Z] 05:36:03.053622 trace git-lfs: api error: Git credentials for https://gitlab.primary.example.org/-/push_from_secondary/5/group/project.git/info/lfs/objects/batch not found.
```
I suspect `git-lfs` uses `git credential` in https://github.com/git-lfs/git-lfs/blob/46801d3b4efa878ccc9098cb3e49eb0e72fe5597/creds/creds.go#L308 to associate the `LfsToken` with the LFS server. Since credentials are only available for the secondary, there are no credentials for the primary.
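The lookup is keyed on the URL's host, so it comes up empty for the primary. The same failure can be reproduced outside of git-lfs (hostname assumed; prompting disabled to mimic a non-interactive session where no credential helper has an entry for that host):

```
$ printf 'protocol=https\nhost=gitlab.primary.example.org\n\n' | GIT_TERMINAL_PROMPT=0 git credential fill
fatal: could not read Username for 'https://gitlab.primary.example.org': terminal prompts disabled
```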
## Proposal
Stop redirecting LFS object transfer downloads to the primary when the secondary is out of date (batch API requests are still redirected):
```diff
diff --git a/ee/app/controllers/ee/repositories/git_http_client_controller.rb b/ee/app/controllers/ee/repositories/git_http_client_controller.rb
index 1912182a7023..0897f09c7ae5 100644
--- a/ee/app/controllers/ee/repositories/git_http_client_controller.rb
+++ b/ee/app/controllers/ee/repositories/git_http_client_controller.rb
@@ -189,7 +189,7 @@ def transfer_download?
       def out_of_date_redirect?
         return false unless project
 
-        (batch_download? || transfer_download?) && ::Geo::ProjectRegistry.repository_out_of_date?(project.id)
+        batch_download? && ::Geo::ProjectRegistry.repository_out_of_date?(project.id)
       end
 
       def wanted_version
```
## Workaround
In general, a retry of the Git pull seems likely to succeed.
If this affects many GitLab CI builds, then for example you might be able to set `GET_SOURCES_ATTEMPTS` to `3`: https://docs.gitlab.com/ee/ci/runners/configure_runners.html#job-stages-attempts
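A minimal sketch of that `.gitlab-ci.yml` change:

```yaml
# Retry the "Get sources" job step up to 3 times before failing the job
variables:
  GET_SOURCES_ATTEMPTS: 3
```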