Geo LFS redirect from secondary to primary may not work mid-session

This is a summary of what we believe may be happening in https://gitlab.zendesk.com/agent/tickets/318550:

  1. User pulls against secondary over SSH.
  2. Secondary Git data is transferred.
  3. User's Git-LFS client requests multiple LFS objects.
  4. Some LFS objects get transferred.
  5. Secondary notices its project repo has been updated and not yet synced (https://gitlab.com/gitlab-org/gitlab/-/blob/07e830e68d9ed6faf10c7579c925b0f5d261f083/ee/app/controllers/ee/repositories/git_http_client_controller.rb#L192)
  6. The next LFS object fails to transfer because the request is redirected to the primary, and the LFS client only has credentials for the secondary, not for the primary.

In the SSH case, the client runs git-lfs-authenticate over SSH, which is handled by gitlab-shell. gitlab-shell makes an internal API call to generate an LfsToken: https://gitlab.com/gitlab-org/gitlab/-/blob/5555bb77d5c5fa4069df36f1e47403afe9b45190/lib/api/internal/base.rb#L188-190

In the HTTPS case, the LfsToken is generated here: https://gitlab.com/gitlab-org/gitlab/-/blob/bc38125ebbfe5008240a01d098d9a419bf72ac38/app/controllers/repositories/lfs_api_controller.rb#L147

In the SSH case, we see errors such as:

[2022-08-22T05:36:03.096Z] 05:36:03.051717 trace git-lfs: creds: git credential fill ("https", "gitlab.primary.example.org", "")
[2022-08-22T05:36:03.096Z] 05:36:03.053318 git.c:439               trace: built-in: git credential fill
[2022-08-22T05:36:03.096Z] fatal: could not read Username for 'gitlab.primary.example.org': No such device or address
[2022-08-22T05:36:03.096Z] 05:36:03.053622 trace git-lfs: api error: Git credentials for https://gitlab.primary.example.org/-/push_from_secondary/5/group/project.git/info/lfs/objects/batch not found.

I suspect git-lfs uses git credential in https://github.com/git-lfs/git-lfs/blob/46801d3b4efa878ccc9098cb3e49eb0e72fe5597/creds/creds.go#L308 to associate the LfsToken with the LFS server.

Since the credentials are only available for the secondary, the primary credentials are empty.
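
To illustrate the suspected failure, here is a minimal sketch that models credentials as a per-host lookup. It is not how git-lfs is implemented; the secondary hostname, the token string, and the credentials_for helper are all hypothetical and exist only for this example.

```ruby
# Hypothetical model: the token obtained through the SSH authentication step
# is associated only with the host that issued it (the secondary).
SECONDARY = "gitlab.secondary.example.org" # assumed hostname, not from the ticket
PRIMARY   = "gitlab.primary.example.org"   # hostname as it appears in the trace

# Credentials the client holds after authenticating against the secondary.
# The token value is a placeholder.
credentials = { SECONDARY => "lfs-token-from-secondary" }

# Return the stored credential for a host, or nil when none is known.
def credentials_for(store, host)
  store[host]
end

# A request served by the secondary finds its token.
puts credentials_for(credentials, SECONDARY).inspect

# A request redirected to the primary finds nothing, which would match the
# "Git credentials ... not found" error in the trace above.
puts credentials_for(credentials, PRIMARY).inspect
```

In this model the second lookup returns nil, which would correspond to the point where git credential fill has nothing to offer for the primary host.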

Proposal

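Stop redirecting LFS transfer download requests to the primary, and only redirect batch download requests when the repository is out of date: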

diff --git a/ee/app/controllers/ee/repositories/git_http_client_controller.rb b/ee/app/controllers/ee/repositories/git_http_client_controller.rb
index 1912182a7023..0897f09c7ae5 100644
--- a/ee/app/controllers/ee/repositories/git_http_client_controller.rb
+++ b/ee/app/controllers/ee/repositories/git_http_client_controller.rb
@@ -189,7 +189,7 @@ def transfer_download?
         def out_of_date_redirect?
           return false unless project
 
-          (batch_download? || transfer_download?) && ::Geo::ProjectRegistry.repository_out_of_date?(project.id)
+          batch_download? && ::Geo::ProjectRegistry.repository_out_of_date?(project.id)
         end
 
         def wanted_version
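
The effect of the diff can be sketched as a standalone comparison of the old and new predicate. This is a simplified model, not the controller code: redirect_before? and redirect_after? are hypothetical names, and the out_of_date flag stands in for ::Geo::ProjectRegistry.repository_out_of_date?(project.id).

```ruby
# Simplified model of out_of_date_redirect? before the proposed change.
def redirect_before?(batch_download:, transfer_download:, out_of_date:)
  (batch_download || transfer_download) && out_of_date
end

# Simplified model after the proposed change: transfer downloads no longer
# trigger a redirect. The transfer_download keyword is kept only so both
# methods accept the same arguments.
def redirect_after?(batch_download:, transfer_download:, out_of_date:)
  batch_download && out_of_date
end

# An LFS object transfer against an out-of-date secondary.
transfer = { batch_download: false, transfer_download: true, out_of_date: true }
# An LFS batch request against an out-of-date secondary.
batch = { batch_download: true, transfer_download: false, out_of_date: true }

puts redirect_before?(**transfer) # true: the transfer is redirected mid-session
puts redirect_after?(**transfer)  # false: the transfer stays on the secondary
puts redirect_before?(**batch)    # true
puts redirect_after?(**batch)     # true: batch requests are still redirected
```

Under this model, only the individual object transfers change behavior; batch requests against an out-of-date secondary would still be redirected.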

Workaround

In general, a retry of the Git pull seems likely to succeed.

If this affects many GitLab CI builds, you might be able to set the GET_SOURCES_ATTEMPTS variable to 3 so that the runner retries fetching sources: https://docs.gitlab.com/ee/ci/runners/configure_runners.html#job-stages-attempts
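
As a sketch of that CI workaround, the variable could be set globally in the project's .gitlab-ci.yml. Whether a global setting or a per-job setting fits better depends on the pipeline, and the value 3 is only the example from above.

```yaml
# .gitlab-ci.yml
variables:
  # Number of attempts the runner makes to fetch sources before failing the job.
  GET_SOURCES_ATTEMPTS: 3
```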

Edited by Michael Kozono