Runner clone_url not used in Git LFS Batch API response objects
Summary
A clone_url is specified for a Runner when the public URL of the GitLab instance is not reachable from the Runner. The Runner uses this URL to git clone the repository from the GitLab instance. If that repository uses Git LFS, the download links (href) for the objects returned to the Runner through the Git LFS Batch API still point to the public URL of the GitLab instance. As a result, the Runner cannot fetch these objects, because it has no access to the public URL.
For example, here is sanitized output for GIT_TRACE=1 git clone from a customer reporting this issue:
16:02:36.652404 trace git-lfs: HTTP: POST http://gitlab-ci-token:XXXXXXXXXXX@main-gitlab.local.:8005/group/project.git/info/lfs/objects/batch
16:02:36.839941 trace git-lfs: HTTP: 200
16:02:36.840969 trace git-lfs: HTTP: {"objects":[{"oid":"12345","size":5037227,"actions":{"download":{"href":"https://gitlab.ci/group/project.git/gitlab-lfs/objects/12345","header":{"Authorization":"Basic XXXXXXX"}}}}
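The mismatch in the trace above can be checked mechanically. The following is a minimal sketch, not part of GitLab or git-lfs: the helper name `unreachable_hrefs` is hypothetical, and the sample response closes the JSON that is cut off in the trace (the trailing `]}` is an assumption). It compares only hostnames, which is a heuristic for "the Runner probably cannot reach this link".

```python
import json
from urllib.parse import urlsplit


def unreachable_hrefs(batch_url, batch_response_body):
    """Return (oid, href) pairs whose download host differs from the batch endpoint host.

    Hypothetical helper for illustration only; it assumes the Batch API
    response shape shown in the trace (objects -> actions -> download -> href).
    """
    batch_host = urlsplit(batch_url).hostname
    body = json.loads(batch_response_body)
    mismatched = []
    for obj in body.get("objects", []):
        href = obj.get("actions", {}).get("download", {}).get("href")
        # A different host means the link does not go through clone_url.
        if href and urlsplit(href).hostname != batch_host:
            mismatched.append((obj["oid"], href))
    return mismatched


# Values copied from the sanitized trace; the closing "]}" is assumed.
batch_url = (
    "http://gitlab-ci-token:XXXXXXXXXXX@main-gitlab.local.:8005"
    "/group/project.git/info/lfs/objects/batch"
)
response = (
    '{"objects":[{"oid":"12345","size":5037227,"actions":{"download":'
    '{"href":"https://gitlab.ci/group/project.git/gitlab-lfs/objects/12345",'
    '"header":{"Authorization":"Basic XXXXXXX"}}}}]}'
)

print(unreachable_hrefs(batch_url, response))
```

For the trace above, the single object is reported, because its href points at gitlab.ci while the batch request went to main-gitlab.local.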
Steps to reproduce
Customer reporting this bug --> https://gitlab.zendesk.com/agent/tickets/106888 (internal use only)
From the customer:
"All is needed to see a problem is to run and configure gitlab runner to use private Gitlab URL ("main-gitlab.local.:8005" in both "clone_url" and "url" runner config), run in it so that it has no access to public gitlab endpoint (use strict network policies) and then try to run any job for a project, with Git LFS enabled.
What happens is that it creates executor pod, helper container in that pod is correctly using clone_url to do git clone, then git-lfs kicks in (via git smudge filter set in .gitattributes files: https://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes#filters_a ) , which derives Git LFS API endpoint from remotes.origin.url git option , and then follows Git LFS protocol using it (https://github.com/git-lfs/git-lfs/blob/master/docs/api/batch.md), more specifically Batch API, where it expects JSON response with URLs to objects, these URLs continue to use publically available URLs for object downloads."
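To illustrate the endpoint derivation the customer describes: for HTTP(S) remotes, the Git LFS server discovery rules append `.git` (if missing) and `/info/lfs` to the remote URL, and the Batch API lives under `/objects/batch`. The sketch below is a simplified approximation, assuming an HTTP(S) remote and no `lfs.url` override; the function name `lfs_batch_url` is hypothetical and this is not the git-lfs implementation.

```python
def lfs_batch_url(remote_url):
    """Approximate the Batch API URL git-lfs derives from an HTTP(S) remote URL.

    Simplified sketch: SSH remotes and explicit lfs.url settings are ignored.
    """
    base = remote_url.rstrip("/")
    if not base.endswith(".git"):
        base += ".git"
    return base + "/info/lfs/objects/batch"


# With the Runner's clone_url as the origin remote, the batch request
# goes to the private host, as seen in the trace in the Summary.
print(lfs_batch_url("http://main-gitlab.local.:8005/group/project.git"))
```

This is why the batch request itself succeeds: it is built from the clone_url. Only the links inside the response fall back to the public URL.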
What is the current bug behavior?
If there is a clone_url specified for a Runner, and the repository being cloned has Git LFS, the Git LFS objects returned during a git clone have the public URL in their links, rather than the clone_url. This prevents the Runner from fetching these resources as the Runner does not have access to the public URL.
What is the expected correct behavior?
If a clone_url is specified for a Runner, that should be used for any Git LFS object links returned via the Batch API.
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
System information
System:
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.3.7p456
Gem Version: 2.6.14
Bundler Version: 1.13.7
Rake Version: 12.3.1
Redis Version: 3.2.11
Git Version: 2.16.4
Sidekiq Version: 5.0.5
Go Version: unknown
GitLab information
Version: 10.8.5-ee
Revision: 8f03e3e
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
DB Version: 9.6.9
URL: https://gitlab.ci.i.nakhoda.ai
HTTP Clone URL: https://gitlab.ci.i.nakhoda.ai/some-group/some-project.git
SSH Clone URL: git@gitlab.ci.i.nakhoda.ai:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: no
GitLab Shell
Version: 7.1.2
Repository storage paths:
- default: /gitlab-data/git-data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks
Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Checking GitLab Shell ...
GitLab Shell version >= 7.1.2 ? ... OK (7.1.2) Repo base directory exists? default... yes Repo storage directories are symlinks? default... no Repo paths owned by git:root, or git:git? default... yes Repo paths access is drwxrws---? default... yes hooks directories in repos are links: ... 17/2 ... ok 17/3 ... ok 17/4 ... ok 17/5 ... ok 17/6 ... ok 4/7 ... ok 4/8 ... ok 4/9 ... ok 4/12 ... ok 4/15 ... ok 14/16 ... ok 14/18 ... ok 18/19 ... ok 18/20 ... ok 4/21 ... ok 17/22 ... ok 4/24 ... ok 4/25 ... ok 17/26 ... ok 39/27 ... ok 5/28 ... ok 4/29 ... ok 21/30 ... ok 5/31 ... ok 21/32 ... ok 17/33 ... ok 17/34 ... ok 21/35 ... ok 17/37 ... ok 21/38 ... ok 17/39 ... ok 18/40 ... ok 26/42 ... ok 26/43 ... ok 4/44 ... ok 17/45 ... ok 17/46 ... ok 28/47 ... ok 28/48 ... ok 17/49 ... ok 28/50 ... ok 17/51 ... ok 17/52 ... ok 17/53 ... ok 22/54 ... repository is empty 4/56 ... ok 4/57 ... ok 17/58 ... ok 35/59 ... ok 35/62 ... ok 5/63 ... repository is empty 35/64 ... ok 35/65 ... ok 39/66 ... ok 35/68 ... ok 14/69 ... ok 14/70 ... repository is empty 4/71 ... ok 35/73 ... ok 17/74 ... ok 17/75 ... repository is empty 39/76 ... ok 26/77 ... ok 26/78 ... ok 4/79 ... ok 17/80 ... ok 4/81 ... ok 17/82 ... ok 4/83 ... ok 39/86 ... ok 38/87 ... ok 17/88 ... ok 4/89 ... ok 4/90 ... repository is empty 17/91 ... ok 17/92 ... ok 17/95 ... ok 17/96 ... ok 4/97 ... ok 4/99 ... ok 17/100 ... ok 4/101 ... ok 14/102 ... ok 39/103 ... ok 4/104 ... ok 4/105 ... ok 14/106 ... ok 4/107 ... ok 38/108 ... ok 4/109 ... ok 4/110 ... ok 14/111 ... repository is empty 16/112 ... ok 4/113 ... ok 17/114 ... ok 39/115 ... ok 17/116 ... ok 4/117 ... ok 4/118 ... ok 14/119 ... repository is empty 4/120 ... ok 4/122 ... ok 38/124 ... ok 2/126 ... ok 17/127 ... ok 17/128 ... ok 4/129 ... ok 17/130 ... ok 17/131 ... ok 4/132 ... ok 2/133 ... ok 26/134 ... ok 21/135 ... ok Running /opt/gitlab/embedded/service/gitlab-shell/bin/check Check GitLab API access: OK Redis available via internal API: OK
Access to /gitlab-data/ssh/authorized_keys: OK gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Sidekiq ...
Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Reply by email is disabled in config/gitlab.yml
Checking LDAP ...
LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab ...
Git configured correctly? ... yes Database config exists? ... yes All migrations up? ... yes Database contains orphaned GroupMembers? ... no GitLab config exists? ... yes GitLab config up to date? ... yes Log directory writable? ... yes Tmp directory writable? ... yes Uploads directory exists? ... yes Uploads directory has correct permissions? ... yes Uploads directory tmp has correct permissions? ... yes Init script exists? ... skipped (omnibus-gitlab has no init script) Init script up-to-date? ... skipped (omnibus-gitlab has no init script) Projects have namespace: ... 17/2 ... yes 17/3 ... yes 17/4 ... yes 17/5 ... yes 17/6 ... yes 4/7 ... yes 4/8 ... yes 4/9 ... yes 4/12 ... yes 4/15 ... yes 14/16 ... yes 14/18 ... yes 18/19 ... yes 18/20 ... yes 4/21 ... yes 17/22 ... yes 4/24 ... yes 4/25 ... yes 17/26 ... yes 39/27 ... yes 5/28 ... yes 4/29 ... yes 21/30 ... yes 5/31 ... yes 21/32 ... yes 17/33 ... yes 17/34 ... yes 21/35 ... yes 17/37 ... yes 21/38 ... yes 17/39 ... yes 18/40 ... yes 26/42 ... yes 26/43 ... yes 4/44 ... yes 17/45 ... yes 17/46 ... yes 28/47 ... yes 28/48 ... yes 17/49 ... yes 28/50 ... yes 17/51 ... yes 17/52 ... yes 17/53 ... yes 22/54 ... yes 4/56 ... yes 4/57 ... yes 17/58 ... yes 35/59 ... yes 35/62 ... yes 5/63 ... yes 35/64 ... yes 35/65 ... yes 39/66 ... yes 35/68 ... yes 14/69 ... yes 14/70 ... yes 4/71 ... yes 35/73 ... yes 17/74 ... yes 17/75 ... yes 39/76 ... yes 26/77 ... yes 26/78 ... yes 4/79 ... yes 17/80 ... yes 4/81 ... yes 17/82 ... yes 4/83 ... yes 39/86 ... yes 38/87 ... yes 17/88 ... yes 4/89 ... yes 4/90 ... yes 17/91 ... yes 17/92 ... yes 17/95 ... yes 17/96 ... yes 4/97 ... yes 4/99 ... yes 17/100 ... yes 4/101 ... yes 14/102 ... yes 39/103 ... yes 4/104 ... yes 4/105 ... yes 14/106 ... yes 4/107 ... yes 38/108 ... yes 4/109 ... yes 4/110 ... yes 14/111 ... yes 16/112 ... yes 4/113 ... yes 17/114 ... yes 39/115 ... yes 17/116 ... yes 4/117 ... yes 4/118 ... yes 14/119 ... yes 4/120 ... yes 4/122 ... yes 38/124 ... 
yes 2/126 ... yes 17/127 ... yes 17/128 ... yes 4/129 ... yes 17/130 ... yes 17/131 ... yes 4/132 ... yes 2/133 ... yes 26/134 ... yes 21/135 ... yes Redis version >= 2.8.0? ... yes Ruby version >= 2.3.5 ? ... yes (2.3.7) Git version >= 2.9.5 ? ... yes (2.16.4) Git user has default SSH configuration? ... yes Active users: ... 25 Elasticsearch version 5.1 - 5.5? ... skipped (elasticsearch is disabled)
Checking GitLab ... Finished
Possible fixes
The Projects::LfsApiController sets the link for the file to be downloaded via this line. Would it be possible to determine that the clone is using a different URL and use that URL instead of project.http_url_to_repo when constructing the link for the LFS object?
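As a language-neutral illustration of the idea (written in Python, not the Rails controller code), the sketch below rebases an LFS object link onto the base URL the client actually used. Everything here is an assumption for discussion: the helper name `rebase_href` is hypothetical, and how the server would learn the client's base URL (for example from the incoming request) is left open.

```python
from urllib.parse import urlsplit, urlunsplit


def rebase_href(href, request_base_url):
    """Replace scheme and host:port of href with those of request_base_url.

    Hypothetical helper: path, query and fragment of the original link are
    kept; any credentials embedded in request_base_url are dropped.
    """
    target = urlsplit(request_base_url)
    original = urlsplit(href)
    # Strip "user:password@" so tokens never leak into returned links.
    netloc = target.netloc.rpartition("@")[2]
    return urlunsplit(
        (target.scheme, netloc, original.path, original.query, original.fragment)
    )


# Link from the trace, rebased onto the Runner's clone_url host.
print(
    rebase_href(
        "https://gitlab.ci/group/project.git/gitlab-lfs/objects/12345",
        "http://gitlab-ci-token:XXXXXXXXXXX@main-gitlab.local.:8005",
    )
)
```

A real fix would also have to consider reverse proxies and relative URL roots, which this sketch ignores.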