Fix go middleware so it doesn't respond with erroneous repo URLs
Related issues: #467850, #36354 (closed)
For easier review, I have separated the commits into:
- The simplest fix, which changes the fewest lines possible
- A cleanup of confusing logic to make the behaviour understandable
The problem
The go toolchain discovers a repository URL for a Go package path by sending GET requests to the most likely candidate URLs.
For example, if running go get gitlab.com/my-org/my-group/my-private-repo/my-path, the go toolchain will send GET requests to (in priority order)
https://gitlab.com/my-org/my-group?go-get=1
https://gitlab.com/my-org/my-group/my-private-repo/my-path?go-get=1
https://gitlab.com/my-org/my-group/my-private-repo?go-get=1
https://gitlab.com/my-org?go-get=1
The current behaviour of GitLab is to return a positive response to the first request, https://gitlab.com/my-org/my-group?go-get=1, even though a repo doesn't exist at this location and has never existed.
This can be observed by running
$ curl -n "https://gitlab.com/my-org/my-group/my-private-repo/my-path?go-get=1"
<html><head><meta name="go-import" content="gitlab.com/my-org/my-group git https://gitlab.com/my-org/my-group.git"><meta name="go-source" content="gitlab.com/my-org/my-group https://gitlab.com/my-org/my-group https://gitlab.com/my-org/my-group/-/tree/master{/dir} https://gitlab.com/my-org/my-group/-/blob/master{/dir}/{file}#L{line}"></head><body>go get https://gitlab.com/my-org/my-group</body></html>
As you can see, whether authenticated or not, GitLab returns an erroneous repo URL of https://gitlab.com/my-org/my-group.git.
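To make it concrete which part of that response the go toolchain acts on, here is a minimal Go sketch that pulls the three fields out of a go-import meta tag. The names goImport and parseGoImport are hypothetical, and the regular expression is a simplification for illustration, not how the toolchain itself parses the page.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// goImport holds the three space-separated fields of a go-import meta tag.
// (Hypothetical type, for illustration only.)
type goImport struct {
	Prefix   string // import path prefix the tag claims to cover
	VCS      string // version control system, e.g. "git"
	RepoRoot string // URL that will be fetched as the repository
}

// metaRe matches <meta name="go-import" content="...">.
// A regexp is enough for this sketch; it is not a full HTML parser.
var metaRe = regexp.MustCompile(`<meta\s+name="go-import"\s+content="([^"]*)"`)

// parseGoImport extracts the first go-import tag from an HTML body.
// It returns false if no well-formed tag is present.
func parseGoImport(body string) (goImport, bool) {
	m := metaRe.FindStringSubmatch(body)
	if m == nil {
		return goImport{}, false
	}
	fields := strings.Fields(m[1])
	if len(fields) != 3 {
		return goImport{}, false
	}
	return goImport{Prefix: fields[0], VCS: fields[1], RepoRoot: fields[2]}, true
}

func main() {
	// Body shortened from the curl output above.
	body := `<html><head><meta name="go-import" content="gitlab.com/my-org/my-group git https://gitlab.com/my-org/my-group.git"></head></html>`
	if imp, ok := parseGoImport(body); ok {
		fmt.Println("prefix:", imp.Prefix)
		fmt.Println("vcs:", imp.VCS)
		fmt.Println("repo root:", imp.RepoRoot)
	}
}
```

The repo root field is the URL that go then tries to fetch, which is why a root pointing at a non-existent repo surfaces as a confusing git error.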
The problem is that this erroneous repo URL causes the go toolchain to report errors that appear to indicate a problem with GitLab. Whenever there is a problem with authentication, a wrong package path, or a wrong branch name, the go toolchain falls back to the most probable repo pattern (e.g. gitlab.com/my-org/my-group) and, because that repo doesn't exist, reports an error to the user. The issue isn't necessarily with GitLab, but it looks like it is, because GitLab has supplied an erroneous URL that go attempts to use.
A misunderstanding behind the current implementation can be found in the docs, which say:
"It happens, because go get makes an unauthenticated request to discover the repository path"
This is incorrect. The go toolchain does indeed send authenticated requests when credentials exist in the .netrc file.
The solution
This PR corrects the behaviour of the go middleware so that it responds with a positive result only if both of the following are true:
- The repository exists
- The request is appropriately authenticated to read the project
If the above conditions are not true, the middleware now "fails fast" and returns a 404 response with an error message that can be displayed by the go toolchain. Documentation links have been included in this response to mitigate any confusion.
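The decision rule above can be sketched in Go as follows. This is an illustration of the logic only, not the middleware code changed in this PR: the project type, canRead, goGetResponse, and the error text are all hypothetical names and placeholders.

```go
package main

import (
	"fmt"
	"net/http"
)

// project is a hypothetical stand-in for a project record.
type project struct {
	Path   string // e.g. "my-org/my-group/my-private-repo"
	Public bool
}

// canRead reports whether the requester may read p.
// A nil project means no project exists at the requested path.
// The authorized flag is a simplification of a real permission check.
func canRead(p *project, authorized bool) bool {
	return p != nil && (p.Public || authorized)
}

// goGetResponse returns the status code and body for a ?go-get=1 request.
func goGetResponse(host string, p *project, authorized bool) (int, string) {
	if !canRead(p, authorized) {
		// The same 404 is returned whether the project is missing or
		// unreadable, so the existence of a private project is not revealed.
		return http.StatusNotFound, "project not found or not readable with the supplied credentials"
	}
	root := host + "/" + p.Path
	body := fmt.Sprintf(`<meta name="go-import" content="%s git https://%s.git">`, root, root)
	return http.StatusOK, body
}

func main() {
	private := &project{Path: "my-org/my-group/my-private-repo"}

	status, _ := goGetResponse("gitlab.com", nil, false)
	fmt.Println("missing project:", status)

	status, _ = goGetResponse("gitlab.com", private, false)
	fmt.Println("private, no credentials:", status)

	status, body := goGetResponse("gitlab.com", private, true)
	fmt.Println("private, authorized:", status, body)
}
```

Returning one response for both failure cases is what keeps the fail-fast behaviour from leaking whether a private project exists.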
This approach is safe. It does not reveal the existence of a private nested project, because the 404 response is the same whether the project does not exist or authentication fails.
This approach is correct. It provides the go toolchain with the information that it is looking for: that it cannot access the repo.
And it is also a better outcome for both the user and for GitLab, because now the go toolchain does not report a git problem with a non-existent repo on GitLab. This allows the user to track down the real issue faster, and stops them thinking that this is a problem with GitLab.
Potential impact to users
This change should not affect
- public repos, because these repos are always readable
- users correctly authenticating GitLab HTTP requests via their .netrc file as instructed in the docs
This change may affect users where all of the following conditions are true
- The repo is private
- AND they are using the repo as a go package
- AND the project uses a "simple" org/project URL with no subgroup
- AND the user has not properly added their creds to .netrc as instructed in the docs
For these users, the solution is straightforward and is communicated in the error message and the documentation: add your GitLab Personal Access Token to the .netrc file so that both go and git can correctly authenticate when using HTTP.
Why we should make this change
Ultimately the biggest problem with the current approach is that the outcome comes down to luck. If your project path fits the path assumption, then you can get away with not configuring your auth correctly (for now anyway, you're gonna need it at some point). But if your project path doesn't fit the assumption, you're going to have a bad time.
I believe GitLab should respond with "correctness" being the goal, not with a heuristic or a dice roll.