Fix go middleware so it doesn't respond with erroneous repo URLs

Michael Tibben requested to merge mtibben/gitlab:fix-go-get-response into master

Related issues: #467850, #36354 (closed)

For easier review I have separated out the commits into

  1. The simplest fix which changes the least number of lines possible
  2. A cleanup of confusing logic to make the behaviour understandable

The problem

The go toolchain discovers a repository URL for a Go package path by sending GET requests to the most likely candidate URLs.

For example, if running go get gitlab.com/my-org/my-group/my-private-repo/my-path, the go toolchain will send GET requests to (in priority order)

  • https://gitlab.com/my-org/my-group?go-get=1
  • https://gitlab.com/my-org/my-group/my-private-repo/my-path?go-get=1
  • https://gitlab.com/my-org/my-group/my-private-repo?go-get=1
  • https://gitlab.com/my-org?go-get=1
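As a rough illustration of where these candidates come from, the sketch below derives one discovery URL per prefix of the package path. The function name candidateURLs is hypothetical, and the output is simply ordered longest prefix first; it does not try to reproduce the order in which the go toolchain actually issues the requests.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateURLs returns one ?go-get=1 discovery URL for every prefix of
// importPath that has at least one path segment after the host.
// Ordering is longest prefix first (an assumption made for this sketch).
func candidateURLs(importPath string) []string {
	parts := strings.Split(importPath, "/")
	var urls []string
	for n := len(parts); n >= 2; n-- {
		urls = append(urls, "https://"+strings.Join(parts[:n], "/")+"?go-get=1")
	}
	return urls
}

func main() {
	for _, u := range candidateURLs("gitlab.com/my-org/my-group/my-private-repo/my-path") {
		fmt.Println(u)
	}
}
```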

The current behaviour of GitLab is to return a positive response to the first request https://gitlab.com/my-org/my-group?go-get=1, even though a repo doesn't exist at this location, and has never existed.

This can be observed by running

$ curl -n "https://gitlab.com/my-org/my-group/my-private-repo/my-path?go-get=1"
<html><head><meta name="go-import" content="gitlab.com/my-org/my-group git https://gitlab.com/my-org/my-group.git"><meta name="go-source" content="gitlab.com/my-org/my-group https://gitlab.com/my-org/my-group https://gitlab.com/my-org/my-group/-/tree/master{/dir} https://gitlab.com/my-org/my-group/-/blob/master{/dir}/{file}#L{line}"></head><body>go get https://gitlab.com/my-org/my-group</body></html>

As you can see, whether authenticated or not, GitLab returns an erroneous repo URL of https://gitlab.com/my-org/my-group.git.

The problem is that this erroneous repo URL leads the go toolchain to show the user error messages that point to a problem with GitLab. Whenever authentication fails, or the package path or branch name is wrong, the go toolchain falls back to the most probable repo pattern (e.g. gitlab.com/my-org/my-group) and, because no repo exists there, reports a failure to the user. The issue isn't necessarily with GitLab, but it looks that way because GitLab has supplied an erroneous URL that go then attempts to use.

The misunderstanding behind the current implementation is visible in the docs, which say

It happens, because go get makes an unauthenticated request to discover the repository path

This is incorrect. The go toolchain does indeed send authenticated requests when credentials exist in the .netrc file.

The solution

This PR corrects the behaviour of the go middleware so that it responds with a positive result only if

  1. The repository exists
  2. The request is appropriately authenticated to read the project

If the above conditions are not true, the middleware now "fails fast" and returns a 404 response with an error message that can be displayed by the go toolchain. Documentation links have been included in this response to mitigate any confusion.

This approach is safe. It does not reveal the existence of a private nested project, because the same 404 is returned whether the project is missing or the request is not authorised to read it.
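The decision rule can be sketched as follows. This is an illustration in Go only, not the middleware code changed by this PR; the project type, the goGetResponse function and the wording of the error message are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// project is a hypothetical stand-in for a GitLab project record.
type project struct {
	ImportPath string // e.g. "gitlab.com/my-org/my-group/my-private-repo"
	RepoURL    string // e.g. "https://gitlab.com/my-org/my-group/my-private-repo.git"
}

// goGetResponse returns the status and body for a ?go-get=1 request.
// p is nil when no project exists at the requested path; canRead reports
// whether the request's credentials allow reading p.
func goGetResponse(p *project, canRead bool) (int, string) {
	if p == nil || !canRead {
		// Fail fast with the same response for "missing" and
		// "not authorised", so the existence of a project is not revealed.
		return http.StatusNotFound, "project not found or not accessible; check the credentials in your .netrc file"
	}
	meta := fmt.Sprintf(`<meta name="go-import" content="%s git %s">`, p.ImportPath, p.RepoURL)
	return http.StatusOK, "<html><head>" + meta + "</head></html>"
}

func main() {
	p := &project{
		ImportPath: "gitlab.com/my-org/my-group/my-private-repo",
		RepoURL:    "https://gitlab.com/my-org/my-group/my-private-repo.git",
	}
	for _, c := range []struct {
		label   string
		p       *project
		canRead bool
	}{
		{"missing project", nil, false},
		{"private project, no access", p, false},
		{"private project, authorised", p, true},
	} {
		status, _ := goGetResponse(c.p, c.canRead)
		fmt.Println(c.label, "->", status)
	}
}
```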

This approach is correct. It provides the go toolchain with the information that it is looking for: that it cannot access the repo.

And it is also a better outcome for both the user and for GitLab, because now the go toolchain does not report a git problem with a non-existent repo on GitLab. This allows the user to track down the real issue faster, and stops them thinking that this is a problem with GitLab.

Potential impact to users

This change should not affect

  • public repos, because these repos are always readable
  • users correctly authenticating GitLab HTTP requests via their .netrc file as instructed in the docs

This change may affect users where all of the following conditions are true

  • The repo is private
  • AND they are using the repo as a go package
  • AND the project uses a "simple" org/project URL with no subgroup
  • AND the user has not properly added their creds to .netrc as instructed in the docs

For these users, the solution is straightforward and is communicated in the error message and the documentation: add your GitLab Personal Access Token to the .netrc file so that both go and git can correctly authenticate when using HTTP.

Why we should make this change

Ultimately the biggest problem with the current approach is that the outcome comes down to luck. If your project path fits the path assumption, then you can get away with not configuring your auth correctly (for now anyway, you're gonna need it at some point). But if your project path doesn't fit the assumption, you're going to have a bad time.

I believe GitLab should respond with "correctness" being the goal, not with a heuristic or a dice roll.

Edited by Michael Tibben