Workhorse send_url and send_dependency timeouts and status codes

Context

We're working on the very first version of the dependency proxy for packages. See #407460 (comment 1373731852) for all the details from the technical investigation.

At the core, the concept is quite simple. GitLab will act as a proxy. Basically, users can pull packages through it and GitLab will be a pull-through cache.

Package Manager clients (npm, maven, ...) <-> GitLab <-> External Package Registry

Because GitLab is in the middle (aka proxy) of the package transport, we can leverage the GitLab Package registry as a cache. In other words, before contacting the external package registry, we can check the local project registry to see if the package is already there. If that's the case, we can return it directly.
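To make the pull-through idea concrete, here is a minimal sketch. It is not the GitLab implementation: the `PullThroughCache` class, the Hash standing in for the Package registry and the `remote` lambda are all hypothetical, and the sketch ignores user permissions (it always caches).

```ruby
# Minimal pull-through cache sketch (illustrative only).
# A Hash stands in for the GitLab Package registry and `remote` is any
# callable that returns the file content for a given path.
class PullThroughCache
  attr_reader :remote_calls

  def initialize(remote)
    @remote = remote
    @local = {}
    @remote_calls = 0
  end

  # Return the locally stored file if present, otherwise pull it from the
  # remote and keep a copy for the next request.
  def fetch(path)
    return @local[path] if @local.key?(path)

    @remote_calls += 1
    @local[path] = @remote.call(path)
  end
end

remote = ->(path) { "content of #{path}" }
cache = PullThroughCache.new(remote)

first = cache.fetch('com/my/company/1.2.3/test.txt')  # contacts the remote
second = cache.fetch('com/my/company/1.2.3/test.txt') # served from the local copy

puts cache.remote_calls
```

The second `fetch` never reaches the remote, which is the whole point of checking the local registry first.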

In this MR, we're going to focus on the cases where the requested file doesn't exist in the GitLab Package registry. In those cases, we will need to either:

  1. get the remote file and return it to the client or
  2. get the remote file and return it to the client while publishing it to the GitLab Package Registry.

Which one is used depends on the user permissions because obviously, not all users can write to the GitLab Package Registry.

In both cases, the heavy lifting (i.e. getting the file and streaming it back to the client and/or uploading it to the Package Registry) is done by Workhorse. We use two functions/instructions that tell workhorse what to do (respectively):

  1. send_url. In this case, rails instructs workhorse to get a file from a URL and return it to the external client.
  2. send_dependency. Similar, but this time around the file is sent back to the client and uploaded to an endpoint in rails at the same time.
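The choice between the two instructions can be summarised in a few lines. This is only a sketch of the decision described above; the `workhorse_instruction` helper and the `:serve_from_registry` symbol are hypothetical names, not code from this MR.

```ruby
# Hypothetical helper: which path handles a request for a given file?
#   cached    - the file already exists in the GitLab Package registry
#   can_write - the current user may publish to the GitLab Package registry
def workhorse_instruction(cached:, can_write:)
  return :serve_from_registry if cached

  # The file is not cached: Workhorse fetches it from the remote server.
  can_write ? :send_dependency : :send_url
end

puts workhorse_instruction(cached: false, can_write: false) # reporter, file not cached
puts workhorse_instruction(cached: false, can_write: true)  # maintainer, file not cached
```

Only the two "not cached" outcomes are in scope for this MR.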

During our verifications of the Maven dependency proxy on staging, we noticed the following:

  • Workhorse uses a network timeout of 30secs. This means that if, for example, the remote server never answers, workhorse will wait up to 30 seconds, hit this timeout and return a generic 500 status code.
  • Workhorse will simply bubble up network issues. For example, if the connection is refused, a generic 500 status code is returned.

This is not great for the Maven dependency proxy because:

  • the dependency proxy will be used with remote servers that are set up by users. It would be quite easy to have those situations (network hiccups).
  • Maven clients will ping the dependency proxy for at least 4 files for each dependency in a project. Imagine that we have a project with 5 packages and we want all of them to be pulled through the dependency proxy. If we have the conditions to hit the timeout, the maven client will spend at least 5 * 4 * 30 seconds = 600 seconds = 10 minutes. That's a bit too long 😱
  • Maven clients will report back the status code they received if it is an error. In this case, they will report an Internal Server Error (500). That's a bit confusing for users as they will not understand if this error is coming from GitLab or the remote registry.
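The worst-case arithmetic can be checked with a short sketch. The helper name is hypothetical; the figures (5 packages, at least 4 files each, 30 seconds) come from the example above, and the 10 second value is the shorter timeout this MR proposes.

```ruby
# Worst-case time a Maven client spends waiting when every single request
# runs into the timeout (requests are assumed to be made one after another).
def worst_case_wait_seconds(packages:, files_per_package:, timeout_seconds:)
  packages * files_per_package * timeout_seconds
end

current = worst_case_wait_seconds(packages: 5, files_per_package: 4, timeout_seconds: 30)
shorter = worst_case_wait_seconds(packages: 5, files_per_package: 4, timeout_seconds: 10)

puts "30secs timeout: #{current} seconds (#{current / 60.0} minutes)"
puts "10secs timeout: #{shorter} seconds (#{(shorter / 60.0).round(1)} minutes)"
```

With the shorter timeout, the same worst case drops from 600 seconds to 200 seconds.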

This is tracked in Maven dependency proxy: shorten workhorse send+... (#428370 - closed). In this MR, we want to improve things for the dependency proxy for packages by:

  • using a shorter timeout: 10secs. It mainly applies to the time allowed to:
    • open/establish the connection.
    • read the response headers once the request is sent.
  • using specific error status codes. This is a dependency proxy and, as such, we want better error codes when we have issues with the remote server, instead of the generic 500 Internal Server Error:
    • in case of timeouts, we will return 504 Gateway Timeout.
    • in case of any other error, we will return 502 Bad Gateway.
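The error mapping above can be illustrated as follows. Workhorse itself is written in Go, so this Ruby snippet is only a sketch of the rule, not the actual change: `status_for` is a hypothetical helper, and Ruby's `Net::OpenTimeout` / `Net::ReadTimeout` stand in for "timed out while opening the connection" and "timed out while waiting for the response".

```ruby
require 'net/http'

GATEWAY_TIMEOUT = 504
BAD_GATEWAY = 502

# Hypothetical helper: pick the response status for an error raised while
# contacting the remote server.
def status_for(error)
  case error
  when Net::OpenTimeout, Net::ReadTimeout
    GATEWAY_TIMEOUT # the remote server did not answer in time
  else
    BAD_GATEWAY     # any other problem, e.g. connection refused
  end
end

puts status_for(Net::OpenTimeout.new)
puts status_for(Errno::ECONNREFUSED.new)
```

Either way, the client can now tell that the problem sits between GitLab and the remote server rather than inside GitLab.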

We thus need to update workhorse's send_url and send_dependency.

send_dependency is used by the upcoming dependency proxy for packages and also used by the existing dependency proxy for container images. The behavior described above is a good improvement for both cases as GitLab is used as a proxy in both cases. Thus, the above behavior will be implemented in workhorse directly (hardcoded).

For send_url, things are a bit more challenging as this function is used on file uploads, AI features and this upcoming dependency proxy for packages. We don't want to have an impact in those features. As such, in this case, we want to let the rails backend decide the timeouts and the status codes on a per case basis. This way, we can specify timeouts and status codes only for the dependency proxy for packages. All the other features will use the usual values.
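As a rough sketch of "rails decides per case": the caller passes optional timeouts and status codes, and anything left out is simply not sent, so workhorse falls back to its usual values. The `build_send_url_params` helper, its keyword arguments and the key names in the resulting hash are assumptions made for illustration, not the contract implemented in this MR.

```ruby
require 'json'

# Hypothetical sketch of building the parameters for a send_url instruction.
# Optional settings are only included when the caller asks for them.
def build_send_url_params(url, timeouts: {}, response_statuses: {})
  params = { 'URL' => url }
  params['DialTimeout'] = "#{timeouts[:open]}s" if timeouts[:open]
  params['ResponseHeaderTimeout'] = "#{timeouts[:read]}s" if timeouts[:read]
  params['ErrorResponseStatus'] = response_statuses[:error] if response_statuses[:error]
  params['TimeoutResponseStatus'] = response_statuses[:timeout] if response_statuses[:timeout]
  params
end

# Dependency proxy for packages: custom timeouts and status codes.
proxy_params = build_send_url_params(
  'http://registry.example.com/com/my/company/1.2.3/test.txt',
  timeouts: { open: 10, read: 10 },
  response_statuses: { error: 502, timeout: 504 }
)

# Any other feature: nothing extra is sent, so the usual values apply.
other_params = build_send_url_params('http://files.example.com/some/file')

puts JSON.generate(proxy_params)
puts JSON.generate(other_params)
```

Keeping the options opt-in is what protects file uploads and AI features from any behavior change.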

🤔 What does this MR do and why?

  • Add support for timeout and status code options for the workhorse send_url.
    • If not set, default values are used.
  • Update the workhorse send_dependency to use short timeouts and more specific status codes.
  • Update the related workhorse tests.
  • Update the maven dependency proxy so that we:
    • Use timeouts of 10secs.
    • Use 504 Gateway Timeout and 502 Bad Gateway.
  • Update the related requests specs.
  • Update the related feature spec.

The Maven dependency proxy is behind a feature flag. At the time of this writing it is not enabled on gitlab.com. See #415218 (closed). The workhorse changes will thus not be used until the feature flag is turned on.

🦄 Screenshots or screen recordings

No UI changes.

How to set up and validate locally

The set up is a bit involved as we need an external registry where we can manipulate things to simulate network hiccups.

🔨 Test setup

AWS lightsail

For this, we're going to use an AWS lightsail server that will have a dummy ruby server that will serve a single file. We will then set up the dependency proxy in the local GitLab instance and pull the file through it.

Let's get started 💪

  1. Open AWS Lightsail.
  2. Create an instance. Select the smallest one with Ubuntu.
  3. Once it is running, open the web terminal / shell / whatever that thing is called:
    • sudo apt update
    • sudo apt install ruby -y
    • mkdir -p srv/com/my/company/1.2.3
    • cd srv/com/my/company/1.2.3
    • echo "bananas!" > test.txt
    • cd ../../../..
    • ruby -run -e httpd . -p 8081
  4. On the instance "Networking" tab, add a rule: Allow TCP 8081.

Our dummy external registry is now ready.

🦊 Local GitLab

The dependency proxy for packages has a few requirements:

  1. have packages -> enabled set to true in gitlab.yml.
  2. have dependency_proxy -> enabled set to true in gitlab.yml.
  3. have the packages feature enabled in the project's settings. Settings -> General -> Visibility, project features, permissions -> Package registry (checkbox enabled).
  4. have a GitLab license. Premium or more.
  5. have the related feature flag turned on:
    Feature.enable(:packages_dependency_proxy_maven)

The first 3 points should be enabled by default.

Next, let's configure our local GitLab:

  1. Have a private project ready.
  2. Have PATs ready (with scope api) for two users: one with the Maintainer role or higher and one with the Reporter role.
  3. Let's set up the dependency proxy settings in rails console:
    Project.find(<project_id>).create_dependency_proxy_packages_setting!(enabled: true, maven_external_registry_url: 'http://<lightsail instance IP>:8081')

🐎 Workhorse

After pulling this MR, make sure that you rebuild and restart the workhorse process (from the root directory of the gitlab rails project):

$ cd workhorse
$ make
$ gdk restart gitlab-workhorse

1️⃣ Pulling a file with a reporter

Let's pull the file through the dependency proxy:

$ curl "http://<reporter_username>:<reporter_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt" 
bananas!

Behind the scenes, the dependency proxy will see that the file is not cached and that the current user can't write to the GitLab package registry. Thus, it will use the workhorse send_url function to send the file from the remote server to curl (i.e. no caching).

Let's reset the conditions on the GitLab project to make sure that we don't have any files "cached" in the GitLab package registry. In the rails console:

Project.find(<project_id>).packages.destroy_all

Let's simulate network issues

  1. Shut down the ruby dummy server in the instance. This is to simulate a connection refused error.
  2. $ curl again:
    $ curl -vvv "http://<reporter_username>:<reporter_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt" 
    *   Trying XX:8000...
    * Connected to gdk.test (XXX) port 8000 (#0)
    * Server auth using Basic with user 'XXX'
    > GET /api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt HTTP/1.1
    > Host: gdk.test:8000
    > Authorization: Basic XXX
    > User-Agent: curl/8.1.2
    > Accept: */*
    > 
    < HTTP/1.1 502 Bad Gateway
    < Cache-Control: no-cache
    < Content-Length: 0
    < Content-Security-Policy: default-src 'none'
    < Content-Type: text/plain; charset=utf-8
    < Vary: Origin
    < X-Accel-Buffering: no
    < X-Content-Type-Options: nosniff
    < X-Frame-Options: SAMEORIGIN
    < X-Gitlab-Meta: {"correlation_id":"01HFF2BRR9HCFD50VZP1PPF4C1","version":"1"}
    < X-Request-Id: 01HFF2BRR9HCFD50VZP1PPF4C1
    < X-Runtime: 0.083933
    < Date: Fri, 17 Nov 2023 16:38:26 GMT
    < 
    * Connection #0 to host gdk.test left intact

We can see that:

  • The response is almost instant. This is expected, we didn't hit the timeout limit. Instead, we hit a network issue in the send_url function.
  • The response status code is 502 Bad Gateway, which is the custom response status code.
  1. Remove the TCP 8081 rule from the Networking tab. This is to simulate a connection that will never reply, thus this will hit the timeout.
  2. $ curl again:
    $ curl -vvv "http://<reporter_username>:<reporter_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt" 
    *   Trying XXX:8000...
    * Connected to gdk.test (XXX) port 8000 (#0)
    * Server auth using Basic with user '<reporter_username>'
    > GET /api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt HTTP/1.1
    > Host: gdk.test:8000
    > Authorization: Basic XXX
    > User-Agent: curl/8.1.2
    > Accept: */*
    > 
    < HTTP/1.1 504 Gateway Timeout
    < Cache-Control: no-cache
    < Content-Length: 0
    < Content-Security-Policy: default-src 'none'
    < Content-Type: text/plain; charset=utf-8
    < Vary: Origin
    < X-Accel-Buffering: no
    < X-Content-Type-Options: nosniff
    < X-Frame-Options: SAMEORIGIN
    < X-Gitlab-Meta: {"correlation_id":"01HFF2Y1CJSKZTZ8YW04EDJ2FY","version":"1"}
    < X-Request-Id: 01HFF2Y1CJSKZTZ8YW04EDJ2FY
    < X-Runtime: 0.087793
    < Date: Fri, 17 Nov 2023 16:48:35 GMT
    < 
    * Connection #0 to host gdk.test left intact

We can see that:

  • The response comes back after ~10secs, which is the custom timeout.
  • The response status code is 504 Gateway Timeout, which is the custom response status code and gives a very nice indication of what is happening.

2️⃣ Pulling a file with a maintainer+

Let's pull the file through the dependency proxy:

$ curl "http://<maintainer_username>:<maintainer_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt" 
bananas!

Behind the scenes, the dependency proxy will see that the file is not cached and that the current user can write to the GitLab package registry. Thus, it will use the workhorse send_dependency function to send the file from the remote server to curl and, at the same time, publish the file to the GitLab package registry.

Let's reset the conditions on the GitLab project, basically remove the file from the package registry, in the rails console:

Project.find(<project_id>).packages.destroy_all

Let's simulate network issues.

  1. Shut down the ruby dummy server in the instance. This is to simulate a connection refused error.
  2. $ curl again:
    $ curl -vvv "http://<maintainer_username>:<maintainer_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt" 
    *   Trying XXX:8000...
    * Connected to gdk.test (XXX) port 8000 (#0)
    * Server auth using Basic with user 'root'
    > GET /api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt HTTP/1.1
    > Host: gdk.test:8000
    > Authorization: Basic XXX
    > User-Agent: curl/8.1.2
    > Accept: */*
    > 
    < HTTP/1.1 502 Bad Gateway
    < Cache-Control: no-cache
    < Content-Length: 0
    < Content-Security-Policy: default-src 'none'
    < Content-Type: text/plain; charset=utf-8
    < Vary: Origin
    < X-Accel-Buffering: no
    < X-Content-Type-Options: nosniff
    < X-Frame-Options: SAMEORIGIN
    < X-Gitlab-Meta: {"correlation_id":"01HFF33NX8WP5EM5BNP0J035MT","version":"1"}
    < X-Request-Id: 01HFF33NX8WP5EM5BNP0J035MT
    < X-Runtime: 0.045048
    < Date: Fri, 17 Nov 2023 16:51:30 GMT
    < 
    * Connection #0 to host gdk.test left intact

We can see that:

  • The response is almost instant. This is expected, we didn't hit the timeout limit. Instead, we hit a network issue in the send_dependency function.
  • The response status code is 502 Bad Gateway, which is the custom response status code.
  1. Remove the TCP 8081 rule from the Networking tab. This is to simulate a connection that will never reply, thus this will hit the timeout.
  2. $ curl again:
    $ curl -vvv "http://<maintainer_username>:<maintainer_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt"
    *   Trying XXX:8000...
    * Connected to gdk.test (XXX) port 8000 (#0)
    * Server auth using Basic with user 'root'
    > GET /api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt HTTP/1.1
    > Host: gdk.test:8000
    > Authorization: Basic XXX
    > User-Agent: curl/8.1.2
    > Accept: */*
    > 
    < HTTP/1.1 504 Gateway Timeout
    < Cache-Control: no-cache
    < Content-Length: 0
    < Content-Security-Policy: default-src 'none'
    < Content-Type: text/plain; charset=utf-8
    < Vary: Origin
    < X-Accel-Buffering: no
    < X-Content-Type-Options: nosniff
    < X-Frame-Options: SAMEORIGIN
    < X-Gitlab-Meta: {"correlation_id":"01HFF31G5W528CPVHFBK4DZERM","version":"1"}
    < X-Request-Id: 01HFF31G5W528CPVHFBK4DZERM
    < X-Runtime: 0.189770
    < Date: Fri, 17 Nov 2023 16:50:29 GMT
    < 
    * Connection #0 to host gdk.test left intact

We can see that:

  • The response comes back after ~10secs, which is the custom timeout.
  • The response status code is 504 Gateway Timeout, which is the custom response status code.

3️⃣ Pulling a large file

From !136172 (comment 1642879177). Basically, downloading a large file (which will take time) should not hit any timeout.

Downloading a large file is not rare when dealing with packages. As such, this scenario should keep working as usual.

Create a large file on the cloud server:

  1. cd srv/com/my/company/1.2.3
  2. dd if=/dev/zero of=test1G.txt bs=1 count=0 seek=1G
  3. cd ../../../..
  4. ruby -run -e httpd . -p 8081

Pull it through the dependency proxy:

$ curl -o large_file.txt "http://<maintainer_username>:<maintainer_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test1G.txt" 
  • You will see that after 10secs, the download is not canceled.
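Why the download survives the 10secs mark can be shown with a toy model. It is not Workhorse code: it only encodes the rule stated earlier that the short timeout covers establishing the connection and reading the response headers. The `proxy_status` helper and the assumption that each phase is checked separately are illustrative.

```ruby
TIMEOUT_SECONDS = 10

# Toy model: only the connect phase and the response-header phase are timed.
def proxy_status(connect_seconds:, header_seconds:, body_seconds:)
  return 504 if connect_seconds > TIMEOUT_SECONDS
  return 504 if header_seconds > TIMEOUT_SECONDS

  # body_seconds is deliberately ignored: streaming the body is not bounded
  # by the short timeout, so long downloads are allowed to finish.
  200
end

# A large file: quick connection and headers, but a 2 minute transfer.
puts proxy_status(connect_seconds: 0.1, header_seconds: 0.2, body_seconds: 120)

# A remote server that never sends its response headers.
puts proxy_status(connect_seconds: 0.1, header_seconds: 30, body_seconds: 0)
```

In this model the first call yields 200 and the second 504, matching what the curl tests above show.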

🔮 Conclusions

We have a coherent and identical behavior in case of network errors when using send_url or send_dependency: we return 504 Gateway Timeout on timeouts and 502 Bad Gateway on any other error.

🛴 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez
