Workhorse send_url and send_dependency timeouts and status codes
☎ Context
We're working on the very first version of the dependency proxy for packages. See #407460 (comment 1373731852) for all the details from the technical investigation.
At its core, the concept is simple: GitLab acts as a proxy. Users pull packages through it and GitLab works as a pull-through cache.
Package Manager clients (npm, maven, ...) <-> GitLab <-> External Package Registry
Because GitLab sits in the middle of the package transport (as a proxy), we can use the GitLab Package Registry as a cache. In other words, before contacting the external package registry, we can check whether the package is already in the local project registry. If it is, we can return it directly.
In this MR, we're going to focus on the cases where the requested file doesn't exist in the GitLab Package registry. In those cases, we will need to either:
- get the remote file and return it to the client or
- get the remote file and return it to the client while publishing it to the GitLab Package Registry.
Which one is used depends on the user permissions because obviously, not all users can write to the GitLab Package Registry.
In both cases, the heavy lifting (e.g. getting the file and streaming it back to the client and/or uploading it to the Package Registry) is done by Workhorse. We use two functions/instructions that tell Workhorse what to do (respectively):
- send_url. In this case, Rails instructs Workhorse to get a file from a URL and return it to the external client.
- send_dependency. Similar, but this time the file is sent back to the client and uploaded to an endpoint in Rails at the same time.
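The choice between serving from the registry, send_url and send_dependency can be sketched in a few lines of Ruby. This is only an illustration of the decision described above, not the actual Rails code: the method name proxy_strategy and the cached / can_write flags are hypothetical.

```ruby
# Hypothetical helper: decide how a dependency proxy request is served.
#   cached    - the file already exists in the GitLab Package Registry
#   can_write - the current user is allowed to publish to the registry
def proxy_strategy(cached:, can_write:)
  # Cache hit: return the file straight from the local registry.
  return :serve_from_registry if cached

  # Cache miss: always stream from the remote registry, and additionally
  # publish the file only when the user has write permission.
  can_write ? :send_dependency : :send_url
end

puts proxy_strategy(cached: false, can_write: false)
puts proxy_strategy(cached: false, can_write: true)
```

With these assumptions, a reporter hitting an uncached file ends up on the send_url path, while a maintainer ends up on the send_dependency path.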
During our verifications of the Maven dependency proxy on staging, we noticed the following:
- Workhorse uses a network timeout of 30secs. This means that if, for example, the remote server never answers, Workhorse will wait up to 30 seconds, hit this timeout and return a generic 500 status code.
- Workhorse will simply bubble up network issues: for example, if the connection is refused, a generic 500 status code is returned.
This is not great for the Maven dependency proxy because:
- the dependency proxy will be used with remote servers that are set up by users. It would be quite easy to have those situations (network hiccups).
- Maven clients will ping the dependency proxy for at least 4 files for each dependency in a project. Imagine that we have a project with 5 packages and we want all of them to be pulled through the dependency proxy. If we have the conditions to hit the timeout, the Maven client will spend at least 5 * 4 * 30 seconds = 10 minutes. That's a bit too long 😱
- Maven clients will report back the status code they received if it is an error. In this case, they will report an Internal Server Error (500). That's confusing for users, as they cannot tell whether the error comes from GitLab or from the remote registry.
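The worst-case wait above is plain arithmetic, shown here as a small Ruby sketch. The helper name and constant are hypothetical; the figures (4 files per dependency, 5 packages, 30 and 10 second timeouts) come from this description.

```ruby
# Minimum number of files a Maven client requests per dependency (from the text above).
FILES_PER_DEPENDENCY = 4

# Hypothetical helper: lower bound on the time a Maven client spends waiting
# when every request to the remote registry runs into the timeout.
def worst_case_wait_seconds(packages:, timeout_seconds:)
  packages * FILES_PER_DEPENDENCY * timeout_seconds
end

current  = worst_case_wait_seconds(packages: 5, timeout_seconds: 30)
proposed = worst_case_wait_seconds(packages: 5, timeout_seconds: 10)

puts "30s timeout: #{current} s (#{current / 60} min)"
puts "10s timeout: #{proposed} s"
```

For the 5-package example, the 30 second timeout gives 600 seconds (10 minutes), while a 10 second timeout brings the same scenario down to 200 seconds.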
This is Maven dependency proxy: shorten workhorse send+... (#428370 - closed). In this MR, we want to improve things for the dependency proxy for packages by:
- using a shorter timeout: 10secs. This timeout mainly applies to:
  - opening/establishing the connection.
  - reading the response headers once the request is sent.
- using specific error status codes. This is a dependency proxy; as such, we want better error codes than the generic 500 Internal Server Error when we have issues with the remote server:
  - in case of timeouts, we will return 504 Gateway Timeout.
  - in case of any other error, we will return 502 Bad Gateway.
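The status code rule above (timeouts become 504, everything else becomes 502) can be illustrated with a short Ruby sketch. This is not the Workhorse implementation, which is written in Go; the method name gateway_status_for is hypothetical, and the exception classes are the ones Ruby's own networking stack raises.

```ruby
require "net/http" # defines Net::OpenTimeout and Net::ReadTimeout

# Hypothetical helper: map an error raised while contacting the remote
# registry to the status code a proxy would return to its client.
def gateway_status_for(error)
  case error
  when Timeout::Error, Errno::ETIMEDOUT
    504 # Gateway Timeout: the remote server did not answer in time
  else
    502 # Bad Gateway: any other failure (refused connection, DNS error, ...)
  end
end

puts gateway_status_for(Net::OpenTimeout.new)
puts gateway_status_for(Errno::ECONNREFUSED.new)
```

Net::OpenTimeout and Net::ReadTimeout both inherit from Timeout::Error, so a single branch covers the connection phase and the header-reading phase.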
We thus need to update Workhorse's send_url and send_dependency.
send_dependency is used by the upcoming dependency proxy for packages and by the existing dependency proxy for container images. The behavior described above is a good improvement for both, as GitLab acts as a proxy in both cases. Thus, the above behavior will be implemented in Workhorse directly (hardcoded).
For send_url, things are a bit more challenging, as this function is used for file uploads, AI features and this upcoming dependency proxy for packages. We don't want to impact those features. As such, in this case, we let the Rails backend decide the timeouts and the status codes on a per-case basis. This way, we can specify timeouts and status codes only for the dependency proxy for packages. All the other features will use the usual values.
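The "per-case options with defaults" idea for send_url can be sketched as follows. This is a hedged illustration only: the option names (timeout_seconds, error_status, timeout_status) and the helper send_url_options are hypothetical and are not the parameter names used by Rails or Workhorse. The default values mirror the existing behavior described earlier (30 seconds, generic 500).

```ruby
# Assumed defaults, matching the pre-existing behavior described in this MR.
DEFAULT_SEND_URL_OPTIONS = {
  timeout_seconds: 30, # network timeout
  error_status: 500,   # status returned on any non-timeout error
  timeout_status: 500  # status returned when the timeout is hit
}.freeze

# Hypothetical helper: callers pass only what they want to change;
# everything else keeps the default value.
def send_url_options(**overrides)
  unknown = overrides.keys - DEFAULT_SEND_URL_OPTIONS.keys
  raise ArgumentError, "unknown options: #{unknown.join(', ')}" unless unknown.empty?

  DEFAULT_SEND_URL_OPTIONS.merge(overrides)
end

# File uploads, AI features, ...: nothing set, defaults apply.
puts send_url_options.inspect

# Dependency proxy for packages: shorter timeout, more specific status codes.
puts send_url_options(timeout_seconds: 10, error_status: 502, timeout_status: 504).inspect
```

Keeping the defaults in one place means existing callers are unaffected unless they explicitly opt in.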
🤔 What does this MR do and why?
- Add support for timeout and status code options for the workhorse send_url.
  - If not set, default values are used.
- Update the workhorse send_dependency to use short timeouts and more specific status codes.
- Update the related workhorse tests.
- Update the maven dependency proxy so that we:
  - Use timeouts of 10secs.
  - Use 504 Gateway Timeout and 502 Bad Gateway.
- Update the related requests specs.
- Update the related feature spec.
The Maven dependency proxy is behind a feature flag. At the time of this writing it is not enabled on gitlab.com. See #415218 (closed). The workhorse changes will thus not be used until the feature flag is turned on.
🦄 Screenshots or screen recordings
No UI changes.
⚙ How to set up and validate locally
The setup is a bit involved, as we need an external registry that we can manipulate to simulate network hiccups.
🔨 Test setup
⛵ AWS lightsail
For this, we're going to use an AWS lightsail server that will have a dummy ruby server that will serve a single file. We will then set up the dependency proxy in the local GitLab instance and pull the file through it.
Let's get started
- Open AWS Lightsail.
- Create an instance. Select the smallest one with Ubuntu.
- Once it is running, open the web terminal / shell / whatever that thing is called:
sudo apt update
sudo apt install ruby -y
mkdir -p srv/com/my/company/1.2.3
cd srv/com/my/company/1.2.3
echo "bananas!" > test.txt
cd ../../../..
ruby -run -e httpd . -p 8081
- On the instance "Networking" tab, add a rule: Allow TCP 8081.
Our dummy external registry is now ready.
🦊 Local GitLab
The dependency proxy for packages has a few requirements:
- have packages -> enabled set to true in gitlab.yml.
- have dependency_proxy -> enabled set to true in gitlab.yml.
- have the packages feature enabled in the project's settings. Settings -> General -> Visibility, project features, permissions -> Package registry (checkbox enabled).
- have a GitLab license. Premium or more.
- have the related feature flag turned on:
Feature.enable(:packages_dependency_proxy_maven)
The first 3 points should be enabled by default.
Next, let's configure our local GitLab:
- Have a private project ready.
- Have a PAT ready (with scope api). You need two users: a maintainer (or higher role) and a reporter.
- Let's set up the dependency proxy settings in the rails console:
Project.find(<project_id>).create_dependency_proxy_packages_setting!(enabled: true, maven_external_registry_url: 'http://<lightsail instance IP>:8081')
🐎 Workhorse
After pulling this MR, make sure that you rebuild and restart the workhorse process (from the root directory of the gitlab rails project):
$ cd workhorse
$ make
$ gdk restart gitlab-workhorse
1️⃣ Pulling a file with a reporter
Let's pull the file through the dependency proxy:
$ curl "http://<reporter_username>:<reporter_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt"
bananas!
Behind the scenes, the dependency proxy will see that the file is not cached and that the current user can't publish to the GitLab package registry. Thus, it will use the workhorse send_url function to send the file from the remote registry to curl (i.e. no caching).
Let's reset the conditions on the GitLab project to make sure that we don't have any files "cached" in the GitLab package registry. In the rails console:
Project.find(<project_id>).packages.destroy_all
Let's simulate network issues
- Shut down the ruby dummy server in the instance. This is to simulate a connection refused error.
- $ curl again:
$ curl -vvv "http://<reporter_username>:<reporter_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt"
* Trying XX:8000...
* Connected to gdk.test (XXX) port 8000 (#0)
* Server auth using Basic with user 'XXX'
> GET /api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt HTTP/1.1
> Host: gdk.test:8000
> Authorization: Basic XXX
> User-Agent: curl/8.1.2
> Accept: */*
>
< HTTP/1.1 502 Bad Gateway
< Cache-Control: no-cache
< Content-Length: 0
< Content-Security-Policy: default-src 'none'
< Content-Type: text/plain; charset=utf-8
< Vary: Origin
< X-Accel-Buffering: no
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-Gitlab-Meta: {"correlation_id":"01HFF2BRR9HCFD50VZP1PPF4C1","version":"1"}
< X-Request-Id: 01HFF2BRR9HCFD50VZP1PPF4C1
< X-Runtime: 0.083933
< Date: Fri, 17 Nov 2023 16:38:26 GMT
<
* Connection #0 to host gdk.test left intact
We can see that:
- The response is almost instant. This is expected: we didn't hit the timeout limit. Instead, we hit a network issue in the send_url function.
- The response status code is 502 Bad Gateway, which is the custom response status code.
- Remove the TCP 8081 rule from the Networking tab. This simulates a connection that never replies and therefore hits the timeout.
- $ curl again:
$ curl -vvv "http://<reporter_username>:<reporter_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt"
* Trying XXX:8000...
* Connected to gdk.test (XXX) port 8000 (#0)
* Server auth using Basic with user '<reporter_username>'
> GET /api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt HTTP/1.1
> Host: gdk.test:8000
> Authorization: Basic XXX
> User-Agent: curl/8.1.2
> Accept: */*
>
< HTTP/1.1 504 Gateway Timeout
< Cache-Control: no-cache
< Content-Length: 0
< Content-Security-Policy: default-src 'none'
< Content-Type: text/plain; charset=utf-8
< Vary: Origin
< X-Accel-Buffering: no
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-Gitlab-Meta: {"correlation_id":"01HFF2Y1CJSKZTZ8YW04EDJ2FY","version":"1"}
< X-Request-Id: 01HFF2Y1CJSKZTZ8YW04EDJ2FY
< X-Runtime: 0.087793
< Date: Fri, 17 Nov 2023 16:48:35 GMT
<
* Connection #0 to host gdk.test left intact
We can see that:
- The response comes back after ~10secs, which is the custom timeout.
- The response status code is 504 Gateway Timeout, which is the custom response status code and gives a very nice indication of what is happening.
2️⃣ Pulling a file with a maintainer+
Let's pull the file through the dependency proxy:
$ curl "http://<maintainer_username>:<maintainer_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt"
bananas!
Behind the scenes, the dependency proxy will see that the file is not cached and that the current user can write to the GitLab package registry. Thus, it will use the workhorse send_dependency function to send the file from the remote registry to curl and, at the same time, publish the file to the GitLab package registry.
Let's reset the conditions on the GitLab project, basically remove the file from the package registry, in the rails console:
Project.find(<project_id>).packages.destroy_all
Let's simulate network issues.
- Shut down the ruby dummy server in the instance. This is to simulate a connection refused error.
- $ curl again:
$ curl -vvv "http://<maintainer_username>:<maintainer_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt"
* Trying XXX:8000...
* Connected to gdk.test (XXX) port 8000 (#0)
* Server auth using Basic with user 'root'
> GET /api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt HTTP/1.1
> Host: gdk.test:8000
> Authorization: Basic XXX
> User-Agent: curl/8.1.2
> Accept: */*
>
< HTTP/1.1 502 Bad Gateway
< Cache-Control: no-cache
< Content-Length: 0
< Content-Security-Policy: default-src 'none'
< Content-Type: text/plain; charset=utf-8
< Vary: Origin
< X-Accel-Buffering: no
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-Gitlab-Meta: {"correlation_id":"01HFF33NX8WP5EM5BNP0J035MT","version":"1"}
< X-Request-Id: 01HFF33NX8WP5EM5BNP0J035MT
< X-Runtime: 0.045048
< Date: Fri, 17 Nov 2023 16:51:30 GMT
<
* Connection #0 to host gdk.test left intact
We can see that:
- The response is almost instant. This is expected: we didn't hit the timeout limit. Instead, we hit a network issue in the send_dependency function.
- The response status code is 502 Bad Gateway, which is the custom response status code.
- Remove the TCP 8081 rule from the Networking tab. This simulates a connection that never replies and therefore hits the timeout.
- $ curl again:
$ curl -vvv "http://<maintainer_username>:<maintainer_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt"
* Trying XXX:8000...
* Connected to gdk.test (XXX) port 8000 (#0)
* Server auth using Basic with user 'root'
> GET /api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test.txt HTTP/1.1
> Host: gdk.test:8000
> Authorization: Basic XXX
> User-Agent: curl/8.1.2
> Accept: */*
>
< HTTP/1.1 504 Gateway Timeout
< Cache-Control: no-cache
< Content-Length: 0
< Content-Security-Policy: default-src 'none'
< Content-Type: text/plain; charset=utf-8
< Vary: Origin
< X-Accel-Buffering: no
< X-Content-Type-Options: nosniff
< X-Frame-Options: SAMEORIGIN
< X-Gitlab-Meta: {"correlation_id":"01HFF31G5W528CPVHFBK4DZERM","version":"1"}
< X-Request-Id: 01HFF31G5W528CPVHFBK4DZERM
< X-Runtime: 0.189770
< Date: Fri, 17 Nov 2023 16:50:29 GMT
<
* Connection #0 to host gdk.test left intact
We can see that:
- The response comes back after ~10secs, which is the custom timeout.
- The response status code is 504 Gateway Timeout, which is the custom response status code.
3️⃣ Pulling a large file
From !136172 (comment 1642879177). Basically, downloading a large file (which will take time) should not hit any timeout.
Downloading large files is common when dealing with packages. As such, this scenario should keep working as usual.
Create a large file on the cloud server:
cd srv/com/my/company/1.2.3
dd if=/dev/zero of=test1G.txt bs=1 count=0 seek=1G
cd ../../../..
ruby -run -e httpd . -p 8081
Pull it through the dependency proxy:
$ curl -o large_file.txt "http://<maintainer_username>:<maintainer_pat>@gdk.test:8000/api/v4/projects/<project_id>/dependency_proxy/packages/maven/com/my/company/1.2.3/test1G.txt"
- You will see that after 10secs, the download is not canceled. ✅
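Why a short timeout does not cut off a long download can be illustrated from the client side with Ruby's Net::HTTP. This is an analogy, not the Workhorse code (which is Go): in Net::HTTP, open_timeout bounds connection establishment and read_timeout bounds each individual read, so neither caps the total transfer time as long as data keeps arriving. The helper name download is hypothetical.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: stream url into path using short per-phase timeouts.
# Returns the HTTP status code as an Integer.
def download(url, path, timeout: 10)
  uri = URI(url)
  options = {
    use_ssl: uri.scheme == "https",
    open_timeout: timeout, # max seconds to establish the connection
    read_timeout: timeout  # max seconds to wait for each read, not for the whole body
  }

  Net::HTTP.start(uri.host, uri.port, **options) do |http|
    http.request_get(uri.request_uri) do |response|
      # Write the body chunk by chunk instead of loading it all in memory.
      File.open(path, "wb") do |file|
        response.read_body { |chunk| file.write(chunk) }
      end
      return response.code.to_i
    end
  end
end
```

A call such as download("http://<lightsail instance IP>:8081/com/my/company/1.2.3/test1G.txt", "large_file.txt") would raise Net::OpenTimeout if the server never accepts the connection, but would keep streaming a slow 1G file past the 10 second mark.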
🔮 Conclusions
We have a coherent and identical behavior in case of network errors when using send_url or send_dependency: we return a 504 Gateway Timeout on timeouts and a 502 Bad Gateway on any other network error.
🛴 MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.