Make GitLab network client respect Retry-After header (!4102) · Merge requests · GitLab.org / gitlab-runner

Pedro Pombeiro (OOO from July 16th till Aug 7th) requested to merge pedropombeiro/30858/respect-Retry-After into main May 19, 2023

What does this MR do?

Implements a mechanism that waits for the number of seconds specified in the Retry-After header of an HTTP 429 Too Many Requests response from the server. The client logs the error before waiting, and the user can terminate the wait gracefully.

Why was this MR needed?

In some situations (namely migrations involving the ci_builds table), it is important for the server to be able to tell the runner to back off for a while during maintenance. One mechanism for that is returning an HTTP 429 Too Many Requests response with a Retry-After value of, for example, 5 minutes. Currently, our implementation of, for example, the artifacts uploader performs retries with delays starting at 1 second and capped at 5 seconds:

1s
2s
4s
5s
5s

This is clearly not enough for a maintenance of the ci_builds table, and has resulted in the need to revert migrations.

What's the best way to test this MR?

Open the GDK console with a recent GitLab repo and enable the feature flag:
```
::Feature.enable(:runner_migrations_backoff)
```
Prepare a CI job before starting the runner.

In the GDK console, hold the lock so that the HTTP 429 status code is returned to the runner:

Gitlab::Database::Migrations::RunnerBackoff::Communicator.new(String.class).execute_with_lock { sleep(120) }

Start a runner on another terminal:
```
gitlab-runner run --config ~/.gitlab-runner/config.gdk.toml
```
You should notice that the runner displays the 429 status message and waits for 60 seconds after the request.
Cancel the command with Ctrl-C; the runner should exit gracefully.

What are the relevant issue numbers?

#30858 (closed)

Equivalent on the GitLab Rails side: gitlab#395787 (closed)

Edited May 20, 2023 by Pedro Pombeiro (OOO from July 16th till Aug 7th)
