Make GitLab network client respect Retry-After header
What does this MR do?
Implements a mechanism to wait for the number of seconds specified in the Retry-After header of an HTTP 429 Too Many Requests response from the server. The runner logs the error and then waits for that number of seconds. The wait can be terminated gracefully by the user.
Why was this MR needed?
In some situations (namely migrations involving the ci_builds table), it is important for the server to be able to tell the runner to back off for a bit while it is undergoing maintenance. One mechanism for that is returning an HTTP 429 Too Many Requests response with a Retry-After value of, for example, 5 minutes. Currently, our implementations (for example, the artifacts uploader) perform retries starting at 1 second and ending at 5 seconds:
1s
2s
4s
5s
5s
Program exited.
This is clearly not enough for maintenance of the ci_builds table, and has resulted in the need to revert migrations.
What's the best way to test this MR?
- Open the GDK console with a recent GitLab repo and enable the feature flag:
  ::Feature.enable(:runner_migrations_backoff)
- Prepare a CI job before starting the runner.
- On the GDK console, hold the lock so that the HTTP 429 status code is returned to the runner:
  Gitlab::Database::Migrations::RunnerBackoff::Communicator.new(String.class).execute_with_lock { sleep(120) }
- Start a runner in another terminal:
  gitlab-runner run --config ~/.gitlab-runner/config.gdk.toml
  You should notice that the runner displays the 429 status message and waits for 60 seconds after the request.
- Cancel the command with Ctrl-C. The runner should exit gracefully.
What are the relevant issue numbers?
Equivalent on the GitLab Rails side: gitlab#395787 (closed)