
Make GitLab network client respect Retry-After header

What does this MR do?

Implements a mechanism to wait for the number of seconds specified in the Retry-After header of an HTTP 429 Too Many Requests response from the server. The runner logs the error, then waits for the number of seconds given in the Retry-After header. The user can interrupt the wait gracefully.

Why was this MR needed?

In some situations (namely, migrations involving the ci_builds table), the server needs to be able to tell the runner to back off for a while during maintenance. One way to do this is to return an HTTP 429 Too Many Requests response with a Retry-After value of, for example, 5 minutes. Currently, components such as the artifacts uploader retry with delays that grow from 1 second to at most 5 seconds:

1s
2s
4s
5s
5s


This is clearly not enough time for maintenance of the ci_builds table, and it has forced migrations to be reverted.

What's the best way to test this MR?

  1. Open the GDK console with a recent GitLab checkout and enable the feature flag:

    ::Feature.enable(:runner_migrations_backoff)
  2. Prepare a CI job before starting the runner.

  3. In the GDK console, run a block inside the runner backoff lock (here, sleeping for 120 seconds) so that the server returns HTTP 429 to the runner:

    Gitlab::Database::Migrations::RunnerBackoff::Communicator.new(String.class).execute_with_lock { sleep(120) }
  4. Start a runner on another terminal:

    gitlab-runner run --config ~/.gitlab-runner/config.gdk.toml

    You should notice that the runner displays the 429 status message and waits for 60 seconds after the request.


  5. Cancel the command with Ctrl-C. The runner should exit gracefully.


What are the relevant issue numbers?

#30858 (closed)

Equivalent on the GitLab Rails side: gitlab#395787 (closed)
