Export Prometheus metrics for all HTTP requests made by the runner manager

Overview

Instrument all HTTP requests made by the runner manager.

Background

In gitlab-com/gl-infra/production#19438 (closed), we noticed that Sidekiq jobs appeared to be finished in the logs but never actually finished, because the PUT requests to update the job took a long time or never completed. In the GitLab server logs, we saw many EOF failures and Rack Attack rate-limiting errors, but from the runner logs it was harder to tell whether there was also a problem on the runner side.

One thing we discovered was that the retry mechanism wasn't even working on the runner: #38651 (closed). That will be fixed by !5409 (merged).

However, because we run many shared and private runners, we should instrument every HTTP request the runner manager makes. For example, we would like to know:

  1. The rate of GET, PUT, PATCH, and POST requests to the GitLab API.
  2. The number of retries for each endpoint.
  3. The status codes of each request.
  4. The duration of each request.
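One possible way to capture the method, endpoint, status code, and duration listed above is to wrap the HTTP client's transport. The following is a minimal sketch, not the runner's actual implementation: `Recorder`, `Observation`, `memoryRecorder`, `instrumentedTransport`, and `endpointLabel` are hypothetical names, the in-memory recorder stands in for Prometheus counters and histograms, and the URL is a placeholder.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"sync"
	"time"
)

// Observation is one recorded request (hypothetical type for this sketch).
type Observation struct {
	Method, Endpoint, Status string
	Duration                 time.Duration
}

// Recorder is a hypothetical sink. A real one could feed a Prometheus
// counter (rate, status codes) and a histogram (duration).
type Recorder interface{ Observe(Observation) }

// memoryRecorder keeps observations in memory so the sketch runs without
// any external dependency.
type memoryRecorder struct {
	mu  sync.Mutex
	obs []Observation
}

func (m *memoryRecorder) Observe(o Observation) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.obs = append(m.obs, o)
}

// endpointLabel replaces purely numeric path segments with ":id" so that
// per-job URLs do not create one label value per job.
func endpointLabel(path string) string {
	segments := strings.Split(path, "/")
	for i, s := range segments {
		if _, err := strconv.ParseUint(s, 10, 64); err == nil {
			segments[i] = ":id"
		}
	}
	return strings.Join(segments, "/")
}

// instrumentedTransport wraps another RoundTripper and records every attempt.
type instrumentedTransport struct {
	next http.RoundTripper
	rec  Recorder
}

func (t *instrumentedTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	start := time.Now()
	resp, err := t.next.RoundTrip(req)
	status := "error" // transport-level failure, e.g. EOF or timeout
	if err == nil {
		status = strconv.Itoa(resp.StatusCode)
	}
	// Duration covers time until response headers arrive, not body reads.
	t.rec.Observe(Observation{
		Method:   req.Method,
		Endpoint: endpointLabel(req.URL.Path),
		Status:   status,
		Duration: time.Since(start),
	})
	return resp, err
}

// roundTripFunc lets a plain function act as a fake transport.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// runDemo sends one PUT through the instrumented client against a fake
// transport that always answers 429, and returns what was recorded.
func runDemo() []Observation {
	fake := roundTripFunc(func(r *http.Request) (*http.Response, error) {
		return &http.Response{StatusCode: http.StatusTooManyRequests, Body: http.NoBody, Request: r}, nil
	})
	rec := &memoryRecorder{}
	client := &http.Client{Transport: &instrumentedTransport{next: fake, rec: rec}}

	req, err := http.NewRequest(http.MethodPut, "https://gitlab.example.com/api/v4/jobs/123", nil)
	if err != nil {
		panic(err)
	}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return rec.obs
}

func main() {
	for _, o := range runDemo() {
		fmt.Printf("%s %s -> %s (%s)\n", o.Method, o.Endpoint, o.Status, o.Duration)
	}
}
```

Because each retry attempt passes through `RoundTrip`, a wrapper like this counts attempts rather than logical requests. A separate retry counter would most likely have to be incremented by the retry logic itself. If the Prometheus Go client is used, its `promhttp` package also ships ready-made round-tripper instrumentation helpers that may cover part of this.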

That way we can potentially come up with an Apdex for HTTP requests on the runner side.
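As an illustration of how request durations could feed such a score, here is a small sketch of the conventional Apdex formula, (satisfied + tolerating / 2) / total, where a request is "satisfied" at or below a threshold T and "tolerating" at or below 4T. The 500 ms threshold and the sample durations are made-up values, and the definition we would actually adopt for the runner may differ.

```go
package main

import (
	"fmt"
	"time"
)

// apdex computes the conventional Apdex score for a set of request
// durations. It returns 0 when there are no samples.
func apdex(durations []time.Duration, threshold time.Duration) float64 {
	if len(durations) == 0 {
		return 0
	}
	var satisfied, tolerating int
	for _, d := range durations {
		switch {
		case d <= threshold:
			satisfied++
		case d <= 4*threshold:
			tolerating++
		}
	}
	return (float64(satisfied) + float64(tolerating)/2) / float64(len(durations))
}

// sampleApdex scores four made-up durations against an assumed 500 ms target.
func sampleApdex() float64 {
	samples := []time.Duration{
		100 * time.Millisecond,
		200 * time.Millisecond,
		800 * time.Millisecond,
		3 * time.Second,
	}
	return apdex(samples, 500*time.Millisecond)
}

func main() {
	fmt.Printf("apdex = %.3f\n", sampleApdex())
}
```

In practice the score would more likely be derived from histogram buckets in a Prometheus query than from raw durations, so the bucket boundaries would need to line up with the chosen thresholds.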