Skip to content

Implement instrumenters for external HTTP requests

Related issue

A part of gitlab-com/gl-infra/scalability#302 (closed)

Problems

In the Gitlab rails application, a small amount of external HTTP requests are monitored in higher application layer, such as Elastic Search. All other requests made via Gitlab::HTTP by Gitlab application, or by other 3rd-party gems are not measured. We need to find a way to improve visibility of time spent with external HTTP requests.

The main problem of this task is that the requests to 3rd party services are handled by different HTTP libraries. They are a big mess.

  • A big portion of 3rd requests made by Gitlab are currently proxied by Gitlab::HTTP. Underlying, Gitlab::HTTP is just a wrapper for HTTParty.
  • HTTParty is used directly by hipchat and flowdoc. HTTParty is a wrapper around Net::HTTP
  • Faraday is used in ms_rest, ms_rest_azure, oauth2, octokit, sawyer, sentry, signet, acme-client, asana, azure, googleauth, graphclient.
  • Faraday itself can adopt different HTTP transport backends (Net::HTTP, Excon, etc.). By default, Farady uses Net::HTTP backend.
  • httpclient is used by google-api-client, rack-oauth2, elasticsearch-* (via dependency detection). This gem implements a custom HTTP protocol layer.
  • rest-client is used by discordrb, kubeclient. This gem uses Net::HTTP
  • exon: used by fog-*. This gem implements its own HTTP protocol layer.
  • And an unknown numbers of gems and lines of code are using Net::HTTP directly.

The observable information extracted from external HTTP requests will be used in:

Solution

There are three HTTP clients to care about: Net::HTTP, Excon, and HTTPClient. All other HTTP clients depend on those clients for the HTTP protocol implementations. For each client, I implement a publisher so that an ActiveRecord::Notification event is broadcasted whenever it makes a HTTP request. All clients (even ones in future) broadcast the same event into the same topic so that it is trivial to support a new HTTP client (I hope we don't have to 😢).

  • Net::HTTP is bundled with Ruby. It doesn't support middleware. So, prepend this class is the only solution.
  • Excon supports middleware mechanism. It also supports setting default middleware.
  • httpclient is similar to Net::HTTP although the code is messy and old.

Broadcasted notification's payload in topic "request.external_http" includes:

  • method (String): "GET"
  • code (String): "200". This is the status code read directly from HTTP response. Ruby uses a String.
  • duration (Float - seconds): 0.234
  • host (String): "gitlab.com"
  • port (Integer): 80,
  • path (String): "/gitlab-org/gitlab"
  • scheme (String): "https"
  • query (String): "field_a=1&field_b=2"
  • proxy_host (String - Optional): "proxy.gitlab.com"
  • proxy_port (Integer - Optional): 80
  • exception (Array - Optional): ["Net::ReadTimeout", "Net::ReadTimeout with #<TCPSocket:(closed)>"]
  • exception_object (Error Object - Optional): #<Net::ReadTimeout: Net::ReadTimeout>

The request.external_http topic is subscribed by multiple subscriptions:

  • Labkit Tracing subcribes to the topic and creates spans accordingly. There are some reference implementation.
  • Performance Bar (Peek) subscribes to this topic. Each event is stored in a temporary in-memory storage, formatted, and returned to the Performance Bar frontend.
  • Lograge. I haven't looked into this (yet), but assumed it's feasible to integrate.

Testing strategies

As the target publishers are injected into HTTP Clients, HTTP Mocking library like Webmock doesn't work. Stubbing is not good either, as it may lead to false positives. Therefore, I decide to bring up HTTP servers (with the support from Webrick and Rack), then make real HTTP requests, and assert the captured notifications. Besides the core HTTP clients, I also add some tests for dependent clients like HTTParty and Faraday.

In next MRs

  • Not sure whether I should add more information, such as request headers, response headers, and so on.
  • Requests to Jeager endpoints are captured. It's eligible, but noisy. There should be a blocklist option for the instrumenter.
  • Integrate External HTTP notifications into Lograge
  • Integrate External HTTP notifications into Gitlab Metrics Transaction

Screenshots

Request trace includes external HTTP spans: Screenshot_from_2020-12-29_11-23-41

Failed requests are emphasized in the tracing system, even if there is an automatically retry. Screenshot_from_2020-12-29_14-19-19

Performance Bar (Not in this merge request) Screenshot_from_2020-12-29_15-32-38

External HTTP payload in Puma logs (not in this merge request)

Screenshot_from_2020-12-30_16-16-29

External HTTP payload in Sidekiq (not in this merge request)

Screenshot_from_2020-12-30_16-14-10

Some External HTTP Prometheus metrics (not in this merge request) Screenshot_from_2020-12-30_16-19-09

Edited by Quang-Minh Nguyen

Merge request reports