Network timeouts between the Rails backend and the Container Registry

Details

  • Point of contact for this request: @10io
  • If a call is needed, what is the proposed date and time of the call: not needed
  • Additional call details (format, type of call): n/a

SRE Support Needed

🎋 Context

For some features, the Rails backend needs to contact the Container Registry.

Looking at this Sentry error, it seems that we sometimes hit network timeouts where the connection to the Container Registry can't be opened.

In gitlab-org/gitlab!50750 (merged), we improved the observability of the Container Registry Ruby client used by the Rails backend.

This led to this dashboard, where we can clearly see the network errors and which Container Registry URL was contacted.

The Container Registry ruby client uses 3 different timeouts for its network operations:

  1. Open timeout (10s)
  2. Read timeout (20s)
  3. Write timeout (30s)
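For readers less familiar with how these three timeouts map onto a Ruby HTTP connection, here is a minimal sketch using the standard library's Net::HTTP. It is only an illustration: the helper name `build_registry_http` and the host `registry.example.com` are hypothetical, and the real client may set these values through a different HTTP library.

```ruby
require "net/http"
require "uri"

# Timeout values quoted in this issue, in seconds.
REGISTRY_TIMEOUTS = { open: 10, read: 20, write: 30 }.freeze

# Hypothetical helper: builds a Net::HTTP object with the three timeouts set.
# Nothing is sent here; Net::HTTP only connects on #start or on the first request.
def build_registry_http(base_url, timeouts = REGISTRY_TIMEOUTS)
  uri = URI.parse(base_url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout  = timeouts[:open]   # max time to establish the connection
  http.read_timeout  = timeouts[:read]   # max time to wait for a chunk of the response
  http.write_timeout = timeouts[:write]  # max time to write a chunk of the request (Ruby 2.6+)
  http
end

# Placeholder host, not the real registry address.
http = build_registry_http("https://registry.example.com")
```

The open timeout is the only one of the three that applies before any request bytes leave the Rails side.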

From the Kibana dashboard above, the majority of the errors are open timeouts (1.). As an example, see this error in Sentry. It happens when net/http.rb initializes the connection and opens it.

Note that because most of these errors occur while the connection is being established, before any request reaches the Container Registry, they leave no trace in the Container Registry logs.
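To make the distinction between the three failure phases concrete, the sketch below maps the standard Net::HTTP timeout exceptions to the phase they belong to. The method name `timeout_phase` is hypothetical and is not taken from the actual client; it only assumes the client's errors ultimately surface as the standard library exception classes.

```ruby
require "net/http"

# Hypothetical helper: maps an exception to the network phase it belongs to.
# Net::OpenTimeout is raised while connecting, so the server never sees a request.
def timeout_phase(error)
  case error
  when Net::OpenTimeout  then :open
  when Net::ReadTimeout  then :read
  when Net::WriteTimeout then :write   # available since Ruby 2.6
  else :other
  end
end

begin
  raise Net::OpenTimeout, "execution expired"
rescue StandardError => e
  phase = timeout_phase(e)
end
```

An `:open` result means the failure happened on the path between the Rails backend and the Container Registry, which is why server-side logs cannot help here.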

💥 Users impact

The Container Registry ruby client is used to power these features:

These impacts are low by definition and don't break any major feature.

Having said that, the cleanup policies workers deal with a non-trivial amount of daily work. Lowering the number of "slots" by one will reduce their efficiency.

🚒 SRE Support Request

Given the above, we want to make sure that everything is working fine between the Rails backend and the Container Registry.

To my limited knowledge, some components (such as proxies) can be present between them.

  • Can these intermediary components provide any logs?
  • Are there any errors that could explain these "random" timeouts?
  • Is there any kind of rate limiting applied here that would cause connections to be dropped, so that the Container Registry Ruby client can't establish one?
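If it helps the investigation, connection establishment could be measured in isolation with a small probe like the one below. This is a sketch under assumptions: `probe_connect` is a hypothetical helper, it only tests the TCP connect step (no TLS, no HTTP), and the host and port to probe would need to be the ones the Rails backend actually uses.

```ruby
require "socket"

# Hypothetical probe: times a plain TCP connect and reports success or the error class.
def probe_connect(host, port, timeout: 10)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  # Socket.tcp closes the socket once the block returns.
  Socket.tcp(host, port, connect_timeout: timeout) { |_socket| }
  { ok: true, seconds: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started }
rescue Errno::ETIMEDOUT, Errno::ECONNREFUSED, Errno::EHOSTUNREACH, SocketError => e
  { ok: false, error: e.class.name,
    seconds: Process.clock_gettime(Process::CLOCK_MONOTONIC) - started }
end

# Example call (placeholder host, not executed here):
#   probe_connect("registry.example.com", 443, timeout: 10)
```

Running such a probe periodically from the same hosts as the Rails backend would show whether slow or failed connects correlate with the timeouts seen in Sentry.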