Improve the reliability and observability of the container registry client

🍋 Context

Container Registry objects are not all present in the rails backend. For objects that are not in the rails database (such as image tags), the backend has to query the container registry api.

For those interactions with the container registry api, a single class is used: https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/container_registry/client.rb. This class is essentially a DSL to access the container registry. It's using a Faraday client with different options to access the container registry API.

The problem we have with the current state is:

See #280545 (closed) for more details.

This MR tries to tackle two aspects of this client:

  • Reliability
    • Be more resilient against network timeout errors.
  • Observability
    • Properly log the network errors and the associated context (eg. which url in the container registry api was accessed)
    • This will help us to provide some traces to the infrastructure team so that they can investigate those errors.

🔭 What does this MR do?

🖥 Screenshots (strongly suggested)

Example of an timeout error logged

{"severity":"ERROR","time":"2020-12-31T10:54:22.465Z","correlation_id":"01ETW6XQ7WHYB9P7GZE0KZSZWG","tags.correlation_id":"01ETW6XQ7WHYB9P7GZE0KZSZWG","tags.locale":"en","user.id":1,"user.email":"admin@example.com","user.username":"root","extra.class":"ContainerRegistry::Client","extra.url":"http://gdk.test:5050/v2/gitlab.org/ci-cd/package-stage/cicd_102/story-of-beans/tags/list","exception.class":"Faraday::TimeoutError","exception.message":"Net::ReadTimeout with #\u003cTCPSocket:(closed)\u003e","exception.backtrace":["lib/gitlab/faraday/error_callback.rb:12:in `call'","lib/container_registry/client.rb:72:in `repository_tags'","app/models/container_repository.rb:68:in `manifest'","lib/gitlab/metrics/instrumentation.rb:160:in `block in manifest'","lib/gitlab/metrics/method_call.rb:27:in `measure'","lib/gitlab/metrics/instrumentation.rb:160:in `manifest'","app/models/container_repository.rb:82:in `tags_count'","lib/gitlab/metrics/instrumentation.rb:160:in `block in tags_count'","lib/gitlab/metrics/method_call.rb:27:in `measure'","lib/gitlab/metrics/instrumentation.rb:160:in `tags_count'","app/graphql/types/base_field.rb:46:in `block (2 levels) in resolve_field'","app/graphql/types/base_field.rb:35:in `block in resolve_field'","app/graphql/types/base_field.rb:31:in `resolve_field'","lib/gitlab/graphql/calls_gitaly/instrumentation.rb:18:in `block in instrument'","lib/gitlab/graphql/generic_tracing.rb:40:in `with_labkit_tracing'","lib/gitlab/graphql/generic_tracing.rb:30:in `platform_trace'","lib/gitlab/graphql/generic_tracing.rb:40:in `with_labkit_tracing'","lib/gitlab/graphql/generic_tracing.rb:30:in `platform_trace'","app/graphql/gitlab_schema.rb:43:in `multiplex'","app/controllers/graphql_controller.rb:68:in `execute_multiplex'","app/controllers/graphql_controller.rb:32:in `execute'","ee/lib/gitlab/ip_address_state.rb:10:in `with'","ee/app/controllers/ee/application_controller.rb:44:in `set_current_ip_address'","app/controllers/application_controller.rb:495:in `set_current_admin'","lib/gitlab/session.rb:11:in `with_session'","app/controllers/application_controller.rb:486:in `set_session_storage'","lib/gitlab/i18n.rb:73:in `with_locale'","lib/gitlab/i18n.rb:79:in `with_user_locale'","app/controllers/application_controller.rb:480:in `set_locale'","lib/gitlab/error_tracking.rb:52:in `with_context'","app/controllers/application_controller.rb:545:in `sentry_context'","app/controllers/application_controller.rb:473:in `block in set_current_context'","lib/gitlab/application_context.rb:56:in `block in use'","lib/gitlab/application_context.rb:56:in `use'","lib/gitlab/application_context.rb:22:in `with_context'","app/controllers/application_controller.rb:464:in `set_current_context'","lib/gitlab/metrics/elasticsearch_rack_middleware.rb:16:in `call'","lib/gitlab/middleware/rails_queue_duration.rb:33:in `call'","lib/gitlab/metrics/rack_middleware.rb:16:in `block in call'","lib/gitlab/metrics/transaction.rb:56:in `run'","lib/gitlab/metrics/rack_middleware.rb:16:in `call'","lib/gitlab/request_profiler/middleware.rb:17:in `call'","lib/gitlab/query_limiting/middleware.rb:17:in `block in call'","lib/gitlab/query_limiting/transaction.rb:39:in `run'","lib/gitlab/query_limiting/middleware.rb:16:in `call'","lib/gitlab/jira/middleware.rb:19:in `call'","lib/gitlab/middleware/go.rb:20:in `call'","lib/gitlab/etag_caching/middleware.rb:21:in `call'","lib/gitlab/middleware/multipart.rb:234:in `call'","lib/gitlab/middleware/read_only/controller.rb:50:in `call'","lib/gitlab/middleware/read_only.rb:18:in `call'","lib/gitlab/middleware/same_site_cookies.rb:27:in `call'","lib/gitlab/middleware/handle_malformed_strings.rb:21:in `call'","lib/gitlab/middleware/basic_health_check.rb:25:in `call'","lib/gitlab/middleware/handle_ip_spoof_attack_error.rb:25:in `call'","lib/gitlab/middleware/request_context.rb:23:in `call'","config/initializers/fix_local_cache_middleware.rb:9:in `call'","lib/gitlab/middleware/static.rb:11:in `call'","lib/gitlab/webpack/dev_server_middleware.rb:34:in `perform_request'","lib/gitlab/metrics/requests_rack_middleware.rb:76:in `call'","lib/gitlab/middleware/release_env.rb:12:in `call'"]}

Notice the presence of exception.class, extra.class and extra.url with the proper values.

🛃 Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team
Edited by David Fernandez

Merge request reports

Loading