Better support for connection errors with the Container registry

David Fernandez requested to merge 227466-handle-container-registry-errors into master

🏟 Context

GitLab can be used to host container images through the Container registry. Users can push tags for a given container image to it.

Container images and tags are hosted on the Container registry. On the Rails side, only the container images are present (or, more precisely, mirrored) in the database.

The Rails backend provides several ways to display and interact with container images and tags:

  1. UI
    • We present the list of container images associated with a project.
    • Clicking on a container image presents its tags.
  2. REST API
  3. GraphQL API

Depending on the request, the Rails backend pulls data from:

  • the database
  • the container registry API

In #227466 (closed), it was noticed that when the Container registry is down, the different interfaces ((1.), (2.) and (3.)) don't handle the failure properly.

This MR aims to better support connection errors with the Container registry.

Design choices

UI (frontend / backend)

On the UI side, we have "only" two pages: the index of all container images and the details of a container image with the tags. These two pages use data from both sources.

In addition to this, those pages are "doubled" because we can access them at the project level or the group level.

Because the same set of two pages exists twice, we use a single Vue component that takes care of both the index and the details page.

For the UI, we are going to ping the container registry to know if it's available and display an error message if it isn't. We do this because the pages use data from both sources, and we don't want to display partial or incomplete pages. That's why we display the error message even when the data from the database is available.

The idea here is to catch those connection errors and pass the proper variables to the Vue component to tell it: hey, display the error message.

One thing to note here is that Rails reads the data from the database and passes it to the Vue component. The component displays that data and shows spinners for the data coming from the Container registry. While that data is displayed, the component uses GraphQL to fetch the data from the Container registry.

To display the error message, we need to pass the proper flags to the Vue component. For that to happen, the rails controller needs to check if the Container registry is alive. We use this endpoint for that.
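The flag-passing idea can be sketched in plain Ruby. This is only an illustration under assumptions: `RegistryConnectionError`, `FakeRegistryClient`, `registry_ui_flags` and the `connection_error` flag name are hypothetical stand-ins, not the names used in this MR, and the real controller would call the actual registry client instead of a fake.

```ruby
# Hypothetical error class standing in for whatever the registry client raises
# on a connection failure.
class RegistryConnectionError < StandardError; end

# Hypothetical stand-in for the registry client; `up` simulates availability.
class FakeRegistryClient
  def initialize(up:)
    @up = up
  end

  # Simulates a cheap availability check against the registry.
  def ping
    raise RegistryConnectionError, 'connection refused' unless @up

    true
  end
end

# Returns the boolean flags a controller could hand to the Vue component.
# The ping happens before rendering, so the page can show the error message
# instead of a partially loaded view.
def registry_ui_flags(client)
  client.ping
  { connection_error: false }
rescue RegistryConnectionError
  { connection_error: true }
end

puts registry_ui_flags(FakeRegistryClient.new(up: true)).inspect
puts registry_ui_flags(FakeRegistryClient.new(up: false)).inspect
```

The component would then only need to check whether any flag is true to decide between the normal page and the error message.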

REST API

On the REST API, we are taking the opposite approach to the UI. Depending on which data is accessed, the error message is displayed or not. For example, if we ask for the container image attributes only, this will likely always work because it only uses the data from the database. If we ask for the container image attributes and the tags count, this could fail as the tags count needs to use the Container registry API.

Here, the idea is as simple as it gets: catch the connection errors and display a service unavailable response with the proper error message.
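A minimal sketch of that behavior, assuming a hypothetical `repository_response` helper in place of the real API endpoint. The error class, the fake tags count lookup and the error message text are placeholders chosen for illustration, not taken from this MR.

```ruby
require 'json'

# Hypothetical error class standing in for the registry client's connection error.
class RegistryConnectionError < StandardError; end

# Attributes mirrored in the database (sample data for the sketch).
REPOSITORY = { id: 1, name: 'app' }.freeze

# Simulates the call to the Container registry API for the tags count.
def fetch_tags_count(registry_up)
  raise RegistryConnectionError, 'connection refused' unless registry_up

  3
end

# Returns [status, json_body]. The registry is only contacted when the
# tags count is requested, so attribute-only requests keep working.
def repository_response(tags_count:, registry_up:)
  body = REPOSITORY.dup
  body[:tags_count] = fetch_tags_count(registry_up) if tags_count
  [200, JSON.generate(body)]
rescue RegistryConnectionError
  # Placeholder message; the real response text may differ.
  [503, JSON.generate({ message: 'Container registry unavailable' })]
end

puts repository_response(tags_count: false, registry_up: false).inspect
puts repository_response(tags_count: true, registry_up: false).inspect
```

With the registry down, the first call still returns the database attributes, while the second one returns the service unavailable status.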

GraphQL API

This is similar to the REST API: depending on the data accessed, the request can fail because data from the Container registry is requested.
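The per-field behavior can be illustrated with a simplified resolver loop in plain Ruby. It does not use the GraphQL library; `ResourceNotAvailable`, `resolve_field` and `execute_query` are hypothetical names for this sketch, and the error message is a placeholder.

```ruby
# Hypothetical error classes: one raised by the registry client, one standing
# in for the "unavailable resource" error on the GraphQL side.
class RegistryConnectionError < StandardError; end
class ResourceNotAvailable < StandardError; end

# Attributes mirrored in the database (sample data for the sketch).
REPOSITORY = { 'name' => 'app', 'path' => 'group/project/app' }.freeze

# Resolves a single field. Only tagsCount needs the Container registry.
def resolve_field(field, registry_up)
  return REPOSITORY.fetch(field) unless field == 'tagsCount'

  raise RegistryConnectionError, 'connection refused' unless registry_up

  3
rescue RegistryConnectionError
  # Translate the connection error into the GraphQL-side error class.
  raise ResourceNotAvailable, "Can't connect to the container registry"
end

# Resolves each requested field and records errors with the field path,
# so the response points at the field that failed.
def execute_query(fields, registry_up:)
  result = { 'data' => {}, 'errors' => [] }
  fields.each do |field|
    begin
      result['data'][field] = resolve_field(field, registry_up)
    rescue ResourceNotAvailable => e
      result['data'][field] = nil
      result['errors'] << { 'message' => e.message, 'path' => [field] }
    end
  end
  result
end

puts execute_query(%w[name tagsCount], registry_up: false).inspect
```

A query that asks only for `name` produces no errors in this sketch even when the registry is down, which mirrors the "depending on the data accessed" behavior described above.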

🤔 What does this MR do and why?

  1. UI
    • Catch Container registry errors and set instance variables (boolean flags)
    • Pass those flags to the Vue component
    • Update the component so that the error message is displayed whenever any of the error flags is true
    • Update the related specs
  2. REST API
    • Catch Container registry errors and return the service unavailable response
    • Update the related specs
  3. GraphQL API
    • Catch Container registry errors and raise the unavailable resource error class from the GraphQL side
    • Update the related specs

📷 Screenshots or screen recordings

All the screenshots below have been taken when the Container registry is shut down.

UI

| Scenario | master | This MR |
| --- | --- | --- |
| Project level. Container images index page (<base_url>/<project_path>/container_registry) | Screenshot_2021-09-27_at_09.47.43 | Screenshot_2021-09-23_at_16.05.38 |
| Project level. Container images details page (<base_url>/<project_path>/container_registry/<container_repository_id>) | Screenshot_2021-09-27_at_09.49.06 | Screenshot_2021-09-23_at_16.15.57 |
| Group level. Container images index page (<base_url>/groups/<group_path>/-/container_registries) | Screenshot_2021-09-27_at_09.45.09 | Screenshot_2021-09-23_at_16.10.17 |
| Group level. Container images details page (<base_url>/groups/<group_path>/-/container_registries/<container_repository_id>) | Screenshot_2021-09-27_at_09.46.37 | Screenshot_2021-09-23_at_16.12.43 |

Notes:

  • We don't allow for partial information: we either have all the data from both sources or we display a connection error message.
  • † : To handle those cases, we need more changes on the frontend side. I'm proposing to handle these in a follow-up. Here is the issue: #341725.

REST API

| Scenario | master | This MR |
| --- | --- | --- |
| Project level. Get container images (<base_url>/api/v4/projects/<project_id>/registry/repositories) | Screenshot_2021-09-27_at_09.44.07 | Screenshot_2021-09-23_at_16.30.05 |
| Project level. Get container images with tags (<base_url>/api/v4/projects/<project_id>/registry/repositories?tags=true) | Screenshot_2021-09-27_at_09.43.25 | Screenshot_2021-09-23_at_17.08.49 |
| Project level. Get container images with tags_count (<base_url>/api/v4/projects/<project_id>/registry/repositories?tags_count=true) | Screenshot_2021-09-27_at_09.42.47 | Screenshot_2021-09-23_at_17.09.57 |
| Project level. Get container images tags (<base_url>/api/v4/projects/<project_id>/registry/repositories/<repository_id>/tags) | Screenshot_2021-09-27_at_09.42.16 | Screenshot_2021-09-23_at_17.14.02 |
| Group level. Get container images (<base_url>/api/v4/groups/<group_id>/registry/repositories) | Screenshot_2021-09-27_at_09.41.31 | Screenshot_2021-09-23_at_17.19.37 |
| Group level. Get container images with tags (<base_url>/api/v4/groups/<group_id>/registry/repositories?tags=true) | Screenshot_2021-09-27_at_09.40.47 | Screenshot_2021-09-23_at_17.16.54 |
| Group level. Get container images with tags_count (<base_url>/api/v4/groups/<group_id>/registry/repositories?tags_count=true) | Screenshot_2021-09-27_at_09.39.52 | Screenshot_2021-09-23_at_17.17.22 |
| Get container image details (<base_url>/api/v4/registry/repositories/<container_repository_id>) | Screenshot_2021-09-27_at_09.38.54 | Screenshot_2021-09-23_at_17.20.27 |
| Get container image details with tags (<base_url>/api/v4/registry/repositories/<container_repository_id>?tags=true) | Screenshot_2021-09-27_at_09.38.23 | Screenshot_2021-09-23_at_17.21.27 |
| Get container image details with tags_count (<base_url>/api/v4/registry/repositories/<container_repository_id>?tags_count=true) | Screenshot_2021-09-27_at_09.37.18 | Screenshot_2021-09-23_at_17.23.37 |

GraphQL API

| Scenario | master | This MR |
| --- | --- | --- |
| Project level. Get container images | Screenshot_2021-09-23_at_17.50.14 | Screenshot_2021-09-23_at_17.24.58 |
| Project level. Get container images with tagsCount | Screenshot_2021-09-23_at_17.50.44 | Screenshot_2021-09-23_at_17.25.33 |
| Group level. Get container images | Screenshot_2021-09-23_at_17.51.10 | Screenshot_2021-09-23_at_17.26.46 |
| Group level. Get container images with tagsCount | Screenshot_2021-09-23_at_17.51.42 | Screenshot_2021-09-23_at_17.26.23 |
| Get container image details | Screenshot_2021-09-23_at_17.46.21 | Screenshot_2021-09-23_at_17.27.30 |
| Get container image details with tags | Screenshot_2021-09-23_at_17.49.41 | Screenshot_2021-09-23_at_17.28.05 |
| Get container image details with tagsCount | Screenshot_2021-09-23_at_17.47.27 | Screenshot_2021-09-23_at_17.28.51 |

Notes:

  • Notice that with this MR, we have a much more precise message that targets the field that raised the error.

How to set up and validate locally

  1. Set up the container registry
  2. Create a group and a project inside that group
  3. On the project, upload a few container images and tags with https://gitlab.com/nmezzopera/container-factory
  4. Try the scenarios above

🛃 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez
