
Provide descriptive messaging when Gitaly is unreachable

Problem to solve

If Gitaly becomes unreachable, several parts of the application are affected, including:

  • Dashboard
  • Exploring projects
  • Accessing a repository
  • File browsing

It was previously discussed that we show a warning alert that identifies the affected projects, explains why we can't load all of the requested information, and describes what can be done to resolve the issue.

The problem is that, to identify the affected projects, we would need to request the uncached status from Gitaly, which would increase the load on Gitaly at a time when it is already unstable or unreachable. Additionally, because several different pages are impacted, these statuses would ideally be displayed across the application. We need a way to communicate the status globally rather than individually on each page or project.

Intended users

All personas would be affected by Gitaly being unreachable.

Proposal

As discussed in #205488 (comment 329787521):

Show a warning message stating that something went wrong and that not all content is displayed whenever GRPC::Unavailable or GRPC::DeadlineExceeded is raised. Do this for every request, including AJAX requests, but show the message only once, and only on the page where the problem occurred.
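The behavior proposed above can be sketched as follows. This is a minimal, non-authoritative sketch, not GitLab's implementation: it assumes some per-page state that both the initial page request and its AJAX requests can reach (how that state is carried between requests is not shown and is an assumption). The exception classes `GitalyUnavailable` and `GitalyDeadlineExceeded` are hypothetical stand-ins for `GRPC::Unavailable` and `GRPC::DeadlineExceeded`, and `PageState`, `call_gitaly`, `pending_warning`, and the warning wording are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


# Hypothetical stand-ins for GRPC::Unavailable and GRPC::DeadlineExceeded.
class GitalyUnavailable(Exception):
    pass


class GitalyDeadlineExceeded(Exception):
    pass


GITALY_ERRORS = (GitalyUnavailable, GitalyDeadlineExceeded)

# Hypothetical wording for the warning message.
WARNING = "Something went wrong while loading data. Not all content is displayed."


@dataclass
class PageState:
    """Hypothetical state shared by a page's initial request and its AJAX requests."""
    gitaly_degraded: bool = False
    warning_shown: bool = False


def call_gitaly(state: PageState, fetch: Callable[[], T], fallback: T) -> T:
    """Run a Gitaly-backed fetch; on an unavailable/deadline error, mark the page
    as degraded and return the fallback instead of failing the whole request."""
    try:
        return fetch()
    except GITALY_ERRORS:
        state.gitaly_degraded = True
        return fallback


def pending_warning(state: PageState) -> Optional[str]:
    """Return the warning at most once per page, and only if a Gitaly call failed."""
    if state.gitaly_degraded and not state.warning_shown:
        state.warning_shown = True
        return WARNING
    return None


# Usage: one failing call and one healthy call during the same page load.
def _failing_fetch():
    raise GitalyDeadlineExceeded()


page = PageState()
branches = call_gitaly(page, _failing_fetch, fallback=[])  # degraded: returns []
readme = call_gitaly(page, lambda: "README", fallback=None)
first = pending_warning(page)   # the warning text
second = pending_warning(page)  # None: already shown on this page
```

Keeping the "degraded" flag separate from the "already shown" flag is what lets any number of failing calls on one page (including later AJAX calls) collapse into a single message.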

Previous proposal

We want to communicate two things to users:

  • What the problem is and what it may cause (Gitaly is unreachable, so XYZ data will be unavailable)
  • How it can be resolved

One suggestion is to display a custom 500 error page when a user tries to access a project whose repository is unreachable, and to monitor Gitaly's server status in order to show a global warning alert.
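One way to monitor Gitaly's status without adding much load while it is unstable is to cache the result of a status probe for a fixed window. The following is a minimal sketch under that assumption, not a description of how GitLab monitors Gitaly. `CachedGitalyStatus`, `global_alert`, the 30-second TTL, the alert wording, and the `probe` callable (standing in for whatever health check would actually be used) are all hypothetical.

```python
import time
from typing import Callable, Optional


class CachedGitalyStatus:
    """Hypothetical status cache: runs the probe at most once per TTL window,
    so page views read a cached value instead of each contacting Gitaly."""

    def __init__(self, probe: Callable[[], bool], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._probe = probe          # hypothetical health check; True means healthy
        self._ttl = ttl_seconds
        self._clock = clock          # injectable for testing
        self._checked_at: Optional[float] = None
        self._healthy = True

    def healthy(self) -> bool:
        now = self._clock()
        if self._checked_at is None or now - self._checked_at >= self._ttl:
            try:
                self._healthy = self._probe()
            except Exception:
                # A probe that errors out is treated as "unreachable".
                self._healthy = False
            self._checked_at = now
        return self._healthy


def global_alert(status: CachedGitalyStatus) -> Optional[str]:
    """Return the text of a global warning alert, or None when Gitaly looks healthy."""
    if status.healthy():
        return None
    return "Gitaly is unreachable. Some repository data may be unavailable."
```

Because the cache lives in a single process, each application process would still probe once per window; sharing the cached status between processes would reduce the load further, at the cost of extra infrastructure.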

Permissions and Security

Documentation

Availability & Testing

  • As noted, availability could be affected if monitoring Gitaly's status adds load to Gitaly while it is already unstable.
  • Unit/integration tests should provide most of the coverage, but a new E2E test should be added for extra confidence.
  • The change isn't expected to affect existing E2E tests.

What does success look like, and how can we measure that?

What is the type of buyer?

Links / references

Edited by Patrick Bajao