Geo: Find the most common sync failures
Release notes
Geo: Find the most common sync failure messages with the Rake task gitlab-rake geo:top_sync_failures.
Problem to solve
If you visit Admin Area > Geo > Sites and there are sync or verification failures, the next step is to examine the failure messages. This issue proposes to display failure messages.
But if there are 10,000 failure messages, you want to identify the most common problems so that you can resolve them first.
Proposal - Rake task - Weight 2
MAX=10 DATA_TYPE=Upload gitlab-rake geo:top_sync_failures
This class contains an example of the Rails ActiveRecord query needed to achieve this.
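As a rough illustration of the grouping the Rake task would perform, here is a minimal plain-Ruby sketch. It is not the query from the referenced class: the real task would presumably do the counting in the database with ActiveRecord's `group` and `count`. The method name `top_sync_failures`, the `records` array, and the `:last_sync_failure` key are hypothetical names chosen for this example.

```ruby
# Hypothetical sketch: rank sync failure messages from most to least common.
# `records` stands in for registry rows; the :last_sync_failure key is an
# assumption for illustration, not a confirmed column name.
def top_sync_failures(records, max: 10)
  records
    .map { |record| record[:last_sync_failure] }
    .compact                                        # skip rows without a failure message
    .tally                                          # builds { message => count }
    .sort_by { |message, count| [-count, message] } # most common first, ties by message
    .first(max)
end

records = [
  { id: 1, last_sync_failure: "Cannot connect to primary container registry" },
  { id: 2, last_sync_failure: "The file is missing on the Geo primary site" },
  { id: 3, last_sync_failure: "Cannot connect to primary container registry" },
  { id: 4, last_sync_failure: nil }
]

# Mirror the MAX environment variable from the proposed invocation.
max = Integer(ENV.fetch("MAX", "10"))

top_sync_failures(records, max: max).each do |message, count|
  puts format("%6d  %s", count, message)
end
```

Counting in Ruby like this only suits small samples. With thousands of failures, the grouping and limit belong in the SQL query so that only the top `MAX` rows are returned.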
Proposal - UI - Admin Area - Weight 7
We could add a view that groups sync failures by message and orders them from most common to least common. It would have a table like:
| Number of failures | Sync failure message |
|---|---|
| 542 | Cannot connect to primary container registry |
| 19 | The file is missing on the Geo primary site |
| 1 | Error while syncing object f3a093tru0923jtr0a293jta0923j9ja0293j5329j5r02935kr0293kj5r0923ar09a |
| 1 | Error while syncing object a2039502395j0a293j5092a3j509a23kjr09o32k50923u509j3a20rt9jk320r9kja2 |
| 1 | Error while syncing object ija3rt98aw9o3r8uwa39oi9rulaw3i85rikzwq3y5ikyaw345iay3i5ryzal3iwq85ul |
This class contains an example of the Rails ActiveRecord query needed to achieve this.
- Add this capability to the GraphQL API. This probably needs a new Type.
- Add ability to display this in the frontend.
Intended users
- Sidney (Systems Administrator)
- Support Engineer
- Sasha (Software Developer)
Feature Usage Metrics
Number of views of Admin Area > Geo > Replication Details?
Does this feature require an audit event?
No.