Omnibus Praefect dashboard improvements from demo

From the 2020-04-14 demo, we noticed the following potential improvements:

  • The table containing the latest known elected primary should also include the instance name (e.g. IP or hostname).
  • The replication queue size could be further broken down by Praefect instance and virtual storage.
  • A panel is needed for the replication delay metric (gitaly_praefect_replication_delay).
  • Virtual storage flapping should use two labels ({{instance}} {{gitaly_storage}}).
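As a rough illustration of the per-instance breakdown and the two-label format mentioned above, the following Python sketch builds a PromQL aggregation string and a label template. The helper names, the placeholder metric name `replication_queue_metric`, and the `virtual_storage` label are assumptions for illustration only; the real queue-size metric and its labels should be taken from the Praefect metrics endpoint.

```python
from typing import Sequence


def sum_by(metric: str, labels: Sequence[str]) -> str:
    """Return a PromQL expression that sums `metric` grouped by `labels`."""
    return "sum by ({}) ({})".format(", ".join(labels), metric)


def legend_format(labels: Sequence[str]) -> str:
    """Return a {{label}} template with one placeholder per label."""
    return " ".join("{{" + label + "}}" for label in labels)


# Hypothetical metric and label names, used only to show the shape of the query.
queue_expr = sum_by("replication_queue_metric", ["instance", "virtual_storage"])

# The two labels named in the flapping item above.
flapping_legend = legend_format(["instance", "gitaly_storage"])
```

Grouping by both labels keeps one series per Praefect instance and virtual storage pair, instead of a single aggregate value.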

From the 2020-04-17 demo, we noticed we should also include:

  • A panel for the query: rate(grpc_server_handled_total{job="praefect",grpc_code!="OK"}[5m])
  • A panel for the node up time: gitaly_praefect_node_last_healthcheck_up
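The two panels from the 2020-04-17 list could be sketched as a reduced Grafana panel definition, generated here with Python. The queries are the ones listed above; the panel titles, the `graph` panel type, the legend templates, and the `make_panel` helper are assumptions, and a real dashboard needs more fields (ids, grid positions, datasource) than this fragment carries.

```python
import json


def make_panel(title: str, expr: str, legend: str = "") -> dict:
    """Build a minimal Grafana-style panel with one Prometheus target."""
    return {
        "title": title,
        "type": "graph",  # assumed panel type
        "targets": [{"expr": expr, "legendFormat": legend}],
    }


# Queries copied from the issue text.
ERROR_RATE_EXPR = 'rate(grpc_server_handled_total{job="praefect",grpc_code!="OK"}[5m])'
NODE_UP_EXPR = "gitaly_praefect_node_last_healthcheck_up"

panels = [
    # Titles and legend templates are illustrative, not taken from the dashboard.
    make_panel("Praefect non-OK gRPC responses", ERROR_RATE_EXPR, "{{instance}} {{grpc_code}}"),
    make_panel("Praefect node health check", NODE_UP_EXPR, "{{instance}}"),
]

# Serialise the fragment so it can be pasted into a dashboard JSON file.
dashboard_fragment = json.dumps({"panels": panels}, indent=2)
```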
Edited by Paul Okstad (ex-GitLab)