Provide dashboard for insight into production monitoring of applications

Problem to solve

As an operations-focused individual in an organization, I don't have ready access to the production health of my group's applications.

Proposal

  • Group-level overview of all production apps
  • Group apps by nested group if applicable
  • Show the high-level status of each app, e.g. green/yellow/red
  • Drill down into individual app to see:
    • service health (SLOs such as response time and error rate)
    • pod health (including system metrics like memory, CPU, IO)
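One way the green/yellow/red status and the nested-group rollup could work is sketched below. This is only an illustration under stated assumptions: the `App` and `Group` types, the metric names, and the threshold values are all hypothetical and not part of the proposal. The sketch assumes a group takes the worst status found among its apps and sub-groups.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    """Ordered so that a larger value means a worse status."""
    GREEN = 0
    YELLOW = 1
    RED = 2


@dataclass
class App:
    name: str
    error_rate: float        # fraction of failed requests, 0.0 to 1.0
    p95_response_ms: float   # 95th-percentile response time in milliseconds


@dataclass
class Group:
    name: str
    apps: list[App] = field(default_factory=list)
    subgroups: list["Group"] = field(default_factory=list)


# Hypothetical SLO thresholds, chosen only for illustration.
ERROR_RATE_WARN, ERROR_RATE_CRIT = 0.01, 0.05
RESPONSE_WARN_MS, RESPONSE_CRIT_MS = 500.0, 2000.0


def app_status(app: App) -> Status:
    """Map an app's service-health metrics to green/yellow/red."""
    if app.error_rate >= ERROR_RATE_CRIT or app.p95_response_ms >= RESPONSE_CRIT_MS:
        return Status.RED
    if app.error_rate >= ERROR_RATE_WARN or app.p95_response_ms >= RESPONSE_WARN_MS:
        return Status.YELLOW
    return Status.GREEN


def group_status(group: Group) -> Status:
    """Roll up to the worst status among the group's apps and nested sub-groups."""
    statuses = [app_status(a) for a in group.apps]
    statuses += [group_status(g) for g in group.subgroups]
    # An empty group is treated as green here; that is also an assumption.
    return max(statuses, default=Status.GREEN)
```

With a worst-of rollup, a single red app in a deeply nested sub-group surfaces at the top-level group, which matches the goal of spotting problems from the overview before drilling down.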

The group-level overview, with sub-groups, would look something like this:

ops-dashboard-03

After clicking on the environment from the ops view, the user would see the service health graphs:

ops-dashboard__drill-down--service-health

At the top of the page, a toggle switches to the pod health view. Hovering over a pod highlights the corresponding line on the graphs, and vice versa. I also imagine you could click a pod to keep it in the active state, so the interaction would work on mobile as well.
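The hover and click behavior described above could be modeled with a small piece of state, sketched here. The `PodHighlight` class and its method names are hypothetical. The sketch assumes that a hover temporarily takes precedence over a clicked (pinned) pod, and that clicking the pinned pod again releases it; neither detail is specified in the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodHighlight:
    """Tracks which pod (and its graph line) should be highlighted."""
    hovered: Optional[str] = None  # pod currently under the pointer, if any
    pinned: Optional[str] = None   # pod kept active by a click or tap

    def hover(self, pod: str) -> None:
        self.hovered = pod

    def leave(self) -> None:
        self.hovered = None

    def click(self, pod: str) -> None:
        # Assumption: clicking the already-pinned pod unpins it.
        self.pinned = None if self.pinned == pod else pod

    @property
    def active(self) -> Optional[str]:
        # Assumption: a hover previews over the pin; the pin returns on leave.
        return self.hovered or self.pinned
```

Because the pinned state does not depend on pointer hover, a tap alone is enough to keep a pod highlighted on touch devices.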

ops-dashboard__drill-down--pod-health

Edited by Kenny Johnston