Provide a dashboard for insight into production monitoring of applications
Problem to solve
As an operations-focused individual in an organization, I don't have ready access to the production health of my group-level applications.
Proposal
- Group-level overview of all production apps
- Group apps by nested group if applicable
- Show high-level status of app - e.g. green/yellow/red
- Drill down into an individual app to see:
  - service health (SLOs such as response time and error rate)
  - pod health (including system metrics like memory, CPU, IO)
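To make the green/yellow/red roll-up concrete, here is a minimal sketch of how a per-app status could be derived from the two SLOs mentioned above and then bubbled up through nested groups. Everything in it is an assumption for illustration: the `AppHealth` fields, the threshold values, the use of p95 response time, and the "worst status wins" rule are hypothetical and not part of the proposal.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    # Ordered so that max() picks the worst status.
    GREEN = 0
    YELLOW = 1
    RED = 2


@dataclass
class AppHealth:
    name: str
    group_path: str         # hypothetical nested group path, e.g. "acme/payments"
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p95_response_ms: float  # 95th percentile response time in milliseconds


# Hypothetical thresholds as (yellow_at, red_at); real values would be configurable.
ERROR_RATE_LIMITS = (0.01, 0.05)
RESPONSE_MS_LIMITS = (500.0, 1500.0)


def classify(value, limits):
    """Return the status for one metric given its (yellow_at, red_at) limits."""
    yellow_at, red_at = limits
    if value >= red_at:
        return Status.RED
    if value >= yellow_at:
        return Status.YELLOW
    return Status.GREEN


def app_status(app):
    """An app is as unhealthy as its worst SLO (assumed rule)."""
    return max(
        classify(app.error_rate, ERROR_RATE_LIMITS),
        classify(app.p95_response_ms, RESPONSE_MS_LIMITS),
    )


def group_overview(apps):
    """Map each group path, and every ancestor group, to its worst app status."""
    overview = {}
    for app in apps:
        status = app_status(app)
        parts = app.group_path.split("/")
        for depth in range(1, len(parts) + 1):
            path = "/".join(parts[:depth])
            overview[path] = max(overview.get(path, Status.GREEN), status)
    return overview
```

With this rule a single red app turns every ancestor group red, which keeps the overview conservative. A weighted or percentage-based roll-up would be an equally reasonable alternative.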
The group-level overview, broken down by sub-group, would look something like:
After clicking on the environment from the ops view, the user would see the service health graphs:
At the top of the page there is a toggle to switch to the pod health view. Hovering over a pod highlights the corresponding line on the graphs, and hovering over a line highlights its pod. I also imagine you could click on a pod to keep it in the active state, so the interaction would also work on mobile, where there is no hover.
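The hover and click behavior described above could be modeled as a small piece of selection state. This is only a sketch: the `PodSelection` class, its method names, and the choice that a hovered pod temporarily takes precedence over a clicked (pinned) one are assumptions, not decided behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodSelection:
    hovered: Optional[str] = None  # pod currently under the pointer, if any
    pinned: Optional[str] = None   # pod kept active by a click or tap, if any

    def hover(self, pod: str) -> None:
        self.hovered = pod

    def leave(self) -> None:
        self.hovered = None

    def click(self, pod: str) -> None:
        # Clicking the pinned pod again releases it (assumed toggle behavior).
        self.pinned = None if self.pinned == pod else pod

    @property
    def active(self) -> Optional[str]:
        """Pod whose graph line should be highlighted right now."""
        return self.hovered if self.hovered is not None else self.pinned
```

On a touch device only `click` would ever be called, so the pinned pod alone drives the highlight. The same state could be driven from the graph side by treating a hovered or clicked line as its pod.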
Links / references
Edited by Kenny Johnston