Kubernetes Cluster Monitoring (Ops Dashboard)

With the existing Prometheus/Kubernetes integration, we are able to monitor the CPU and Memory consumption of each container, and the overall consumption of resources on individual Nodes. We can optionally augment this with a Node Exporter DaemonSet as well.

However, since our existing functionality exposes this information at a Project/Application level, it doesn't really make sense to expose information on the underlying hardware there. Part of the value of Kubernetes is that it intelligently bin packs your applications, based on their usage and requirements, in the most efficient way possible.

This means that your application shouldn't really care about the underlying hardware usage, as other applications could be generating that load, not necessarily your own.

However, that doesn't mean this information isn't valuable. It is, just for a different use case.

Ops Dashboard of K8s Cluster

This information is ideal for giving the administrator of the cluster a bird's-eye view of its health and performance, answering questions like:

What is the average efficiency we are seeing across our Nodes?
How much headroom do we have before we run out of resources, and need to add a new Node?
Are we buying the right mix of CPU/Memory based on our mix of apps?
Are applications requiring a significant chunk of resources, and then under-utilizing it?
What applications are consuming the majority of our resources (CPU, Memory, Network, etc.)?
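
As a rough illustration of how the first few questions could be answered from per-Node figures, here is a minimal sketch. It assumes we already have capacity, requested, and used values for each Node (for example, pulled from Prometheus). The `NodeSample` type, the function names, and the sample numbers are all hypothetical and not part of any existing integration.

```python
from dataclasses import dataclass


@dataclass
class NodeSample:
    """Hypothetical per-Node snapshot; CPU in cores, memory in GiB."""
    name: str
    cpu_capacity: float
    cpu_requested: float
    cpu_used: float
    mem_capacity: float
    mem_requested: float
    mem_used: float


def average_efficiency(nodes):
    """Mean of used/capacity across Nodes, returned as (cpu, mem)."""
    cpu = sum(n.cpu_used / n.cpu_capacity for n in nodes) / len(nodes)
    mem = sum(n.mem_used / n.mem_capacity for n in nodes) / len(nodes)
    return cpu, mem


def headroom(nodes):
    """Capacity not yet requested, summed over the cluster, as (cpu, mem)."""
    cpu = sum(n.cpu_capacity - n.cpu_requested for n in nodes)
    mem = sum(n.mem_capacity - n.mem_requested for n in nodes)
    return cpu, mem


def request_utilization(nodes):
    """Cluster-wide used/requested as (cpu, mem).

    Low values suggest applications request resources and then
    under-utilize them.
    """
    cpu = sum(n.cpu_used for n in nodes) / sum(n.cpu_requested for n in nodes)
    mem = sum(n.mem_used for n in nodes) / sum(n.mem_requested for n in nodes)
    return cpu, mem


# Made-up sample data, for illustration only.
nodes = [
    NodeSample("node-a", 4.0, 3.0, 1.0, 16.0, 12.0, 8.0),
    NodeSample("node-b", 4.0, 2.0, 2.0, 16.0, 8.0, 4.0),
]

if __name__ == "__main__":
    print("average efficiency (cpu, mem):", average_efficiency(nodes))
    print("headroom (cores, GiB):", headroom(nodes))
    print("request utilization (cpu, mem):", request_utilization(nodes))
```

Measuring headroom against requests rather than actual usage is a design choice in this sketch: the scheduler places workloads based on what they request, so unrequested capacity is a closer proxy for "when do we need a new Node" than idle capacity.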

The Kubernetes dashboard only provides a bare minimum of essentials today, effectively just:

Average CPU/Memory usage across all Nodes
A view of CPU/Memory usage of a specific Node

With Prometheus and our Kubernetes Cluster integration (https://gitlab.com/gitlab-org/gitlab-ce/issues/35956), we have an opportunity to provide a much deeper level of insight.

Edited Sep 26, 2017 by silv