Monitoring Kubernetes Clusters
Soon we are going to have Kubernetes clusters running, and they will need to be monitored. Since we already have infrastructure for collecting metrics, what work needs to be done as we introduce Kubernetes into our various environments to ensure that we are monitoring and alerting on the health of these clusters?
Our current infrastructure relies on Chef searches to determine which nodes Prometheus should scrape. With Kubernetes clusters, we'll need a way to extend those searches to include the endpoints of the Prometheus instances running inside the clusters. We need to consider the security implications of this so that we avoid exposing metrics to the outside world. We also need to plan for clusters being created across a wide array of GCP projects. If we choose to keep our monitoring in one location, we need to consider the limitations of network peering and define a network addressing scheme that allows interconnectivity between our various networks.
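One possible way to extend the scrape targets is Prometheus federation, where the central Prometheus scrapes each in-cluster Prometheus on its /federate endpoint. The Python sketch below is a minimal illustration under stated assumptions, not a decided design: the node list stands in for the result of the existing Chef search, the cluster list (name, GCP project, endpoint) is a hypothetical registry that does not exist yet, port 9100 assumes node_exporter's default, and the `cluster` and `gcp_project` labels are illustrative names.

```python
import json


def build_scrape_configs(chef_nodes, cluster_prometheus):
    """Return a Prometheus `scrape_configs` list that combines node targets
    from a Chef search with one federation job per Kubernetes cluster.

    Both inputs are hypothetical: `chef_nodes` stands in for the hostnames
    returned by the existing Chef search, and `cluster_prometheus` for a
    registry of in-cluster Prometheus endpoints that would need to be built.
    """
    configs = [
        {
            "job_name": "node",
            # Assumes node_exporter listens on its default port, 9100.
            "static_configs": [{"targets": [f"{host}:9100" for host in chef_nodes]}],
        }
    ]
    for cluster in cluster_prometheus:
        configs.append(
            {
                "job_name": f"federate-{cluster['name']}",
                # Keep the job/instance labels set by the in-cluster Prometheus.
                "honor_labels": True,
                "metrics_path": "/federate",
                # Pull every series that has a job label; narrow this in practice.
                "params": {"match[]": ['{job=~".+"}']},
                # Scrape over TLS; network access still needs to be restricted separately.
                "scheme": "https",
                "static_configs": [
                    {
                        "targets": [cluster["endpoint"]],
                        # Illustrative labels identifying where the series came from.
                        "labels": {
                            "cluster": cluster["name"],
                            "gcp_project": cluster["project"],
                        },
                    }
                ],
            }
        )
    return configs


if __name__ == "__main__":
    # Hypothetical hostnames, project, and endpoint, for illustration only.
    configs = build_scrape_configs(
        ["node-01.example.internal", "node-02.example.internal"],
        [{"name": "ops-gke", "project": "example-project", "endpoint": "10.10.0.5:9090"}],
    )
    # JSON is valid YAML, so this output can be dropped into prometheus.yml.
    print(json.dumps({"scrape_configs": configs}, indent=2))
```

With federation, each cluster's Prometheus remains the primary collector and only the federated series cross the network to the central instance. The catch-all match[] selector shown here pulls everything and would likely need narrowing, and the endpoints would still need to be reachable only over peered or otherwise private networks.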
/cc @gitlab-org/delivery /cc @gitlab-com/gl-infra /cc @bjk-gitlab