For this issue, we are interested in tracking the following metrics over time (at least one month):
- node count per cluster
- node resource utilization (CPU and memory); we use these metrics to autoscale
Since both OpenShift and Kubernetes CI clusters run in Google Cloud, a low-effort solution here may be to use Google Cloud Monitoring, formerly Stackdriver.
One question to consider: are these GCP charts embeddable, or do we have to go to the GCP console to see them? Thinking ahead, it would be nice to have them on our monitoring website.
I don't have much context on the issue, but we can always set up Prometheus to monitor the utilization. For the node count, we could have a microservice in Go that uses the client-go library to keep track of it.
Thanks for commenting @kitarp29. Prometheus should be a better fit for our use case: we run CI clusters in both GCP and AWS, and we want to view the scaling and performance metrics for all clusters in one place instead of using cloud provider-specific tools like Stackdriver.
Our CI clusters are launched via pipelines. Scripts and configuration for our GKE, EKS, and OpenShift clusters are stored at https://gitlab.com/gitlab-org/distribution/infrastructure. Most likely this is where we want to set up autoscaling monitoring.