Monitor autoscaling for operator CI clusters

changed the description

added For Scheduling devops::systems group::distribution group::distribution::build labels

cc @dmakovey

changed title from Monitor autoscaling for CI clusters to Monitor autoscaling for operator CI clusters

mentioned in issue gitlab-org/distribution/team-tasks#1021 (closed)

marked this issue as related to gitlab-org/distribution/team-tasks#1047 (closed)

added section::core platform label

added to epic &37 (closed)

changed milestone to %15.4

added Deliverable FY23Q3 priority::4 labels and removed For Scheduling label

mentioned in issue gitlab-org/quality/triage-reports#7962 (closed)

added quad-planning::complete-no-action label

@dustinmm80 Please add a type::xyz label to this issue. Thanks! /cc @twk3

added type::maintenance label

changed milestone to %15.5

One question to consider - are these GCP charts embeddable, or do we have to go to GCP console to see them? Thinking ahead, it may be nice to have them on our monitoring website.

changed milestone to %15.6

added maintenance::workflow label

changed milestone to %15.7

changed milestone to %15.9

changed milestone to %Next 1-3 releases

I don't have much context on this issue, but to monitor utilization we can always set up Prometheus. For the node count, we could write a small Go microservice that uses the client-go library to keep track of it.

The easier way, of course, is to use Stackdriver. I guess this tutorial would be a good starting point for that: https://cloud.google.com/kubernetes-engine/docs/tutorials/autoscaling-metrics cc: @dustinmm80

Thanks for commenting @kitarp29. Prometheus should be a better fit for our use case because we run CI clusters in both GCP and AWS, and we want to view the scaling and performance metrics for all clusters in one place instead of using cloud provider-specific tools like Stackdriver.
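
To make the "one place for all clusters" idea concrete, here is a rough sketch of polling the node count from a Prometheus instance in each cluster through the Prometheus HTTP API. It is only an illustration under stated assumptions: the cluster names and Prometheus URLs are hypothetical, and it assumes kube-state-metrics is deployed in each cluster so that the `kube_node_info` metric is available. Nothing here reflects what is actually deployed in our CI clusters.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Prometheus endpoints, one per CI cluster (names and URLs are assumptions).
CLUSTERS = {
    "gke-ci": "http://prometheus.gke-ci.example.com",
    "eks-ci": "http://prometheus.eks-ci.example.com",
}

# kube_node_info is exported by kube-state-metrics with one series per node,
# so counting the series gives the current node count (assumes it is installed).
NODE_COUNT_QUERY = "count(kube_node_info)"


def build_query_url(base_url, promql):
    """Return the instant-query URL for the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def parse_scalar_result(payload):
    """Extract a single numeric value from an instant-query response.

    Returns None when the query matched no series.
    """
    if payload.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % payload.get("error"))
    result = payload["data"]["result"]
    if not result:
        return None
    # Each vector sample looks like {"metric": {...}, "value": [timestamp, "value"]}.
    return float(result[0]["value"][1])


def node_count(base_url, timeout=10):
    """Query one Prometheus instance for the current node count."""
    url = build_query_url(base_url, NODE_COUNT_QUERY)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_scalar_result(json.load(resp))


if __name__ == "__main__":
    for name, base_url in CLUSTERS.items():
        print(name, node_count(base_url))
```

Sampling this on a schedule and graphing the values over time would show when each cluster scales up or down. In practice a shared Grafana dashboard or Prometheus federation would likely replace a script like this, but the query itself would stay the same.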

I understand. Can I contribute to this issue?

Definitely, thanks for offering to help.

Our CI clusters are launched via pipelines. Scripts and configuration for our GKE, EKS, and OpenShift clusters are stored at https://gitlab.com/gitlab-org/distribution/infrastructure. Most likely this is where we want to set up autoscaling monitoring.

mentioned in issue gitlab-org/distribution/team-tasks#1176 (closed)

added FY24Q4 label

removed FY23Q3 label

added Distribution OKRO1KR1 FY24Q1 labels and removed FY24Q4 label

Reference point for the future:

https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-autoscaler-visibility

changed milestone to %15.11

Setting health status to on track as the milestone has just begun.

Issue participants are welcome to override this by setting the health status to another value.

changed health status to on track

changed milestone to %Next 1-3 releases

changed milestone to %Next 4-6 releases

removed Distribution OKRO1KR1 label

removed FY24Q1 label

removed from epic &37 (closed)