
Allow configuration of HPA stabilization window

Henri Philipps requested to merge hp-hpa_stabilization_window into master

What does this MR do?

This MR allows us to configure the HPA autoscaling behavior, specifically the scale-down stabilizationWindowSeconds. We need this because we see a high scaling frequency on GitLab.com, with pods constantly being created and terminated. This is costly: our webservice and sidekiq pods typically take about a minute to become ready and consume a lot of resources in the meantime.

When deciding whether to scale down, the HPA looks at all the replica-count recommendations it computed during the last stabilizationWindowSeconds and uses the highest of them, so a short dip in load does not immediately remove pods. By increasing this window, we expect scaling to flap less.
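For illustration, here is a minimal autoscaling/v2beta2 HorizontalPodAutoscaler with the scale-down stabilization window set explicitly. The object and Deployment names, replica bounds, CPU target, and the 600-second window are hypothetical values chosen for this sketch, not settings taken from this MR.

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: example-webservice          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-webservice        # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: AverageValue
        averageValue: 1             # illustrative: 1 CPU core per pod
  behavior:
    scaleDown:
      # Scale down only to the highest replica recommendation computed
      # during the last 600 seconds (the Kubernetes default is 300).
      stabilizationWindowSeconds: 600
```

With a 600-second window, a brief drop in load does not trigger a scale-down as long as any recommendation from the preceding ten minutes called for more replicas.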

This is needed for gitlab-com/gl-infra/delivery#1510 (closed).

Autoscaling behavior can only be configured through the autoscaling/v2beta2 API or later, so this MR consistently moves every HPA from autoscaling/v2beta1 to autoscaling/v2beta2. To do this, I adapted the HPA metrics configuration to the new format everywhere, for example:

-      targetAverageValue: {{ .hpa.targetAverageValue }}
+      target:
+        type: AverageValue
+        averageValue: {{ .hpa.targetAverageValue }}

Compatibility

This change leaves the scale-down stabilizationWindowSeconds at the Kubernetes default of 300 seconds.
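A sketch of how a chart user might raise the window for specific components through values.yaml. The key path hpa.behavior under each component is an assumption made for illustration (this description does not name the values key), and 600 seconds is an arbitrary example value.

```yaml
# Hypothetical values.yaml override: assumes each component exposes the
# HPA spec.behavior object under <component>.hpa.behavior.
gitlab:
  webservice:
    hpa:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 600   # illustrative; default is 300
  sidekiq:
    hpa:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 600   # illustrative; default is 300
```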

However, using this feature requires switching from the Kubernetes autoscaling/v2beta1 API to autoscaling/v2beta2. The v2beta2 API has been available since Kubernetes v1.12, and spec.behavior was added in v1.18 (GitLab.com runs v1.21 on GKE; the latest release is v1.24). Because autoscaling/v2beta1 will be removed in v1.25, and the stable autoscaling/v2 API has been available since v1.23, moving away from autoscaling/v2beta1 is a good idea anyway.

The only incompatibility I expect users to face concerns .hpa.customMetrics: the format of that object changed slightly between autoscaling/v2beta1 and autoscaling/v2beta2, so users who have set .hpa.customMetrics will need to adapt those values accordingly.
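As an example of the adaptation, here is a Pods-type custom metric written for each API version. The metric name and target value are hypothetical, and the sketch assumes .hpa.customMetrics is a list of metric specs passed through to the HPA's metrics field.

```yaml
# autoscaling/v2beta1 format (old)
customMetrics:
- type: Pods
  pods:
    metricName: example_requests_per_second   # hypothetical metric
    targetAverageValue: 100
---
# autoscaling/v2beta2 format (new): the metric name moves under
# "metric", and the target becomes a typed "target" object.
customMetrics:
- type: Pods
  pods:
    metric:
      name: example_requests_per_second       # hypothetical metric
    target:
      type: AverageValue
      averageValue: 100
```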

Related issues

Related to gitlab-com/gl-infra/delivery#1510 (closed)

Related to #2625 (closed)

Checklist

See Definition of done.

For anything in this list which will not be completed, please provide a reason in the MR discussion.

Required

  • Merge Request Title and Description are up to date, accurate, and descriptive
  • MR targeting the appropriate branch
  • MR has a green pipeline on GitLab.com

Expected (please provide an explanation if not completing)

  • Test plan indicating conditions for success has been posted and passes
  • Documentation created/updated
  • Tests added
  • Integration tests added to GitLab QA
  • Equivalent MR/issue for omnibus-gitlab opened
Edited by Jason Plum
