replicaCount defaulting to 1 forces HPA-autoscaled pods to terminate on deploy
Apologies if I've actually configured this wrong (and I assume I have), but I've got an app running with HPA enabled, scaling between 1 and 100 replicas. Before a deployment I have 50 replicas (scaled up automatically based on CPU), and upon deployment it terminates ALL pods down to 0 before deploying the new ones.
GKE Cluster master: 1.21.5-gke.1302
- I'm deploying multiple apps in the same namespace (not sure if relevant)
- strategy is not defined (the default is RollingUpdate, which is what I see when describing the deployment)
- replicaCount is not defined in my values, but it looks like it's applied regardless of whether I define it or not (see the sketch after this list)
- I had originally left replicaCount: 1 in my values.yaml but have since removed it (not sure if relevant)
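For reference, I believe this is because the chart ships its own default in its values.yaml, which Helm merges in whenever my values file omits the key (paraphrased from the auto-deploy-app chart; exact contents may vary by chart version):

# top of the auto-deploy-app chart's values.yaml (sketch)
replicaCount: 1   # merged in as the default whenever I don't set it myself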
deployment.yaml currently has (https://gitlab.com/gitlab-org/cluster-integration/auto-deploy-image/-/blob/master/assets/auto-deploy-app/templates/deployment.yaml#L20):
replicas: {{ .Values.replicaCount }}
but shouldn't it first check whether the value is actually set?
{{- if .Values.replicaCount }}
replicas: {{ .Values.replicaCount }}
{{- end }}
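With that guard in place (and replicaCount left unset), my understanding is that the rendered Deployment would simply omit the replicas field, so Helm would stop resetting the HPA-managed count to 1 on every upgrade. Roughly:

# sketch of the rendered Deployment with the guard in place and replicaCount
# unset (names here are illustrative, not from my actual release)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-production
spec:
  # no replicas: line is rendered, so the HPA keeps ownership of the count
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: web
          image: registry.example.com/my-app:latest   # placeholder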
This comment in the kubernetes repository seems to suggest that setting replicas to 1 will do exactly what I'm seeing: https://github.com/kubernetes/kubernetes/issues/25238#issuecomment-217928835
My current configuration:
variables:
  AUTO_BUILD_IMAGE_VERSION: 'v1.5.0'
  AUTO_DEPLOY_IMAGE_VERSION: 'v2.17.0'

production:
  extends: .auto-deploy
  stage: production
  script:
    - export DB_MIGRATE="npm run predeploy"
    - auto-deploy check_kube_domain
    - auto-deploy download_chart
    - auto-deploy ensure_namespace
    - auto-deploy initialize_tiller
    - auto-deploy create_secret
    - auto-deploy deploy
    - auto-deploy delete canary
    - auto-deploy persist_environment_url
  environment:
    name: $PROJECT_NAME/production
    url: https://$PROJECT_NAME.$KUBE_INGRESS_BASE_DOMAIN
  allow_failure: false
  rules:
    - if: '$CI_KUBERNETES_ACTIVE == null || $CI_KUBERNETES_ACTIVE == ""'
      when: never
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
      when: manual

.auto-deploy:
  image: 'registry.gitlab.com/gitlab-org/cluster-integration/auto-deploy-image:${AUTO_DEPLOY_IMAGE_VERSION}'
  dependencies: []
  artifacts:
    paths: [environment_url.txt, tiller.log]
    when: always
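(For completeness: the deploy job picks up my values from .gitlab/auto-deploy-values.yaml, which I understand is the default location auto-deploy-image checks; it can also be pointed at a different file via the HELM_UPGRADE_VALUES_FILE variable:)

variables:
  # optional; only needed if the values file isn't at the default path
  HELM_UPGRADE_VALUES_FILE: .gitlab/auto-deploy-values.yaml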
My simplified auto-deploy-values.yaml is basically:
application:
  track: stable
  tier: web

hpa:
  enabled: true
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80

service:
  enabled: true
  annotations: {}
  name: web
  type: ClusterIP
  externalPort: 5000
  internalPort: 5000

resources:
  requests:
    cpu: 400m
    memory: 512Mi

podDisruptionBudget:
  enabled: false
  maxUnavailable: 1
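For what it's worth, those hpa values should render to an HPA along these lines (a sketch assuming the autoscaling/v1 API; the exact apiVersion, names, and labels depend on the chart and cluster version):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-production    # illustrative release name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app-production  # illustrative
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80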
My other app, based on registry.gitlab.com/gitlab-org/cluster-integration/auto-deploy-image:v1.0.7 with the same values, doesn't seem to suffer from this. In fact, if I redeploy the same artifact via v1.0.7, it doesn't touch my deployment at all and no pods are removed (I guess because it's already up to date), whereas with v2.17.0 it terminates all pods despite nothing changing.
Here are two screenshots showing the deployment before and after the deploy: