AutoDevOps ignores `POSTGRES_ENABLED=false` when using "manual deployment to production"
## Situation
- I have a project with a Kubernetes cluster connected and Auto DevOps enabled.
- I have set `POSTGRES_ENABLED=false` because I don't need a PostgreSQL database.
- Everything works so far (review apps, deploy to production).
## Problem
If I go to **Settings -> CI / CD** and change the **Deployment strategy** from **Continuous deployment to production** to **Automatic deployment to staging, manual deployment to production**, everything works up to the `staging` step.
The `rollout 10%` job will fail with:

```
$ scale stable $((100-ROLLOUT_PERCENTAGE))
Error: UPGRADE FAILED: timed out waiting for the condition
UPGRADE FAILED
Error: timed out waiting for the condition
ERROR: Job failed: command terminated with exit code 1
```
The reason for that timeout is the pending postgres instance:

```
root@master-01:~# kubectl -n my-project get pod,deployment,pvc
NAME                                       READY   STATUS    RESTARTS   AGE
pod/production-77c5c975f6-6wwt9            1/1     Running   0          90m
pod/production-77c5c975f6-nlmns            1/1     Running   0          90m
pod/production-77c5c975f6-z6rvw            1/1     Running   0          90m
pod/production-postgres-5db86568d7-qfqzn   0/1     Pending   0          4m29s
pod/production-rollout-7499ffd96c-mh9kd    1/1     Running   0          15m
pod/staging-7df9998c59-jmtrm               1/1     Running   0          19m

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.extensions/production            3/3     3            3           90m
deployment.extensions/production-postgres   0/1     1            0           4m29s
deployment.extensions/production-rollout    1/1     1            1           3h
deployment.extensions/staging               1/1     1            1           10h

NAME                                        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/production-postgres   Pending                                                     4m30s
```
I have not configured persistent volume storage and don't need any, so the pending postgres deployment prevents me from running the app in production.
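To confirm why the claim never binds, the PVC's events can be inspected (a quick diagnostic sketch, not part of the fix; namespace and PVC name taken from the output above):

```sh
# On a cluster without a default StorageClass the events show the
# claim waiting for a volume or provisioner, hence the Pending state.
kubectl -n my-project describe pvc production-postgres
```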
Changing the `scale` function of the `Deploy.gitlab-ci.yml` template by adding

```sh
service_enabled="true"
postgres_enabled="$POSTGRES_ENABLED"

# if track is different than stable,
# re-use all attached resources
if [[ "$track" != "stable" ]]; then
  service_enabled="false"
  postgres_enabled="false"
fi
```

and

```sh
helm upgrade --reuse-values \
  ...
  --set postgresql.enabled="$postgres_enabled" \
```

fixed the issue. Full code:
```sh
function scale() {
  track="${1-stable}"
  percentage="${2-100}"
  name=$(deploy_name "$track")
  replicas=$(get_replicas "$track" "$percentage")

  service_enabled="true"
  postgres_enabled="$POSTGRES_ENABLED"
  # if track is different than stable,
  # re-use all attached resources
  if [[ "$track" != "stable" ]]; then
    service_enabled="false"
    postgres_enabled="false"
  fi

  if [[ -n "$(helm ls -q "^$name$")" ]]; then
    helm upgrade --reuse-values \
      --wait \
      --set replicaCount="$replicas" \
      --set postgresql.enabled="$postgres_enabled" \
      --namespace="$KUBE_NAMESPACE" \
      "$name" \
      chart/
  fi
}
```
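With the patch applied, one way to spot-check that the release keeps postgres disabled across rollout jobs (a sketch, assuming Helm 2 and the `production` release name from the output above):

```sh
# The user-supplied values recorded for the release should now
# include postgresql.enabled: false after a rollout job has run.
helm get values production | grep -i postgresql
```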
I suppose the `scale` function always creates an unnecessary `production-postgres` deployment for other users with `POSTGRES_ENABLED=false` as well, but other than consuming resources it does no harm, so other users may not have noticed it before.
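A quick way to check whether a cluster is affected (sketch; adjust the namespace to your project):

```sh
# Lists any postgres resources Auto DevOps created despite
# POSTGRES_ENABLED=false; the grep output is empty when unaffected.
kubectl -n my-project get deployment,service,pvc,secret | grep postgres
```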
On a side note:
After the initial `UPGRADE FAILED: timed out waiting for the condition`, the next run fails with a different error:

```
UPGRADE FAILED
Error: kind Secret with the name "production-postgres" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
Error: UPGRADE FAILED: kind Secret with the name "production-postgres" already exists in the cluster and wasn't defined in the previous release. Before upgrading, please either delete the resource from the cluster or remove it from the chart
```
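The failed upgrade attempts are also visible in the release history before cleaning up (sketch, again assuming Helm 2 and the `production` release name):

```sh
# Failed upgrades show up as FAILED revisions for the release.
helm history production --max 5
```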
And I need to manually clean up the leftover resources:

```sh
kubectl -n my-project delete deployment production-postgres
kubectl -n my-project delete service production-postgres
kubectl -n my-project delete pvc production-postgres
kubectl -n my-project delete secret production-postgres
```
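The same cleanup as a single loop (sketch; the four resource kinds and names match the commands above):

```sh
# Delete every leftover postgres resource in one pass.
for kind in deployment service pvc secret; do
  kubectl -n my-project delete "$kind" production-postgres
done
```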