Auto DevOps: Helm cannot upgrade over an initial failed release
Steps to reproduce
- Have a failed Helm release (e.g. forgot to set the base domain)
- Fix the issue from the first step, then run the Auto DevOps pipeline again
- The failure below occurs
$ deploy
secret "review-secrets-ayz3io-secret" deleted
secret "review-secrets-ayz3io-secret" replaced
Initializing...
Error: UPGRADE FAILED: "review-secrets-ayz3io" has no deployed releases
ERROR: Job failed: exit code 1
Notes
- This seems to only occur if ALL releases for that chart have only ever failed.
- If there has been a successful release in the past, Helm can upgrade over a previously failed release.
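To check whether a release falls into the "only ever failed" case described in these notes, the release history can be inspected. The following is a minimal sketch, not part of Auto DevOps: only_failed_revisions is a hypothetical helper name, and it assumes the default table output of helm history, where the status of a failed revision appears as the uppercase word FAILED.

```shell
# Hypothetical helper (not part of Auto DevOps or Helm).
# Reads the table output of `helm history <release-name>` on stdin.
# Exits 0 if every revision row contains the FAILED status, 1 otherwise.
only_failed_revisions() {
  # Drop the header row and any blank lines.
  rows=$(tail -n +2 | grep -v '^[[:space:]]*$')
  # No revision rows at all: nothing to conclude, report "not only failed".
  [ -n "$rows" ] || return 1
  # Succeed only if no row lacks the word FAILED.
  ! printf '%s\n' "$rows" | grep -qvw 'FAILED'
}

# Usage (assumes helm and a reachable Tiller; not run here):
#   helm history <release-name> --tiller-namespace "$KUBE_NAMESPACE" \
#     | only_failed_revisions && echo "release has only ever failed"
```

If the helper succeeds, the release has no successful revision to upgrade over, which matches the situation where the workaround below is needed.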
Workaround
As shown by @tnir in https://gitlab.com/gitlab-org/gitlab-ce/issues/54760#note_146627376
- Run the following script locally. For the production job, the <release-name> is production. For other jobs, run helm ls, which should show you which release to delete.
$ tiller -version
v2.13.0
$ export KUBE_NAMESPACE=<CI_PROJECT_NAME>-<CI_PROJECT_ID>-<CI_PROJECT_ENVIRONMENT>
$ export TILLER_NAMESPACE=$KUBE_NAMESPACE
$ tiller -listen localhost:44134 &
[1] 28659
$ export HELM_HOST="localhost:44134"
$ helm init --client-only
$ helm delete <release-name> --purge --tiller-namespace $KUBE_NAMESPACE
release "<release-name>" deleted
- Retry the deploy job
Links
See also discussion in https://github.com/helm/helm/issues/3208 and https://github.com/helm/helm/issues/3353
Proposed fix
- Invoke --atomic by default when installing releases using Auto DevOps (this can be added in https://gitlab.com/gitlab-org/cluster-integration/auto-deploy-image)
- Add support to disable calling --atomic by checking the AUTO_DEVOPS_ATOMIC_RELEASE env variable (Why? In cases where users attach a PVC that is deleted on reclaim, we want to give users the option to opt out.)
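The proposed opt-out could look roughly like the sketch below. This is an illustration under assumptions, not the actual auto-deploy-image code: atomic_flag and deploy_release are hypothetical function names, and treating the value "false" as the opt-out (with --atomic enabled when the variable is unset) is an assumed convention that the proposal does not specify.

```shell
# Hypothetical helper: prints "--atomic" unless the user opted out.
# Assumption: AUTO_DEVOPS_ATOMIC_RELEASE=false disables atomic releases;
# an unset variable or any other value keeps --atomic enabled.
atomic_flag() {
  if [ "${AUTO_DEVOPS_ATOMIC_RELEASE:-true}" = "false" ]; then
    printf ''
  else
    printf '%s' '--atomic'
  fi
}

# Hypothetical deploy step, defined but not called here.
# $1 = release name, $2 = chart path.
deploy_release() {
  # $(atomic_flag) is left unquoted on purpose so that an empty value
  # adds no argument to the helm command line.
  helm upgrade --install $(atomic_flag) \
    --namespace "$KUBE_NAMESPACE" \
    "$1" "$2"
}
```

With --atomic, a failed install or upgrade is rolled back or purged by Helm itself, so a first failed release should not be left behind to block later upgrades.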
Edited by Thong Kuah