Review Apps deployments sometimes fail because resources from a previous failed deployment are not yet deleted
When deploying Review Apps, if the previous deployment failed (if [ "$CI_ENVIRONMENT_SLUG" != "production" ] && previous_deploy_failed "$CI_ENVIRONMENT_SLUG" ; then), the release is deleted (helm delete --purge "$name").
The problem is that the deploy command doesn't wait for the resources to actually be deleted, so we can end up with a race condition where Helm tries to install a resource that still exists (and is being deleted):
** Checking for previous deployment of review-54331-pipe-ty511y **
Previous deployment found, checking status...
Previous deployment state: FAILED
Deployment in bad state, cleaning up review-54331-pipe-ty511y
** Deleting release 'review-54331-pipe-ty511y'... **
release "review-54331-pipe-ty511y" deleted
** Creating the review-54331-pipe-ty511y-gitlab-initial-root-password secret in the review-apps-ce namespace... **
secret "review-54331-pipe-ty511y-gitlab-initial-root-password" configured
Deploying with:
helm upgrade --install --wait --timeout 900 --set releaseOverride="review-54331-pipe-ty511y" --set global.appConfig.enableUsagePing=false --set global.imagePullPolicy=Always --set global.hosts.hostSuffix="review-54331-pipe-ty511y" --set global.hosts.domain="ce.gitlab-review.app" --set global.ingress.configureCertmanager=false --set global.ingress.tls.secretName=tls-cert --set global.ingress.annotations."external-dns\.alpha\.kubernetes\.io/ttl"="10" --set certmanager.install=false --set prometheus.install=false --set nginx-ingress.controller.service.enableHttp=false --set nginx-ingress.controller.replicaCount=2 --set nginx-ingress.controller.config.ssl-ciphers="ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4" --set gitlab.migrations.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-rails-ce" --set gitlab.migrations.image.tag="54331-pipeline-graph-extend-scroll-area" --set gitlab.gitaly.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitaly" --set gitlab.gitaly.image.tag="v1.53.0" --set gitlab.gitlab-shell.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-shell" --set gitlab.gitlab-shell.image.tag="v9.3.0" --set gitlab.sidekiq.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-sidekiq-ce" --set gitlab.sidekiq.image.tag="54331-pipeline-graph-extend-scroll-area" --set gitlab.unicorn.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-unicorn-ce" --set gitlab.unicorn.image.tag="54331-pipeline-graph-extend-scroll-area" --set gitlab.unicorn.workhorse.image="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-workhorse-ce" --set gitlab.unicorn.workhorse.tag="54331-pipeline-graph-extend-scroll-area" --set 
gitlab.task-runner.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-task-runner-ce" --set gitlab.task-runner.image.tag="54331-pipeline-graph-extend-scroll-area" --set nginx-ingress.controller.resources.limits.cpu=200m --set nginx-ingress.controller.resources.requests.memory=210M --set nginx-ingress.controller.resources.limits.memory=420M --set nginx-ingress.defaultBackend.resources.limits.cpu=10m --set nginx-ingress.defaultBackend.resources.requests.memory=12M --set nginx-ingress.defaultBackend.resources.limits.memory=24M --set gitlab.gitaly.resources.requests.cpu=150m --set gitlab.gitaly.resources.limits.cpu=300m --set gitlab.gitaly.resources.limits.memory=420M --set gitlab.gitlab-shell.resources.requests.cpu=70m --set gitlab.gitlab-shell.resources.limits.cpu=140m --set gitlab.gitlab-shell.resources.requests.memory=20M --set gitlab.gitlab-shell.resources.limits.memory=40M --set gitlab.sidekiq.resources.requests.cpu=200m --set gitlab.sidekiq.resources.limits.cpu=300m --set gitlab.sidekiq.resources.requests.memory=800M --set gitlab.sidekiq.resources.limits.memory=1.2G --set gitlab.unicorn.resources.limits.cpu=800m --set gitlab.unicorn.resources.limits.memory=2.6G --set redis.resources.limits.cpu=200m --set redis.resources.limits.memory=130M --set minio.resources.limits.cpu=200m --set minio.resources.limits.memory=280M --set gitlab-runner.resources.requests.cpu=300m --set gitlab-runner.resources.limits.cpu=600m --set gitlab-runner.resources.requests.memory=300M --set gitlab-runner.resources.limits.memory=600M --namespace="review-apps-ce" --version="73242972-260455843" "review-54331-pipe-ty511y" .
Release "review-54331-pipe-ty511y" does not exist. Installing it now.
Error: release review-54331-pipe-ty511y failed: object is being deleted: persistentvolumeclaims "review-54331-pipe-ty511y-postgresql" already exists
Helm 2.13.0 introduced a new --atomic flag for the install and upgrade commands. From the release notes (https://github.com/helm/helm/releases/tag/v2.13.0):
"introduced --atomic to helm install and helm upgrade which restores the previous state in case of a failed install/upgrade attempt"
That should remove the need to clean up a previously failed deployment, as the resources would be cleaned up when the deployment itself fails.
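As a rough sketch of what this could look like in the deploy script (not the actual implementation): the function name deploy_atomic and the HELM override variable are hypothetical and only there for illustration, and the long list of --set values from the log above is left out. It assumes Helm >= 2.13.0, where --atomic also implies --wait, and that the flag is honored on the install path of helm upgrade --install, which should be verified against the Helm version in use.

```shell
# Hypothetical helper: deploy a Review App release with --atomic so that a
# failed install/upgrade is rolled back by Helm itself.
# HELM can be overridden (e.g. HELM=echo) to print the command instead of running it.
deploy_atomic() {
  local name="$1"
  local namespace="$2"

  # --atomic needs Helm >= 2.13.0; in Helm 2 the --timeout value is in seconds.
  "${HELM:-helm}" upgrade --install \
    --atomic \
    --timeout 900 \
    --namespace="$namespace" \
    "$name" .
}
```

It would be called as deploy_atomic "review-54331-pipe-ty511y" "review-apps-ce". Whether the previous_deploy_failed check and the helm delete --purge step can then be dropped entirely would still need to be confirmed on a real failed deployment.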