Review Apps sometimes fail to be installed
During ensure_namespace
The connection to the server 146.148.103.140 was refused - did you specify the right host or port?
This can be seen in https://gitlab.com/gitlab-org/gitlab-ce/-/jobs/147362252.
$ ensure_namespace
The connection to the server 146.148.103.140 was refused - did you specify the right host or port?
The connection to the server 146.148.103.140 was refused - did you specify the right host or port?
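A refused connection like the one above may be transient, so one possible mitigation is to retry the failing step a few times before giving up. The following is a minimal sketch, not taken from the Review Apps scripts: the `retry` helper, its argument order, and the attempt and delay values are all assumptions made for illustration.

```shell
# Hypothetical helper, not part of the original CI scripts.
# Usage: retry <attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds or the attempts are exhausted.
retry() {
  local retry_attempts="$1"
  local retry_delay="$2"
  shift 2
  local retry_count=1

  while [ "$retry_count" -le "$retry_attempts" ]; do
    # Run the command in the current shell so that shell functions work too.
    if "$@"; then
      return 0
    fi
    echo "Attempt ${retry_count}/${retry_attempts} failed: $*" >&2
    # Only sleep if another attempt is going to follow.
    if [ "$retry_count" -lt "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
    retry_count=$((retry_count + 1))
  done

  return 1
}
```

Under these assumptions, the job could call `retry 3 10 ensure_namespace` instead of `ensure_namespace`. This only helps if the step is safe to run more than once.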
During install_tiller
cannot connect to Tiller
This can be seen in https://gitlab.com/gitlab-org/gitlab-ce/-/jobs/143077221:
$ install_tiller
Checking Tiller...
$HELM_HOME has been configured at /root/.helm.
Tiller (the Helm server-side component) has been upgraded to the current version.
Happy Helming!
Waiting for rollout to finish: 1 of 2 updated replicas are available...
deployment "tiller-deploy" successfully rolled out
Client: &version.Version{SemVer:"v2.11.0", GitCommit:"2e55dbe1fdb5fdb96b75ff144a339489417b146b", GitTreeState:"clean"}
[debug] Created tunnel using local port: '41407'
[debug] SERVER: "127.0.0.1:41407"
Kubernetes: &version.Info{Major:"1", Minor:"10+", GitVersion:"v1.10.9-gke.5", GitCommit:"d776b4deeb3655fa4b8f4e8e7e4651d00c5f4a98", GitTreeState:"clean", BuildDate:"2018-11-08T20:33:00Z", GoVersion:"go1.9.3b4", Compiler:"gc", Platform:"linux/amd64"}
E0108 18:33:55.469814 180 portforward.go:331] an error occurred forwarding 41407 -> 44134: error forwarding port 44134 to pod d669389f83b9cd543403ae53c694337bb86e0cb061a9627c5067fcec7504c725, uid : exit status 1: 2019/01/08 18:33:54 socat[3111563] E connect(5, AF=2 127.0.0.1:44134, 16): Connection refused
Error: cannot connect to Tiller
[debug] rpc error: code = Unavailable desc = transport is closing
Failed to init Tiller.
error: watch closed before Until timeout
This can be seen in https://gitlab.com/gitlab-org/gitlab-ce/-/jobs/147315326.
$ install_tiller
Checking Tiller...
$HELM_HOME has been configured at /root/.helm.
Tiller (the Helm server-side component) has been upgraded to the current version.
Happy Helming!
Waiting for rollout to finish: 1 of 2 updated replicas are available...
Waiting for rollout to finish: 1 of 2 updated replicas are available...
error: watch closed before Until timeout
During install_external_dns
error calling eq: invalid type for comparison
This can be seen in https://gitlab.com/gitlab-org/gitlab-ee/-/jobs/144088273:
$ install_external_dns
Installing external-dns helm chart
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "gitlab" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈
Error: render error in "external-dns/templates/secret.yaml": template: external-dns/templates/secret.yaml:14:12: executing "external-dns/templates/secret.yaml" at <eq .Values.provider ...>: error calling eq: invalid type for comparison
a release named dns-gitlab-review-app already exists
$ install_external_dns
Installing external-dns helm chart
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "gitlab" chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈
Error: a release named dns-gitlab-review-app already exists.
Run: helm ls --all dns-gitlab-review-app; to check the status of the release
Or run: helm del --purge dns-gitlab-review-app; to delete it
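The two commands suggested by Helm in the log above could be combined into a guard that runs before the install step. The sketch below assumes the existing release is a leftover in the FAILED state, which the log does not confirm. The `purge_failed_release` function name is hypothetical, and the sketch relies on the Helm 2 `helm ls --failed --short` flags.

```shell
# Hypothetical helper, not part of the original CI scripts.
# Usage: purge_failed_release <release_name>
# Deletes the release only when Helm lists it among the FAILED releases.
purge_failed_release() {
  local release="$1"

  # `helm ls --failed --short` prints the names of failed releases that
  # match the filter; grep -x keeps only an exact match on the name.
  if helm ls --failed --short "^${release}\$" | grep -qx "$release"; then
    echo "Release ${release} is in a failed state, purging it"
    helm del --purge "$release"
  else
    echo "Release ${release} is not in a failed state, leaving it alone"
  fi
}
```

For example, `purge_failed_release dns-gitlab-review-app` could run just before the chart is installed. A release that is deployed and healthy is left untouched, so this does not cover the case where the install is simply attempted twice.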
During deploy
timed out waiting for the condition
This happens when the release doesn't exist yet, e.g. https://gitlab.com/gitlab-org/gitlab-ee/-/jobs/112364180
Deploying with:
helm upgrade --install --wait --timeout 600 --set releaseOverride="review-dd-switch-jtxaql" --set global.hosts.hostSuffix="review-dd-switch-jtxaql" --set global.hosts.domain="gitlab-review.app" --set certmanager.install=false --set global.ingress.configureCertmanager=false --set global.ingress.tls.secretName=tls-cert --set global.ingress.annotations."external-dns\.alpha\.kubernetes\.io/ttl"="10" --set gitlab.unicorn.resources.requests.cpu=200m --set gitlab.sidekiq.resources.requests.cpu=100m --set gitlab.gitlab-shell.resources.requests.cpu=100m --set redis.resources.requests.cpu=100m --set minio.resources.requests.cpu=100m --set gitlab.migrations.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-rails-ee" --set gitlab.migrations.image.tag="dd-switch-rails-ee" --set gitlab.sidekiq.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-sidekiq-ee" --set gitlab.sidekiq.image.tag="dd-switch-rails-ee" --set gitlab.unicorn.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-unicorn-ee" --set gitlab.unicorn.image.tag="dd-switch-rails-ee" --set gitlab.gitaly.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitaly" --set gitlab.gitaly.image.tag="v0.121.0" --set gitlab.gitlab-shell.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-shell" --set gitlab.gitlab-shell.image.tag="v8.3.3" --set gitlab.unicorn.workhorse.image="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-workhorse-ee" --set gitlab.unicorn.workhorse.tag="dd-switch-rails-ee" --namespace="review-apps-ee" --version="34214393-112364180" "review-dd-switch-jtxaql" .
Release "review-dd-switch-jtxaql" does not exist. Installing it now.
Error: release review-dd-switch-jtxaql failed: timed out waiting for the condition
But also when the release already exists, e.g. https://gitlab.com/gitlab-org/gitlab-ee/-/jobs/117780089:
Deploying with:
helm upgrade --install --wait --timeout 600 --set releaseOverride="review-winh-group-o7fvrj" --set global.hosts.hostSuffix="review-winh-group-o7fvrj" --set global.hosts.domain="gitlab-review.app" --set certmanager.install=false --set global.ingress.configureCertmanager=false --set global.ingress.tls.secretName=tls-cert --set global.ingress.annotations."external-dns\.alpha\.kubernetes\.io/ttl"="10" --set gitlab.unicorn.resources.requests.cpu=200m --set gitlab.sidekiq.resources.requests.cpu=100m --set gitlab.gitlab-shell.resources.requests.cpu=100m --set redis.resources.requests.cpu=100m --set minio.resources.requests.cpu=100m --set gitlab.migrations.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-rails-ee" --set gitlab.migrations.image.tag="winh-group_member_contributions-spec" --set gitlab.sidekiq.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-sidekiq-ee" --set gitlab.sidekiq.image.tag="winh-group_member_contributions-spec" --set gitlab.unicorn.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-unicorn-ee" --set gitlab.unicorn.image.tag="winh-group_member_contributions-spec" --set gitlab.gitaly.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitaly" --set gitlab.gitaly.image.tag="v0.128.0" --set gitlab.gitlab-shell.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-shell" --set gitlab.gitlab-shell.image.tag="v8.4.1" --set gitlab.unicorn.workhorse.image="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-workhorse-ee" --set gitlab.unicorn.workhorse.tag="winh-group_member_contributions-spec" --set nginx-ingress.controller.config.ssl-ciphers="ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4" 
--namespace="review-apps-ee" --version="35797710-117780089" "review-winh-group-o7fvrj" .
Error: UPGRADE FAILED: timed out waiting for the condition
In both cases, the problem seems to be the same: we hit the Helm timeout (set to 600 seconds with --timeout 600).
Based on some info I could find in https://gitlab.com/gitlab-org/gitlab-ce/issues/52112#note_109980053, one cause of that could be the following (quoting @ibaum):
Some quick digging this morning. The tiller pods for review-apps-ee are getting evicted periodically due to low memory on a node. Due to the load on the cluster, it can take a few minutes to respawn, and helm commands fail during this time.
It looks like helm does support multiple replicas. But based on https://github.com/helm/helm/pull/3464 I'm not convinced it is safe to use in this instance.
Also, one useful command to debug such cases is kubectl -n review-apps-ee get pods -l name=tiller (thanks @ibaum for sharing that!).
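If evicted Tiller pods are indeed the cause, the job could poll with that same kubectl command until a Tiller pod is back before it runs helm. This is a minimal sketch under that assumption: the `wait_for_tiller` function and its arguments are hypothetical, and it checks only the pod phase, which is weaker than checking readiness.

```shell
# Hypothetical helper, not part of the original CI scripts.
# Usage: wait_for_tiller <namespace> <max_attempts> <delay_seconds>
# Polls until at least one pod labelled name=tiller reports phase Running.
wait_for_tiller() {
  local namespace="$1"
  local max_attempts="$2"
  local delay="$3"
  local attempt=1
  local phases=""

  while [ "$attempt" -le "$max_attempts" ]; do
    # Prints the phases of all matching pods, separated by spaces.
    phases="$(kubectl -n "$namespace" get pods -l name=tiller \
      -o jsonpath='{.items[*].status.phase}' 2>/dev/null)"

    case " $phases " in
      *" Running "*)
        echo "Tiller is running in ${namespace}"
        return 0
        ;;
    esac

    echo "Tiller not running yet (phases: '${phases}'), attempt ${attempt}/${max_attempts}" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done

  return 1
}
```

A call such as `wait_for_tiller review-apps-ee 30 10` would wait up to roughly five minutes, which matches the "few minutes to respawn" mentioned in the quote. The numbers are illustrative only.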
transport is closing
This can be seen in https://gitlab.com/gitlab-org/gitlab-ee/-/jobs/143050848:
helm upgrade --install --wait --timeout 600 --set global.appConfig.enableUsagePing=false --set releaseOverride="review-reinstate-u3zyo5" --set global.hosts.hostSuffix="review-reinstate-u3zyo5" --set global.hosts.domain="gitlab-review.app" --set certmanager.install=false --set global.ingress.configureCertmanager=false --set global.ingress.tls.secretName=tls-cert --set global.ingress.annotations."external-dns\.alpha\.kubernetes\.io/ttl"="10" --set gitlab.unicorn.resources.requests.cpu=200m --set gitlab.sidekiq.resources.requests.cpu=100m --set gitlab.gitlab-shell.resources.requests.cpu=100m --set redis.resources.requests.cpu=100m --set minio.resources.requests.cpu=100m --set gitlab.migrations.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-rails-ee" --set gitlab.migrations.image.tag="reinstate-cla-for-ee" --set gitlab.sidekiq.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-sidekiq-ee" --set gitlab.sidekiq.image.tag="reinstate-cla-for-ee" --set gitlab.unicorn.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-unicorn-ee" --set gitlab.unicorn.image.tag="reinstate-cla-for-ee" --set gitlab.task-runner.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-task-runner-ee" --set gitlab.task-runner.image.tag="reinstate-cla-for-ee" --set gitlab.gitaly.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitaly" --set gitlab.gitaly.image.tag="v1.12.0" --set gitlab.gitlab-shell.image.repository="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-shell" --set gitlab.gitlab-shell.image.tag="v8.4.4" --set gitlab.unicorn.workhorse.image="registry.gitlab.com/gitlab-org/build/cng-mirror/gitlab-workhorse-ee" --set gitlab.unicorn.workhorse.tag="reinstate-cla-for-ee" --set 
nginx-ingress.controller.config.ssl-ciphers="ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA:ECDHE-RSA-AES128-SHA:AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-SHA256:AES128-SHA256:AES256-SHA:AES128-SHA:!aNULL:!eNULL:!EXPORT:!DES:!MD5:!PSK:!RC4" --namespace="review-apps-ee" --version="42579845-143050848" "review-reinstate-u3zyo5" .
Release "review-reinstate-u3zyo5" does not exist. Installing it now.
Error: transport is closing