"context deadline exceeded" timeout error on some gitlab-helmfiles CI jobs
This problem has been coming up for some time now. It is not yet clear in which scenarios the error occurs (for example, whether it happens exclusively when listing releases).
The global timeout for Helm is set to 300s: https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles/-/blob/master/bases/helmDefaults.yaml#L8, yet the jobs time out after ~1m.
Looking at a few failed jobs:
- https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles/-/jobs/1396310
- https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles/-/jobs/1396034
- https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles/-/jobs/1386824
it seems that Helm timed out when listing releases. Perhaps it is trying to establish a connection to a local instance of tiller, and tiller is receiving so many connections that it either stops accepting new ones or responds too slowly, which leads to the timeout. When the pipelines run, there are certainly times when multiple jobs run simultaneously (each of them connecting to tiller), e.g.:
root 15334 6.0 0.0 143276 29356 ? Sl 11:42 0:00 helm tiller run kube-system -- helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values970951700 --values /tmp/values342699619 --values /tmp/values519303782 --values /tmp/values282580621 --values /tmp/values847654024 --values /tmp/values830328391 --suppress-secrets
root 15361 0.0 0.0 2456 1924 ? S 11:42 0:00 bash /root/.helm/plugins/helm-tiller/scripts/tiller.sh run kube-system -- helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values970951700 --values /tmp/values342699619 --values /tmp/values519303782 --values /tmp/values282580621 --values /tmp/values847654024 --values /tmp/values830328391 --suppress-secrets
root 15404 5.5 0.1 143276 31144 ? Sl 11:42 0:00 helm tiller run kube-system -- helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values348300592 --values /tmp/values565353679 --values /tmp/values040553442 --values /tmp/values439492569 --values /tmp/values782660452 --values /tmp/values206646131 --suppress-secrets
root 15410 6.0 0.0 143020 29612 ? Sl 11:42 0:00 helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values970951700 --values /tmp/values342699619 --values /tmp/values519303782 --values /tmp/values282580621 --values /tmp/values847654024 --values /tmp/values830328391 --suppress-secrets
root 15431 0.0 0.0 2456 1960 ? S 11:42 0:00 bash /root/.helm/plugins/helm-tiller/scripts/tiller.sh run kube-system -- helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values348300592 --values /tmp/values565353679 --values /tmp/values040553442 --values /tmp/values439492569 --values /tmp/values782660452 --values /tmp/values206646131 --suppress-secrets
root 15445 129 0.6 266924 205468 ? Sl 11:42 0:02 /root/.helm/plugins/helm-diff/bin/diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values970951700 --values /tmp/values342699619 --values /tmp/values519303782 --values /tmp/values282580621 --values /tmp/values847654024 --values /tmp/values830328391 --suppress-secrets
root 15487 5.5 0.0 143020 29464 ? Sl 11:42 0:00 helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values348300592 --values /tmp/values565353679 --values /tmp/values040553442 --values /tmp/values439492569 --values /tmp/values782660452 --values /tmp/values206646131 --suppress-secrets
root 15511 220 0.6 266988 203596 ? Sl 11:42 0:02 /root/.helm/plugins/helm-diff/bin/diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values348300592 --values /tmp/values565353679 --values /tmp/values040553442 --values /tmp/values439492569 --values /tmp/values782660452 --values /tmp/values206646131 --suppress-secrets
root 15610 8.0 0.0 143276 28804 ? Sl 11:42 0:00 helm tiller run kube-system -- helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values187817421 --values /tmp/values806119624 --values /tmp/values663197575 --values /tmp/values491678522 --values /tmp/values727016785 --values /tmp/values736071804 --suppress-secrets
root 15622 0.0 0.0 2456 1972 ? S 11:42 0:00 bash /root/.helm/plugins/helm-tiller/scripts/tiller.sh run kube-system -- helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values187817421 --values /tmp/values806119624 --values /tmp/values663197575 --values /tmp/values491678522 --values /tmp/values727016785 --values /tmp/values736071804 --suppress-secrets
root 15664 0.0 0.0 143020 29060 ? Sl 11:42 0:00 helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values187817421 --values /tmp/values806119624 --values /tmp/values663197575 --values /tmp/values491678522 --values /tmp/values727016785 --values /tmp/values736071804 --suppress-secrets
root 15684 0.0 0.2 128596 78344 ? Sl 11:42 0:00 /root/.helm/plugins/helm-diff/bin/diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values187817421 --values /tmp/values806119624 --values /tmp/values663197575 --values /tmp/values491678522 --values /tmp/values727016785 --values /tmp/values736071804 --suppress-secrets
root 15706 0.0 0.0 143276 28196 ? Sl 11:42 0:00 helm tiller run kube-system -- helm list ^cloudflare-exporter$ --deployed --failed --pending
At this point I don't know whether this is a problem. Perhaps each job gets a dedicated tiller instance, in which case this is just a red herring.
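
To check whether the timeouts correlate with concurrency, it might help to count how many `helm tiller run` wrappers are alive at a given moment. The following is a minimal sketch, assuming `ps aux`-style output like the listing above. The function name `count_tiller_wrappers` is hypothetical and not part of any existing tooling. It only quantifies concurrency and says nothing about whether the wrappers share a tiller instance.

```python
import re
from collections import Counter

# Top-level wrapper processes look like:
#   helm tiller run <namespace> -- helm <subcommand> ...
# The tiller.sh helper and the helm-diff binary do not contain this exact
# phrase, so each wrapped command is counted only once.
WRAPPER_RE = re.compile(r"\bhelm tiller run \S+ -- helm (\S+)")


def count_tiller_wrappers(ps_output):
    """Return a Counter mapping wrapped helm subcommands to process counts."""
    counts = Counter()
    for line in ps_output.splitlines():
        match = WRAPPER_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the captured `ps` output from a runner (for example by reading a saved file and passing its contents to `count_tiller_wrappers`) gives a per-subcommand count that can be compared between passing and failing jobs.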