
"context deadline exceeded" timeout error on some gitlab-helmfiles CI jobs

This problem has been coming up for some time now. It's not yet clear in which scenarios the error occurs (for example, whether it happens only when listing releases).

The global timeout for Helm is set to 300s (https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles/-/blob/master/bases/helmDefaults.yaml#L8), but the jobs time out after roughly 1 minute.

Looking at a few failed jobs, it seems that Helm timed out while listing releases. Perhaps it is trying to connect to a local Tiller instance, and Tiller either receives so many connections that it stops accepting new ones or is simply slow enough to cause a timeout. When the pipelines run, there are certainly times when multiple jobs run simultaneously, each of them connecting to Tiller, e.g.:

root     15334  6.0  0.0 143276 29356 ?        Sl   11:42   0:00 helm tiller run kube-system -- helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values970951700 --values /tmp/values342699619 --values /tmp/values519303782 --values /tmp/values282580621 --values /tmp/values847654024 --values /tmp/values830328391 --suppress-secrets
root     15361  0.0  0.0   2456  1924 ?        S    11:42   0:00 bash /root/.helm/plugins/helm-tiller/scripts/tiller.sh run kube-system -- helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values970951700 --values /tmp/values342699619 --values /tmp/values519303782 --values /tmp/values282580621 --values /tmp/values847654024 --values /tmp/values830328391 --suppress-secrets
root     15404  5.5  0.1 143276 31144 ?        Sl   11:42   0:00 helm tiller run kube-system -- helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values348300592 --values /tmp/values565353679 --values /tmp/values040553442 --values /tmp/values439492569 --values /tmp/values782660452 --values /tmp/values206646131 --suppress-secrets
root     15410  6.0  0.0 143020 29612 ?        Sl   11:42   0:00 helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values970951700 --values /tmp/values342699619 --values /tmp/values519303782 --values /tmp/values282580621 --values /tmp/values847654024 --values /tmp/values830328391 --suppress-secrets
root     15431  0.0  0.0   2456  1960 ?        S    11:42   0:00 bash /root/.helm/plugins/helm-tiller/scripts/tiller.sh run kube-system -- helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values348300592 --values /tmp/values565353679 --values /tmp/values040553442 --values /tmp/values439492569 --values /tmp/values782660452 --values /tmp/values206646131 --suppress-secrets
root     15445  129  0.6 266924 205468 ?       Sl   11:42   0:02 /root/.helm/plugins/helm-diff/bin/diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values970951700 --values /tmp/values342699619 --values /tmp/values519303782 --values /tmp/values282580621 --values /tmp/values847654024 --values /tmp/values830328391 --suppress-secrets
root     15487  5.5  0.0 143020 29464 ?        Sl   11:42   0:00 helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values348300592 --values /tmp/values565353679 --values /tmp/values040553442 --values /tmp/values439492569 --values /tmp/values782660452 --values /tmp/values206646131 --suppress-secrets
root     15511  220  0.6 266988 203596 ?       Sl   11:42   0:02 /root/.helm/plugins/helm-diff/bin/diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values348300592 --values /tmp/values565353679 --values /tmp/values040553442 --values /tmp/values439492569 --values /tmp/values782660452 --values /tmp/values206646131 --suppress-secrets
root     15610  8.0  0.0 143276 28804 ?        Sl   11:42   0:00 helm tiller run kube-system -- helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values187817421 --values /tmp/values806119624 --values /tmp/values663197575 --values /tmp/values491678522 --values /tmp/values727016785 --values /tmp/values736071804 --suppress-secrets
root     15622  0.0  0.0   2456  1972 ?        S    11:42   0:00 bash /root/.helm/plugins/helm-tiller/scripts/tiller.sh run kube-system -- helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values187817421 --values /tmp/values806119624 --values /tmp/values663197575 --values /tmp/values491678522 --values /tmp/values727016785 --values /tmp/values736071804 --suppress-secrets
root     15664  0.0  0.0 143020 29060 ?        Sl   11:42   0:00 helm diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values187817421 --values /tmp/values806119624 --values /tmp/values663197575 --values /tmp/values491678522 --values /tmp/values727016785 --values /tmp/values736071804 --suppress-secrets
root     15684  0.0  0.2 128596 78344 ?        Sl   11:42   0:00 /root/.helm/plugins/helm-diff/bin/diff upgrade --reset-values --allow-unreleased gitlab-monitoring stable/prometheus-operator --version 8.15.8 --namespace monitoring --values /tmp/values187817421 --values /tmp/values806119624 --values /tmp/values663197575 --values /tmp/values491678522 --values /tmp/values727016785 --values /tmp/values736071804 --suppress-secrets
root     15706  0.0  0.0 143276 28196 ?        Sl   11:42   0:00 helm tiller run kube-system -- helm list ^cloudflare-exporter$ --deployed --failed --pending

At this point I don't know whether this is expected; perhaps each job gets a dedicated Tiller instance and this is just a red herring.

Edited by Michal Wasilewski