Readiness probe failing on webservice container following an attempt to bump Helm chart version
Summary
Deployment to Preprod failed after I merged the chart bump MR gitlab-com/gl-infra/k8s-workloads/gitlab-com!4538 (merged).
Changes in the chart bump MR: gitlab-org/charts/gitlab@779f351e...21461548
The chart bump included several changes related to support for IPv6 Kubernetes clusters (gitlab-org/charts/gitlab!4072 (merged)). Our GKE clusters for GitLab.com and the preprod environment are IPv4 only.
- Deployment pipeline: https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/pipelines/4669256
- Failing pre:upgrade job: https://ops.gitlab.net/gitlab-com/gl-infra/k8s-workloads/gitlab-com/-/jobs/19079397
Observations
The new pods created for the webservice Deployments kept restarting, for example:
$ kr pod l webservice-api
kubectl 2>/dev/null get --no-headers pod | grep --color=never webservice-api
gitlab-webservice-api-55967f87cc-n9brm 1/2 Running 8 (4m20s ago) 44m
Every webservice pod has two containers, gitlab-workhorse and webservice. The gitlab-workhorse container started up correctly. However, the pod's events show that the startup probe, which requests the /-/readiness endpoint, kept failing for the webservice container:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
[snip]
Normal Created 44m kubelet Created container: gitlab-workhorse
Normal Started 44m kubelet Started container gitlab-workhorse
Warning Unhealthy 43m (x15 over 44m) kubelet Startup probe failed: HTTP probe failed with statuscode: 404
Normal Created 30m (x4 over 45m) kubelet Created container: webservice
Normal Started 30m (x4 over 44m) kubelet Started container webservice
Normal Pulled 25m (x5 over 45m) kubelet Container image "us-east1-docker.pkg.dev/gitlab-com-artifact-registry/images/gitlab-webservice-ee:v18.1.0-rc44" already present on machine
Warning Unhealthy 4m39s (x78 over 44m) kubelet Startup probe failed: Get "http://10.235.17.239:8080/-/readiness": dial tcp 10.235.17.239:8080: connect: connection refused
Normal Killing 12s (x9 over 40m) kubelet Container webservice failed startup probe, will be restarted
The webservice container logs show that the request is received, but a 404 is returned for the /-/readiness path (?):
{
"component": "gitlab",
"subcomponent": "production_json",
"method": "GET",
"path": "/-/readiness",
"format": "*/*",
"controller": "HealthController",
"action": "readiness",
"status": 404,
"time": "2025-06-30T07:14:09.944Z",
"params": [],
...
}
According to the related code, an empty 404 is the standard response when the IP address requesting a health check is not in the gitlab.webservice.monitoring.ipWhitelist list. This is described in our IP Allowlist documentation.
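To make the allowlist behaviour concrete, here is a minimal Python sketch of a CIDR membership check. It is not the GitLab implementation. The function name is_allowed and the example ranges and addresses are hypothetical and are not taken from the preprod configuration.

```python
import ipaddress


def is_allowed(client_ip: str, allowlist: list[str]) -> bool:
    """Return True if client_ip falls inside any CIDR range in allowlist."""
    address = ipaddress.ip_address(client_ip)
    for entry in allowlist:
        # strict=False also accepts entries with host bits set, e.g. "10.0.0.1/8".
        network = ipaddress.ip_network(entry, strict=False)
        # Membership is False when the address and the network differ in IP version.
        if address in network:
            return True
    return False


# Hypothetical allowlist, for illustration only.
EXAMPLE_ALLOWLIST = ["10.0.0.0/8", "127.0.0.0/8"]

print(is_allowed("10.1.2.3", EXAMPLE_ALLOWLIST))     # inside 10.0.0.0/8
print(is_allowed("192.168.1.1", EXAMPLE_ALLOWLIST))  # outside both ranges
```

With a check of this shape, a requester whose source address matches no configured range would be treated as not allowlisted, which is consistent with the 404 seen above.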
For comparison, the logs from the last time a webservice-api pod started in preprod show the same HealthController entry with json.status set to 200.
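A comparison like this can also be run against raw container logs. The following Python sketch assumes one JSON object per log line, with the controller, action, status and time fields shown in the entry above. The function name health_check_statuses is hypothetical.

```python
import json


def health_check_statuses(log_lines, action="readiness"):
    """Yield (time, status) for HealthController entries found in JSON log lines."""
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not objects
        if entry.get("controller") == "HealthController" and entry.get("action") == action:
            yield entry.get("time"), entry.get("status")


# Sample line built from the fields of the log entry quoted above.
sample = [
    '{"controller": "HealthController", "action": "readiness", '
    '"status": 404, "time": "2025-06-30T07:14:09.944Z"}',
]
for time, status in health_check_statuses(sample):
    print(time, status)
```

Any iterable of lines works as input, so the function can be fed from a saved log file or from sys.stdin when the container logs are piped in.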
