Readiness probe failing on webservice container following an attempt to bump Helm chart version

Summary

Deployment to Preprod failed after I merged the chart bump MR gitlab-com/gl-infra/k8s-workloads/gitlab-com!4538 (merged).

Changes in the chart bump MR: gitlab-org/charts/gitlab@779f351e...21461548

The chart bump MR included several changes related to support for IPv6 Kubernetes clusters (gitlab-org/charts/gitlab!4072 (merged)). Our GKE clusters for GitLab.com and the preprod environment are IPv4 only.

Observations

The new pod created for the webservice-* Deployments kept restarting:

$ kr pod l webservice-api
kubectl 2>/dev/null get --no-headers pod | grep --color=never webservice-api
gitlab-webservice-api-55967f87cc-n9brm             1/2   Running     8 (4m20s ago)    44m
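
The row above can also be checked mechanically. The following is a minimal Python sketch, not part of our tooling: parse_pod_row and is_unhealthy are hypothetical helper names, and the regular expression assumes the default column layout of `kubectl get pod --no-headers` (NAME, READY, STATUS, RESTARTS, AGE).

```python
import re
from typing import NamedTuple


class PodStatus(NamedTuple):
    name: str
    ready: int      # containers currently passing readiness
    total: int      # containers defined in the pod
    status: str
    restarts: int


# One row of `kubectl get pod --no-headers`:
# NAME  READY  STATUS  RESTARTS [(last restart)]  AGE
ROW = re.compile(r"^(\S+)\s+(\d+)/(\d+)\s+(\S+)\s+(\d+)")


def parse_pod_row(line: str) -> PodStatus:
    """Parse a single pod row; raise ValueError if the layout is unexpected."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised pod row: {line!r}")
    name, ready, total, status, restarts = match.groups()
    return PodStatus(name, int(ready), int(total), status, int(restarts))


def is_unhealthy(pod: PodStatus) -> bool:
    """Flag pods with containers that are not ready or that have restarted."""
    return pod.ready < pod.total or pod.restarts > 0


# Row copied from the output above.
row = "gitlab-webservice-api-55967f87cc-n9brm             1/2   Running     8 (4m20s ago)    44m"
pod = parse_pod_row(row)
print(pod.name, f"{pod.ready}/{pod.total}", pod.restarts, is_unhealthy(pod))
```

A pod can report Running while one of its containers is still being restarted, so READY and RESTARTS are the more useful columns here than STATUS.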

Every webservice pod has two containers, gitlab-workhorse and webservice. The workhorse container started up correctly. However, the pod's events show that the startup probe, which requests /-/readiness, was failing for the webservice container:

Events:
  Type     Reason                Age                   From                     Message
  ----     ------                ----                  ----                     -------
[snip]
  Normal   Created               44m                   kubelet                  Created container: gitlab-workhorse
  Normal   Started               44m                   kubelet                  Started container gitlab-workhorse
  Warning  Unhealthy             43m (x15 over 44m)    kubelet                  Startup probe failed: HTTP probe failed with statuscode: 404
  Normal   Created               30m (x4 over 45m)     kubelet                  Created container: webservice
  Normal   Started               30m (x4 over 44m)     kubelet                  Started container webservice
  Normal   Pulled                25m (x5 over 45m)     kubelet                  Container image "us-east1-docker.pkg.dev/gitlab-com-artifact-registry/images/gitlab-webservice-ee:v18.1.0-rc44" already present on machine
  Warning  Unhealthy             4m39s (x78 over 44m)  kubelet                  Startup probe failed: Get "http://10.235.17.239:8080/-/readiness": dial tcp 10.235.17.239:8080: connect: connection refused
  Normal   Killing               12s (x9 over 40m)     kubelet                  Container webservice failed startup probe, will be restarted
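
The events contain two distinct failure messages. "connection refused" generally means nothing was listening on the port yet, which is expected while the container is starting or being restarted, whereas the 404 means the application did answer the probe. A minimal Python sketch for separating the two when scanning many events; classify_probe_failure is a hypothetical helper, and the sample messages are copied from the events above.

```python
import re


def classify_probe_failure(message: str) -> str:
    """Return a coarse label for a kubelet probe-failure event message."""
    status = re.search(r"HTTP probe failed with statuscode: (\d+)", message)
    if status:
        # The application responded, but with a non-success status code.
        return f"http_{status.group(1)}"
    if "connection refused" in message:
        # Nothing accepted the TCP connection on the probed port.
        return "connection_refused"
    return "other"


messages = [
    "Startup probe failed: HTTP probe failed with statuscode: 404",
    'Startup probe failed: Get "http://10.235.17.239:8080/-/readiness": '
    "dial tcp 10.235.17.239:8080: connect: connection refused",
]
labels = [classify_probe_failure(m) for m in messages]
print(labels)
```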

In the webservice container logs, we can see that the request is received, but, unexpectedly, a 404 is returned for the /-/readiness path:

{
  "component": "gitlab",
  "subcomponent": "production_json",
  "method": "GET",
  "path": "/-/readiness",
  "format": "*/*",
  "controller": "HealthController",
  "action": "readiness",
  "status": 404,
  "time": "2025-06-30T07:14:09.944Z",
  "params": [],
  ...
}
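
To find entries like this one in a larger log stream, the structured logs can be filtered on controller and status. This is a minimal Python sketch under the assumption that each log line is a single JSON object with the field names shown above; failed_health_checks is a hypothetical helper, and the second sample line (status 200) is made up for contrast.

```python
import json
from typing import Iterable, Iterator


def failed_health_checks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield HealthController log entries whose status is not 200."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue  # skip JSON values that are not objects
        if entry.get("controller") == "HealthController" and entry.get("status") != 200:
            yield entry


sample_lines = [
    # Fields taken from the log entry above.
    '{"method": "GET", "path": "/-/readiness", "controller": "HealthController", '
    '"action": "readiness", "status": 404, "time": "2025-06-30T07:14:09.944Z"}',
    # Hypothetical healthy entry, for contrast only.
    '{"method": "GET", "path": "/-/readiness", "controller": "HealthController", '
    '"action": "readiness", "status": 200}',
    "not json at all",
]
failures = list(failed_health_checks(sample_lines))
print(len(failures), failures[0]["path"], failures[0]["status"])
```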

Looking at the related code, 404 is the standard empty response when the IP address that is requesting a health check is not in the gitlab.webservice.monitoring.ipWhitelist list. This is described in our IP Allowlist documentation.
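
The allowlist behaviour described above can be sketched as follows. This is an illustration in Python, not GitLab's implementation: health_status is a hypothetical function, the CIDR ranges in ALLOWLIST are example values rather than our configured ipWhitelist, and the client addresses are made up.

```python
import ipaddress

# Hypothetical allowlist; the real values live in
# gitlab.webservice.monitoring.ipWhitelist.
ALLOWLIST = ["10.0.0.0/8", "127.0.0.0/8"]


def health_status(remote_ip: str, allowlist=ALLOWLIST) -> int:
    """Return 200 if the client IP is inside an allowlisted range, else 404."""
    addr = ipaddress.ip_address(remote_ip)
    networks = [ipaddress.ip_network(cidr) for cidr in allowlist]
    # In Python's ipaddress module, membership is always False when the
    # address and the network belong to different IP versions.
    return 200 if any(addr in net for net in networks) else 404


results = {
    ip: health_status(ip)
    for ip in ("10.235.17.1", "192.0.2.10", "::ffff:10.235.17.1")
}
print(results)
```

The third address is an IPv4-mapped IPv6 form of the first. In this sketch it does not match the IPv4 ranges, which shows how the address family of the client can affect an allowlist check. It is included only because the chart bump touched IPv6 support; the observations here do not establish that this is what happened.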

For comparison, the logs from the last time a webservice-api pod started in preprod show json.status set to 200 for the HealthController.


Edited by Siddharth Kannan