HPA downscaling of webservice causes 502 errors with nginx-ingress
Summary
During HPA downscaling of the webservice we encounter 502 errors, which appear in the ingress logs as well as in the webservice container logs.
Steps to reproduce
Deploy the GitLab Helm chart v5.1.7 with external Redis and external PostgreSQL v12.7.
Configuration used in values.yaml
global:
  webservice:
    enabled: true
  ingress:
    tls:
      secretName: ingress-tls
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: 3072m
gitlab:
  webservice:
    minReplicas: 3 # was supposed to fix the 502 errors
    annotations:
      log.config.scalyr.com/attributes.parser: "accessLog"
    nodeSelector:
      kops.k8s.io/instancegroup: ondemand
registry:
  enabled: false
Current behavior
502 errors are reported by users and runners and are visible in the ingress logs:
10.0.101.178 - - [26/Oct/2021:08:45:18 +0000] "GET /codezillas/runner-runtime/-/merge_requests/843.json?serializer=sidebar_extras HTTP/1.1" 502 2940 "https://gitlab.ourdomain.io/codezillas/runner-runtime/-/merge_requests/843" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36" 1174 0.002 [gitlab-gitlab-zoovu-webservice-default-8181] [] 100.70.224.11:8181 2940 0.000 502 6627662d680a4b8a458dd1a984b32f29
webservice hpa downscaling log
Normal SuccessfulRescale 51m (x93 over 48d) horizontal-pod-autoscaler New size: 4; reason: cpu resource above target
Normal SuccessfulRescale 45m (x159 over 48d) horizontal-pod-autoscaler New size: 3; reason: All metrics below target
webservice logs
{
  correlation_id: '01FJXWCY3664RGJ3FNHBBKKKXR',
  duration_ms: 0,
  error: 'badgateway: failed to receive response: dial tcp 127.0.0.1:8080: connect: connection refused',
  level: 'error',
  method: 'GET',
  msg: '',
  time: '2021-10-26T08:45:18Z',
  uri: '/codezillas/runner-runtime/-/merge_requests/843.json?serializer=sidebar_extras',
}
Expected behavior
Traffic should not be sent to terminating webservice pods during HPA downscaling.
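One possible mitigation, given here as an untested sketch: lengthen the window during which a terminating pod keeps serving requests, so that the ingress controller has time to drop the pod from its upstream list. This assumes the chart version in use exposes `shutdown.blackoutSeconds` (which appears to correspond to the `SHUTDOWN_BLACKOUT_SECONDS` variable listed in this report) and `deployment.terminationGracePeriodSeconds` under `gitlab.webservice`. The values 20 and 60 are illustrative assumptions, not observed fixes.

```yaml
gitlab:
  webservice:
    shutdown:
      # Assumed value: seconds the webservice keeps accepting requests
      # after receiving the shutdown signal (currently 10 in this deployment).
      blackoutSeconds: 20
    deployment:
      # Assumed value: must be longer than blackoutSeconds, otherwise
      # Kubernetes kills the pod before the blackout period has elapsed.
      terminationGracePeriodSeconds: 60
```

Whether a longer blackout period removes the `connection refused` errors on `127.0.0.1:8080` would need to be confirmed by repeating the downscaling test.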
Versions
- Chart: v5.1.7
- Platform:
  - Cloud: AWS
  - Self-hosted: kops
- Kubernetes: (kubectl version)
  - Client: 1.20.5
  - Server: 1.20.8
- Helm: (helm version)
  - Client: 3.6.3
  - Server: -
Relevant logs
see above
nginx configuration
location / {
    set $namespace "gitlab";
    set $ingress_name "gitlab-zoovu-webservice-default";
    set $service_name "gitlab-zoovu-webservice-default";
    set $service_port "8181";
    set $location_path "/";
    # Custom headers to proxied server
    proxy_connect_timeout 15s;
    proxy_send_timeout 60s;
    proxy_read_timeout 600s;
    # In case of errors try the next upstream server before returning an error
    proxy_next_upstream error timeout;
    proxy_next_upstream_timeout 0;
    proxy_next_upstream_tries 3;
    proxy_pass http://upstream_balancer;
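Note that `proxy_next_upstream error timeout;` only retries when NGINX fails to connect to the upstream or the upstream times out. In the ingress log line in this report, the upstream on port 8181 answered with an HTTP 502 itself, which this setting does not retry. A hedged sketch of how retrying on 502 could be enabled through ingress-nginx annotations follows. Placing them under `global.ingress.annotations` mirrors the values file in this report, and whether this is the right default for the chart is an open question.

```yaml
global:
  ingress:
    annotations:
      nginx.ingress.kubernetes.io/proxy-body-size: 3072m
      # Assumption: also pass the request to the next upstream pod
      # when the selected pod responds with HTTP 502.
      nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout http_502"
      # Same retry limit as in the generated configuration.
      nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "3"
```

By default NGINX does not retry non-idempotent requests (for example `POST`) once they have been sent to an upstream, so this would mainly help `GET` requests such as the one in the log.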
webservice variables
INTERNAL_PORT: 8080
PUMA_THREADS_MIN: 4
PUMA_THREADS_MAX: 4
PUMA_WORKER_MAX_MEMORY: 1024
DISABLE_PUMA_WORKER_KILLER: false
SHUTDOWN_BLACKOUT_SECONDS: 10
Acceptance Criteria
- Default settings for NGINX / Webservice are updated to values observed to address this problem
- Documentation is updated to reflect this
Edited by Jason Plum