
HPA downscaling of webservice causes 502 errors with nginx-ingress

Summary

During HPA downscaling of the webservice deployment we encounter 502 errors, visible in the ingress logs as well as in the webservice container logs.

Steps to reproduce

Deploy the GitLab Helm chart v5.1.7 with external Redis and external PostgreSQL v12.7.

Configuration used in values.yaml

global:
  webservice:
    enabled: true
    ingress:
      tls:
        secretName: ingress-tls
      annotations:
        nginx.ingress.kubernetes.io/proxy-body-size: 3072m
gitlab:
  webservice:
    minReplicas: 3 # was supposed to fix the 502s
    annotations:
      log.config.scalyr.com/attributes.parser: "accessLog"
    nodeSelector:
      kops.k8s.io/instancegroup: ondemand
    registry:
      enabled: false

Current behavior

502 errors are reported by users and runners, and are visible in the ingress logs:

10.0.101.178 - - [26/Oct/2021:08:45:18 +0000] "GET /codezillas/runner-runtime/-/merge_requests/843.json?serializer=sidebar_extras HTTP/1.1" 502 2940 "https://gitlab.ourdomain.io/codezillas/runner-runtime/-/merge_requests/843" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36" 1174 0.002 [gitlab-gitlab-zoovu-webservice-default-8181] [] 100.70.224.11:8181 2940 0.000 502 6627662d680a4b8a458dd1a984b32f29

webservice HPA downscaling events

Normal   SuccessfulRescale        51m (x93 over 48d)       horizontal-pod-autoscaler  New size: 4; reason: cpu resource above target
Normal   SuccessfulRescale        45m (x159 over 48d)      horizontal-pod-autoscaler  New size: 3; reason: All metrics below target
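
The events above show frequent scale-downs (x159 over 48d), and each one terminates a pod that may still be receiving traffic. One way to reduce how often that happens, without fixing the underlying 502s, is the `behavior` field of the `autoscaling/v2beta2` HorizontalPodAutoscaler API (available on Kubernetes 1.18+). The fragment below is only a sketch: the numbers are illustrative and untested in this environment, and whether the chart exposes this field through values.yaml in v5.1.7 is not something this report confirms, so it may need to be patched onto the HPA object directly.

```yaml
# Fragment of a HorizontalPodAutoscaler spec (autoscaling/v2beta2).
# All values are assumptions chosen for illustration.
spec:
  behavior:
    scaleDown:
      # Wait for 10 minutes of consistently low metrics before scaling down
      # (the Kubernetes default is 300 seconds).
      stabilizationWindowSeconds: 600
      policies:
        # Remove at most one pod every two minutes.
        - type: Pods
          value: 1
          periodSeconds: 120
```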

webservice logs

{
  correlation_id: '01FJXWCY3664RGJ3FNHBBKKKXR',
  duration_ms:    0,
  error:          'badgateway: failed to receive response: dial tcp 127.0.0.1:8080: connect: connection refused',
  level:          'error',
  method:         'GET',
  msg:            '',
  time:           '2021-10-26T08:45:18Z',
  uri:            '/codezillas/runner-runtime/-/merge_requests/843.json?serializer=sidebar_extras',
}

Expected behavior

Traffic should not be sent to terminating webservice pods during HPA downscaling.

Versions

  • Chart: (v5.1.7)
  • Platform:
    • Cloud: (AWS)
    • Self-hosted: (kops)
  • Kubernetes: (kubectl version)
    • Client: 1.20.5
    • Server: 1.20.8
  • Helm: (helm version)
    • Client: 3.6.3
    • Server: -

Relevant logs

see above

nginx configuration

location / {

    set $namespace      "gitlab";
    set $ingress_name   "gitlab-zoovu-webservice-default";
    set $service_name   "gitlab-zoovu-webservice-default";
    set $service_port   "8181";
    set $location_path  "/";

    # Custom headers to proxied server

    proxy_connect_timeout                   15s;
    proxy_send_timeout                      60s;
    proxy_read_timeout                      600s;

    # In case of errors try the next upstream server before returning an error
    proxy_next_upstream                     error timeout;
    proxy_next_upstream_timeout             0;
    proxy_next_upstream_tries               3;

    proxy_pass http://upstream_balancer;
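
Note that `proxy_next_upstream` is set to `error timeout` only. In the ingress log above, the upstream pod (100.70.224.11:8181) itself answered with status 502, so from NGINX's point of view this is a valid HTTP response rather than a connection error, and it is not retried against another pod. A possible mitigation, sketched here and not verified in this environment, is to add `http_502` through the ingress-nginx annotations. Where the annotations map lives in values.yaml depends on the chart layout, so only the map itself is shown.

```yaml
# Ingress annotations (ingress-nginx). Sketch only; not tested against this setup.
annotations:
  # Also retry on the next upstream when a pod answers with 502.
  nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout http_502"
  # Same value as in the generated config above; repeated for clarity.
  nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "3"
```

NGINX does not retry non-idempotent requests (for example POST) unless `non_idempotent` is also listed, so this would mainly help GET requests such as the one in the log.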

webservice environment variables

      INTERNAL_PORT:                     8080
      PUMA_THREADS_MIN:                  4
      PUMA_THREADS_MAX:                  4
      PUMA_WORKER_MAX_MEMORY:            1024
      DISABLE_PUMA_WORKER_KILLER:        false
      SHUTDOWN_BLACKOUT_SECONDS:         10
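
`SHUTDOWN_BLACKOUT_SECONDS` is 10 here, which is how long the webservice keeps serving after it receives the termination signal. If the ingress controller takes longer than that to drop the terminating pod from its upstream list, requests can still reach a pod whose Puma has already stopped listening on 8080, which would match the `connection refused` error in the webservice log. A sketch of values that lengthen this window follows, assuming the chart's `shutdown.blackoutSeconds` and `deployment.terminationGracePeriodSeconds` settings for the webservice subchart. The numbers are illustrative and have not been validated as a fix.

```yaml
gitlab:
  webservice:
    shutdown:
      # Keep serving for 20s after the termination signal
      # (10 in the environment above). Illustrative value.
      blackoutSeconds: 20
    deployment:
      # Must stay longer than blackoutSeconds, otherwise the pod is
      # killed before the blackout period ends. Illustrative value.
      terminationGracePeriodSeconds: 60
```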

Acceptance Criteria

  • Default settings for NGINX / Webservice are updated to values observed to address this problem
  • Documentation is updated to reflect this
Edited by Jason Plum