fix: gitlab-workhorse graceful termination (!972) · Merge requests · GitLab.org / Build / CNG

What does this MR do?

What

Move the gitlab-workhorse process to PID 1 by using exec and the array form of CMD.
Add a feature flag, GITLAB_WORKHORSE_EXEC, to run gitlab-workhorse as PID 1. It is off by default and will be removed after a successful rollout on GitLab.com.

Why

gitlab-workhorse supports graceful termination, but we are not using it. As a result, on shutdown the gitlab-workhorse pod keeps running for 30s, responding with 502, and then receives SIGKILL.
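To illustrate what graceful termination means here, the following minimal sketch uses a plain sh process as a stand-in for gitlab-workhorse (an assumption for illustration only; workhorse is not a shell script, and `graceful_demo` is a hypothetical name). The process installs a SIGTERM handler, receives the signal, and exits on its own with status 0, so no SIGKILL is needed.

```shell
# Stand-in for a server with graceful shutdown (hypothetical, not workhorse itself).
graceful_demo() {
  sh -c '
    # Install a handler, as a server with graceful shutdown would.
    trap "echo draining; exit 0" TERM
    # Simulate the SIGTERM that is sent to the process on pod shutdown.
    kill -TERM "$$"
    # The handler exits before this line runs.
    echo "not reached"
  '
}

graceful_demo
echo "exit status: $?"
```

The point of this MR is that the handler only helps if the signal actually reaches the process that installed it.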

By default, Kubernetes sends SIGTERM to PID 1 in the container. Workhorse listens for this signal, but it is not PID 1, as the process tree below shows. There are two reasons:

CMD isn't passed as an array. The Dockerfile reference (https://docs.docker.com/engine/reference/builder/#cmd) describes CMD command param1 param2 as the shell form, which runs the command under sh, so sh becomes PID 1.
A shell script is invoked, which starts gitlab-workhorse as a child process.
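The effect of both points can be reproduced outside a container. The sketch below is illustrative only: the function names are hypothetical and a nested `sh -c` stands in for gitlab-workhorse. Each function prints the PID of a wrapper shell followed by the PID of the command it starts.

```shell
# Without exec: the wrapper forks a child, so the two PIDs differ.
# The trailing ":" keeps the shell from turning the last command into an implicit exec.
pids_without_exec() {
  sh -c 'echo "$$"; sh -c "echo \$\$"; :'
}

# With exec: the command replaces the wrapper, so it keeps the wrapper's PID.
pids_with_exec() {
  sh -c 'echo "$$"; exec sh -c "echo \$\$"'
}

echo "without exec:"; pids_without_exec
echo "with exec:";    pids_with_exec
```

In the container the wrapper's PID is 1, so only the exec variant leaves the final command in the position where SIGTERM is delivered.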

Process tree before:

git@gitlab-webservice-default-5d85b6854c-sbx2z:/$ ps faux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root        1015  0.0  0.0 805036  4588 ?        Rsl  13:12   0:00 runc init
git         1005  0.3  0.0   5992  3784 pts/0    Ss   13:12   0:00 bash
git         1014  0.0  0.0   8592  3364 pts/0    R+   13:12   0:00  \_ ps faux
git            1  0.0  0.0   2420   532 ?        Ss   12:52   0:00 /bin/sh -c /scripts/start-workhorse
git           16  0.0  0.0   5728  3408 ?        S    12:52   0:00 /bin/bash /scripts/start-workhorse
git           19  0.0  0.3 1328480 33080 ?       Sl   12:52   0:00  \_ gitlab-workhorse -logFile stdout -logFormat json -listenAddr 0.0.0.0:8181 -documentRoot /srv/gitlab/public -secretPath /etc/gitlab/gitlab-workhorse/secret -config /srv/gitlab/config/workhorse-config.toml

Process tree after (with GITLAB_WORKHORSE_EXEC set):

git@gitlab-webservice-default-84c68fc9c9-dzfd4:/$ ps faux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
git          103  0.5  0.0   5992  3812 pts/0    Ss   07:33   0:00 bash
git          111  0.0  0.0   8592  3172 pts/0    R+   07:33   0:00  \_ ps faux
git            1  0.1  0.3 1254496 32120 ?       Ssl  07:32   0:00 gitlab-workhorse -logFile stdout -logFormat json -listenAddr 0.0.0.0:8181 -documentRoot /srv/gitlab/public -secretPath /etc/gitlab/gitlab-workhorse/secret -config /srv/gitlab/config/workhorse-config.toml

This change is behind a feature flag so that we can check whether it causes any problems and roll it out slowly.
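A minimal sketch of how such a flag could gate the behaviour in a start script. This is an assumption for illustration: `launch` is a hypothetical helper, and the real /scripts/start-workhorse may be structured differently.

```shell
# Hypothetical helper: run a command either via exec or as a child process,
# depending on whether GITLAB_WORKHORSE_EXEC is set to a non-empty value.
launch() {
  if [ -n "${GITLAB_WORKHORSE_EXEC:-}" ]; then
    # Flag set: replace the current shell, so the command inherits its PID.
    exec "$@"
  else
    # Flag unset (default): run the command as a child of the shell.
    "$@"
  fi
}

# Each call runs in a subshell "( ... )" so that exec replaces only the subshell.
( unset GITLAB_WORKHORSE_EXEC; launch echo "started as a child process" )
( GITLAB_WORKHORSE_EXEC=1;     launch echo "started via exec" )
```

With the flag set, nothing after `launch` runs in that shell, because the shell process has been replaced by the command.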

Testing

`GITLAB_WORKHORSE_EXEC` not defined

values.yaml

---
# values-minikube.yaml
# This example intended as baseline to use Minikube for the deployment of GitLab
# - Minimized CPU/Memory load, can fit into 3 CPU, 6 GB of RAM (barely)
# - Services that are not compatible with how Minikube runs are disabled
# - Some services entirely removed, or scaled down to 1 replica.
# - Configured to use 192.168.99.100, and nip.io for the domain

# Minimal settings
global:
  ingress:
    configureCertmanager: false
    class: nginx
  hosts:
    domain: 192.168.99.100.nip.io
    externalIP: 192.168.99.100
  # Disable Rails bootsnap cache (temporary)
  rails:
    bootsnap:
      enabled: false
  shell:
    # Configure the clone link in the UI to include the high-numbered NodePort
    # value from below (`gitlab.gitlab-shell.service.nodePort`)
    port: 32022
# Don't use certmanager, we'll self-sign
certmanager:
  install: false
# Use the `ingress` addon, not our Ingress (can't map 22/80/443)
nginx-ingress:
  enabled: false
# Save resources, only 3 CPU
prometheus:
  install: false
gitlab-runner:
  install: false
# Reduce replica counts, reducing CPU & memory requirements
gitlab:
  webservice:
    minReplicas: 1
    maxReplicas: 1
  sidekiq:
    minReplicas: 1
    maxReplicas: 1
  gitlab-shell:
    minReplicas: 1
    maxReplicas: 1
    # Map gitlab-shell to a high-numbered NodePort to support cloning over SSH since
    # Minikube takes port 22.
    service:
      type: NodePort
      nodePort: 32022
registry:
  hpa:
    minReplicas: 1
    maxReplicas: 1

Deploy the Helm chart

helm upgrade gitlab . --timeout 600s -f values.yaml \
  --set global.hosts.domain=$(minikube ip).nip.io \
  --set global.hosts.externalIP=$(minikube ip) \
  --set gitlab.webservice.workhorse.image=registry.gitlab.com/gitlab-org/build/cng/gitlab-workhorse-ee \
  --set gitlab.webservice.workhorse.tag=fix-workhorse-sigterm

Check the process tree: gitlab-workhorse should not be PID 1.

$ kubectl exec $(kubectl get po -l app=webservice -o jsonpath='{..metadata.name}') -c gitlab-workhorse -- ps faux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
git          101  0.0  0.0   8592  3172 ?        Rs   09:35   0:00 ps faux
git            1  0.0  0.0   5728  3464 ?        Ss   09:34   0:00 /bin/bash /scripts/start-workhorse
git           19  0.0  0.3 1327968 32256 ?       Sl   09:34   0:00 gitlab-workhorse -logFile stdout -logFormat json -listenAddr 0.0.0.0:8181 -documentRoot /srv/gitlab/public -secretPath /etc/gitlab/gitlab-workhorse/secret -config /srv/gitlab/config/workhorse-config.toml

`GITLAB_WORKHORSE_EXEC` defined

values.yaml

---
# values-minikube.yaml
# This example intended as baseline to use Minikube for the deployment of GitLab
# - Minimized CPU/Memory load, can fit into 3 CPU, 6 GB of RAM (barely)
# - Services that are not compatible with how Minikube runs are disabled
# - Some services entirely removed, or scaled down to 1 replica.
# - Configured to use 192.168.99.100, and nip.io for the domain

# Minimal settings
global:
  ingress:
    configureCertmanager: false
    class: nginx
  hosts:
    domain: 192.168.99.100.nip.io
    externalIP: 192.168.99.100
  # Disable Rails bootsnap cache (temporary)
  rails:
    bootsnap:
      enabled: false
  shell:
    # Configure the clone link in the UI to include the high-numbered NodePort
    # value from below (`gitlab.gitlab-shell.service.nodePort`)
    port: 32022
# Don't use certmanager, we'll self-sign
certmanager:
  install: false
# Use the `ingress` addon, not our Ingress (can't map 22/80/443)
nginx-ingress:
  enabled: false
# Save resources, only 3 CPU
prometheus:
  install: false
gitlab-runner:
  install: false
# Reduce replica counts, reducing CPU & memory requirements
gitlab:
  webservice:
    minReplicas: 1
    maxReplicas: 1
    extraEnv:
      GITLAB_WORKHORSE_EXEC: 1
  sidekiq:
    minReplicas: 1
    maxReplicas: 1
  gitlab-shell:
    minReplicas: 1
    maxReplicas: 1
    # Map gitlab-shell to a high-numbered NodePort to support cloning over SSH since
    # Minikube takes port 22.
    service:
      type: NodePort
      nodePort: 32022
registry:
  hpa:
    minReplicas: 1
    maxReplicas: 1

Deploy the Helm chart

helm upgrade gitlab . --timeout 600s -f values.yaml \
  --set global.hosts.domain=$(minikube ip).nip.io \
  --set global.hosts.externalIP=$(minikube ip) \
  --set gitlab.webservice.workhorse.image=registry.gitlab.com/gitlab-org/build/cng/gitlab-workhorse-ee \
  --set gitlab.webservice.workhorse.tag=fix-workhorse-sigterm

Check the process tree: gitlab-workhorse should be PID 1.

$ kubectl exec $(kubectl get po -l app=webservice -o jsonpath='{..metadata.name}') -c gitlab-workhorse -- ps faux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
git          613  0.0  0.0   8592  3212 ?        Rs   09:32   0:00 ps faux
git          597  0.0  0.0   5992  3812 pts/0    Ss+  09:32   0:00 bash
git            1  0.0  0.3 1328224 33028 ?       Ssl  09:20   0:00 gitlab-workhorse -logFile stdout -logFormat json -listenAddr 0.0.0.0:8181 -documentRoot /srv/gitlab/public -secretPath /etc/gitlab/gitlab-workhorse/secret -config /srv/gitlab/config/workhorse-config.toml
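The manual check in both test cases could also be scripted. The sketch below assumes the `ps faux` column layout shown above, where the command starts in the eleventh field; `pid1_command` is a hypothetical helper name.

```shell
# Read `ps faux` output on stdin and print the first word of PID 1's command.
# The header row is skipped because its second field ("PID") is not 1.
pid1_command() {
  awk '$2 == 1 { print $11 }'
}

# Usage against a running pod (same selector as in the commands above):
#   kubectl exec "$(kubectl get po -l app=webservice -o jsonpath='{..metadata.name}')" \
#     -c gitlab-workhorse -- ps faux | pid1_command
```

With GITLAB_WORKHORSE_EXEC defined this should print gitlab-workhorse; without it, the output above suggests /bin/bash.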

Checklist

See Definition of done.

For anything in this list which will not be completed, please provide a reason in the MR discussion

Required

Merge Request Title, and Description are up to date, accurate, and descriptive
MR targeting the appropriate branch
MR has a green pipeline on GitLab.com

Expected (please provide an explanation if not completing)

Test plan indicating conditions for success has been posted and passes
Documentation created/updated
Integration tests added to GitLab QA
The impact any change in container size has should be evaluated

Edited Apr 18, 2022 by Jason Plum
