fix: gitlab-workhorse graceful termination
What does this MR do?
What
- Move the gitlab-workhorse process to PID 1 by using exec and the array (exec) form of CMD
- Add a feature flag (GITLAB_WORKHORSE_EXEC) to run gitlab-workhorse as PID 1. It is off by default and will be removed after a successful rollout on GitLab.com.
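The effect of exec can be checked locally. The sketch below is a minimal POSIX sh illustration, not code from the image: `sh -c 'echo $$'` stands in for gitlab-workhorse and only prints its own PID. Without exec, the shell forks a child that gets a new PID; with exec, the command replaces the shell and keeps its PID, which is how gitlab-workhorse can end up as PID 1.

```shell
#!/bin/sh
# Illustration only: `sh -c 'echo $$'` stands in for gitlab-workhorse and
# prints its own PID. Each outer `sh -c` prints its own PID first.

# Without exec: the stand-in runs as a child with a new PID. The trailing
# `true` keeps it from being the last command, so the shell cannot turn
# the fork into an exec on its own.
no_exec=$(sh -c 'echo $$; sh -c "echo \$\$"; true')

# With exec: the stand-in replaces the shell and inherits its PID.
with_exec=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

# Split each two-line result into "shell PID" and "stand-in PID".
set -- $no_exec
no_exec_shell=$1 no_exec_child=$2
set -- $with_exec
with_exec_shell=$1 with_exec_child=$2

echo "without exec: shell=$no_exec_shell workhorse=$no_exec_child"
echo "with exec:    shell=$with_exec_shell workhorse=$with_exec_child"
```

The second line always shows the same PID twice, while the first shows two different PIDs.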
Why
gitlab-workhorse supports graceful termination, but we are not using it. As a result, on shutdown the gitlab-workhorse pod keeps running for 30s, responding with 502s, until it receives SIGKILL.
By default, Kubernetes sends SIGTERM to PID 1 in the container, and workhorse listens for this signal. However, as the process tree below shows, workhorse is not PID 1, for two reasons:
- CMD isn't passed as an array. The Dockerfile reference (https://docs.docker.com/engine/reference/builder/#cmd) lists CMD command param1 param2 as the shell form, which runs the command through /bin/sh -c, so sh becomes PID 1.
- A shell script (/scripts/start-workhorse) is invoked, and it starts gitlab-workhorse as a child process.
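A parent shell does not forward SIGTERM to the child it started. The sketch below (plain sh, illustration only) starts a stand-in child (`sleep 30`) from a shell that waits on it, sends SIGTERM to that shell only, and checks whether the child is still alive. Outside a container the shell itself is killed by the signal. Inside a container the shell is PID 1, and the kernel only delivers SIGTERM to PID 1 if it has installed a handler for it, so there the shell typically keeps running as well, which matches the 30s wait followed by SIGKILL described above.

```shell
#!/bin/sh
# Illustration only: `sleep 30` stands in for gitlab-workhorse, and the
# outer `sh -c` plays the role of the shell that started it.
pidfile=$(mktemp)

# Start the stand-in as a child, record its PID, then wait on it.
sh -c 'sleep 30 & echo $! > "$1"; wait' sh "$pidfile" &
parent=$!

# Poll until the child's PID has been written.
while [ ! -s "$pidfile" ]; do sleep 1; done
child=$(cat "$pidfile")

# Send SIGTERM to the parent only, as Kubernetes does for PID 1.
kill -TERM "$parent"
wait "$parent" 2>/dev/null || true

# The child never received the signal, so it should still be running.
if kill -0 "$child" 2>/dev/null; then
  child_survived=yes
else
  child_survived=no
fi
echo "child survived SIGTERM sent to its parent: $child_survived"

# Clean up the orphaned stand-in.
kill "$child" 2>/dev/null || true
rm -f "$pidfile"
```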
Process tree before:
git@gitlab-webservice-default-5d85b6854c-sbx2z:/$ ps faux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1015 0.0 0.0 805036 4588 ? Rsl 13:12 0:00 runc init
git 1005 0.3 0.0 5992 3784 pts/0 Ss 13:12 0:00 bash
git 1014 0.0 0.0 8592 3364 pts/0 R+ 13:12 0:00 \_ ps faux
git 1 0.0 0.0 2420 532 ? Ss 12:52 0:00 /bin/sh -c /scripts/start-workhorse
git 16 0.0 0.0 5728 3408 ? S 12:52 0:00 /bin/bash /scripts/start-workhorse
git 19 0.0 0.3 1328480 33080 ? Sl 12:52 0:00 \_ gitlab-workhorse -logFile stdout -logFormat json -listenAddr 0.0.0.0:8181 -documentRoot /srv/gitlab/public -secretPath /etc/gitlab/gitlab-workhorse/secret -config /srv/gitlab/config/workhorse-config.toml
Process tree after (with GITLAB_WORKHORSE_EXEC set):
git@gitlab-webservice-default-84c68fc9c9-dzfd4:/$ ps faux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
git 103 0.5 0.0 5992 3812 pts/0 Ss 07:33 0:00 bash
git 111 0.0 0.0 8592 3172 pts/0 R+ 07:33 0:00 \_ ps faux
git 1 0.1 0.3 1254496 32120 ? Ssl 07:32 0:00 gitlab-workhorse -logFile stdout -logFormat json -listenAddr 0.0.0.0:8181 -documentRoot /srv/gitlab/public -secretPath /etc/gitlab/gitlab-workhorse/secret -config /srv/gitlab/config/workhorse-config.toml
This is behind a feature flag so that we can roll it out gradually and check whether it causes any problems.
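One way the start script could gate exec behind the flag is sketched below. This is a hypothetical helper (run_workhorse is an illustrative name, not the actual contents of /scripts/start-workhorse); only the GITLAB_WORKHORSE_EXEC variable comes from this MR. With the flag set, the command replaces the script, so nothing after it runs. Without it, the command runs as a child and the script continues as before.

```shell
#!/bin/sh
# Hypothetical helper: run_workhorse is an illustrative name, not the real
# start script. Only GITLAB_WORKHORSE_EXEC comes from this MR.
run_workhorse() {
  if [ -n "${GITLAB_WORKHORSE_EXEC:-}" ]; then
    # Flag on: replace this shell so workhorse inherits its PID (PID 1 when
    # the image uses the exec form of CMD) and receives SIGTERM directly.
    exec "$@"
  fi
  # Flag off (default): previous behaviour, workhorse runs as a child.
  "$@"
}

# `echo workhorse` stands in for the real gitlab-workhorse command line.
flag_off=$(unset GITLAB_WORKHORSE_EXEC; run_workhorse echo workhorse; echo "script continued")
flag_on=$(GITLAB_WORKHORSE_EXEC=1; run_workhorse echo workhorse; echo "script continued")

printf 'flag off:\n%s\n' "$flag_off"
printf 'flag on:\n%s\n' "$flag_on"
```

In a real script the call would take the gitlab-workhorse command line shown in the process trees (gitlab-workhorse -logFile stdout -logFormat json ...) in place of echo workhorse.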
Testing
GITLAB_WORKHORSE_EXEC not defined
values.yaml
---
# values-minikube.yaml
# This example intended as baseline to use Minikube for the deployment of GitLab
# - Minimized CPU/Memory load, can fit into 3 CPU, 6 GB of RAM (barely)
# - Services that are not compatible with how Minikube runs are disabled
# - Some services entirely removed, or scaled down to 1 replica.
# - Configured to use 192.168.99.100, and nip.io for the domain
# Minimal settings
global:
  ingress:
    configureCertmanager: false
    class: nginx
  hosts:
    domain: 192.168.99.100.nip.io
    externalIP: 192.168.99.100
  # Disable Rails bootsnap cache (temporary)
  rails:
    bootsnap:
      enabled: false
  shell:
    # Configure the clone link in the UI to include the high-numbered NodePort
    # value from below (`gitlab.gitlab-shell.service.nodePort`)
    port: 32022
# Don't use certmanager, we'll self-sign
certmanager:
  install: false
# Use the `ingress` addon, not our Ingress (can't map 22/80/443)
nginx-ingress:
  enabled: false
# Save resources, only 3 CPU
prometheus:
  install: false
gitlab-runner:
  install: false
# Reduce replica counts, reducing CPU & memory requirements
gitlab:
  webservice:
    minReplicas: 1
    maxReplicas: 1
  sidekiq:
    minReplicas: 1
    maxReplicas: 1
  gitlab-shell:
    minReplicas: 1
    maxReplicas: 1
    # Map gitlab-shell to a high-numbered NodePort to support cloning over SSH since
    # Minikube takes port 22.
    service:
      type: NodePort
      nodePort: 32022
registry:
  hpa:
    minReplicas: 1
    maxReplicas: 1
- Deploy the Helm chart:
helm upgrade gitlab . --timeout 600s -f values.yaml --set global.hosts.domain=$(minikube ip).nip.io --set global.hosts.externalIP=$(minikube ip) --set gitlab.webservice.workhorse.image=registry.gitlab.com/gitlab-org/build/cng/gitlab-workhorse-ee --set gitlab.webservice.workhorse.tag=fix-workhorse-sigterm
- Check the process tree: gitlab-workhorse should not be PID 1.
$ kubectl exec $(kubectl get po -l app=webservice -o jsonpath='{..metadata.name}') -c gitlab-workhorse -- ps faux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
git 101 0.0 0.0 8592 3172 ? Rs 09:35 0:00 ps faux
git 1 0.0 0.0 5728 3464 ? Ss 09:34 0:00 /bin/bash /scripts/start-workhorse
git 19 0.0 0.3 1327968 32256 ? Sl 09:34 0:00 gitlab-workhorse -logFile stdout -logFormat json -listenAddr 0.0.0.0:8181 -documentRoot /srv/gitlab/public -secretPath /etc/gitlab/gitlab-workhorse/secret -config /srv/gitlab/config/workhorse-config.toml
GITLAB_WORKHORSE_EXEC defined
values.yaml
---
# values-minikube.yaml
# This example intended as baseline to use Minikube for the deployment of GitLab
# - Minimized CPU/Memory load, can fit into 3 CPU, 6 GB of RAM (barely)
# - Services that are not compatible with how Minikube runs are disabled
# - Some services entirely removed, or scaled down to 1 replica.
# - Configured to use 192.168.99.100, and nip.io for the domain
# Minimal settings
global:
  ingress:
    configureCertmanager: false
    class: nginx
  hosts:
    domain: 192.168.99.100.nip.io
    externalIP: 192.168.99.100
  # Disable Rails bootsnap cache (temporary)
  rails:
    bootsnap:
      enabled: false
  shell:
    # Configure the clone link in the UI to include the high-numbered NodePort
    # value from below (`gitlab.gitlab-shell.service.nodePort`)
    port: 32022
# Don't use certmanager, we'll self-sign
certmanager:
  install: false
# Use the `ingress` addon, not our Ingress (can't map 22/80/443)
nginx-ingress:
  enabled: false
# Save resources, only 3 CPU
prometheus:
  install: false
gitlab-runner:
  install: false
# Reduce replica counts, reducing CPU & memory requirements
gitlab:
  webservice:
    minReplicas: 1
    maxReplicas: 1
    extraEnv:
      GITLAB_WORKHORSE_EXEC: 1
  sidekiq:
    minReplicas: 1
    maxReplicas: 1
  gitlab-shell:
    minReplicas: 1
    maxReplicas: 1
    # Map gitlab-shell to a high-numbered NodePort to support cloning over SSH since
    # Minikube takes port 22.
    service:
      type: NodePort
      nodePort: 32022
registry:
  hpa:
    minReplicas: 1
    maxReplicas: 1
- Deploy the Helm chart:
helm upgrade gitlab . --timeout 600s -f values.yaml --set global.hosts.domain=$(minikube ip).nip.io --set global.hosts.externalIP=$(minikube ip) --set gitlab.webservice.workhorse.image=registry.gitlab.com/gitlab-org/build/cng/gitlab-workhorse-ee --set gitlab.webservice.workhorse.tag=fix-workhorse-sigterm
- Check the process tree: gitlab-workhorse should be PID 1.
$ kubectl exec $(kubectl get po -l app=webservice -o jsonpath='{..metadata.name}') -c gitlab-workhorse -- ps faux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
git 613 0.0 0.0 8592 3212 ? Rs 09:32 0:00 ps faux
git 597 0.0 0.0 5992 3812 pts/0 Ss+ 09:32 0:00 bash
git 1 0.0 0.3 1328224 33028 ? Ssl 09:20 0:00 gitlab-workhorse -logFile stdout -logFormat json -listenAddr 0.0.0.0:8181 -documentRoot /srv/gitlab/public -secretPath /etc/gitlab/gitlab-workhorse/secret -config /srv/gitlab/config/workhorse-config.toml
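The process-tree checks in the test steps can be scripted. The sketch below is an illustration, not part of this MR: a hypothetical workhorse_pid function reads ps faux output on stdin and prints the PID of the first gitlab-workhorse process, skipping the \_ and | markers that ps draws in front of child processes. It is exercised here on rows copied from the process trees in this MR, with the workhorse arguments trimmed.

```shell
#!/bin/sh
# Hypothetical helper: print the PID of the first gitlab-workhorse process
# found in `ps faux` output read from stdin.
workhorse_pid() {
  awk 'NR > 1 {
    # COMMAND starts at field 11; skip the tree markers ps draws for children.
    i = 11
    while (i <= NF && ($i == "\\_" || $i == "|")) i++
    if ($i == "gitlab-workhorse") { print $2; exit }
  }'
}

# Rows taken from the process trees in this MR (arguments trimmed).
before='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
git 1 0.0 0.0 2420 532 ? Ss 12:52 0:00 /bin/sh -c /scripts/start-workhorse
git 16 0.0 0.0 5728 3408 ? S 12:52 0:00 /bin/bash /scripts/start-workhorse
git 19 0.0 0.3 1328480 33080 ? Sl 12:52 0:00 \_ gitlab-workhorse -logFile stdout'
after='USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
git 1 0.1 0.3 1254496 32120 ? Ssl 07:32 0:00 gitlab-workhorse -logFile stdout'

before_pid=$(printf '%s\n' "$before" | workhorse_pid)
after_pid=$(printf '%s\n' "$after" | workhorse_pid)
echo "before: gitlab-workhorse is PID $before_pid"
echo "after:  gitlab-workhorse is PID $after_pid"
```

Against a cluster, the output of the kubectl exec ... -- ps faux command from the test steps can be piped into workhorse_pid; it should print 1 only when GITLAB_WORKHORSE_EXEC is set.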
Related issues
https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/15497#note_910106152
Checklist
See Definition of done.
For anything in this list which will not be completed, please provide a reason in the MR discussion
Required
- Merge Request Title, and Description are up to date, accurate, and descriptive
- MR targeting the appropriate branch
- MR has a green pipeline on GitLab.com
Expected (please provide an explanation if not completing)
- Test plan indicating conditions for success has been posted and passes
- Documentation created/updated
- Integration tests added to GitLab QA
- The impact any change in container size has should be evaluated
Activity
Please add the workflow::ready for review label once you think the MR is ready for an initial review. If from a community member, ask that the Community contribution label be added as well.
Merge requests are handled according to the workflow documented in our handbook and should receive a response within the limit documented in our First-response SLO.
If you don't receive a response, please mention @gitlab-org/distribution, or one of our Project Maintainers.
Generated by Danger
added devops::systems group::distribution labels
added section::core platform label
added 11 commits
- 6e6b20e0...92d3e22e - 10 commits from branch master
- 9a3eda90 - fix: gitlab-workhorse graceful termination
marked the checklist item Integration tests added to GitLab QA as completed
marked the checklist item Integration tests added to GitLab QA as incomplete
marked the checklist item Integration tests added to GitLab QA as completed
- Resolved by Steve Xuereb
added workflow::ready for review label
mentioned in issue gitlab-org/charts/gitlab#3249 (closed)
I've also opened gitlab-org/charts/gitlab#3249 (closed) to follow up on other containers that have similar problems
mentioned in commit gitlab-com/gl-infra/k8s-workloads/gitlab-com@580b58f8
mentioned in commit gitlab-com/gl-infra/k8s-workloads/gitlab-com@e942d5c3
mentioned in merge request gitlab-com/gl-infra/k8s-workloads/gitlab-com!1714 (merged)
- Resolved by Steve Xuereb
changed milestone to %15.0
added workflow::in review label and removed workflow::ready for review label
requested review from @WarheadsSE
@WarheadsSE this should be ready for another round of review, would you mind taking another look?
mentioned in commit 97e9a64f
mentioned in commit gitlab-com/gl-infra/k8s-workloads/gitlab-com@01b33994
mentioned in commit gitlab-com/gl-infra/k8s-workloads/gitlab-com@fd25a5a7
mentioned in merge request gitlab-com/gl-infra/k8s-workloads/gitlab-com!1720 (merged)
mentioned in commit gitlab-com/gl-infra/k8s-workloads/gitlab-com@34af7443
mentioned in merge request gitlab-com/gl-infra/k8s-workloads/gitlab-com!1721 (merged)
mentioned in commit gitlab-com/gl-infra/k8s-workloads/gitlab-com@fa82dd01
mentioned in merge request gitlab-com/gl-infra/k8s-workloads/gitlab-com!1728 (merged)
mentioned in commit 54c6a0e4
mentioned in merge request !986 (merged)
mentioned in commit gitlab-com/gl-infra/k8s-workloads/gitlab-com@8c977deb
mentioned in merge request gitlab-com/gl-infra/k8s-workloads/gitlab-com!1732 (merged)
mentioned in commit gitlab-com/gl-infra/k8s-workloads/gitlab-com@4db95058
added type::bug label