GitLab Memory Watchdog Is Not Restarting Puma
Summary
Self-managed customer.
Customer upgraded from GitLab 15.4.4 to 15.6.2 and noticed this change in behaviour:
- Puma Worker Killer had stopped functioning
- Linux OOM events were now occurring
After reading through the following changes:
- Replace puma-worker-killer with memory-watchdog
- Set PUMA_WORKER_MAX_MEMORY env if per_worker_max_memory_mb is configured
- Convert memory_limit to bytes for RssMemoryLimit
It is expected that Puma Worker Killer is no longer enabled and that monitoring will now restart the Puma process, checking every five minutes.
This restart is not occurring.
Steps to reproduce
- Install GitLab 15.6.2
- Configure gitlab.rb with a low puma['per_worker_max_memory_mb']
- Load pages in the GitLab UI with a large number of commits or branches, or compare distant Git revisions, to cause high memory usage for the page load
Example output
Showing setting in gitlab.rb
# grep -Ei puma /etc/gitlab/gitlab.rb
## Puma
puma['per_worker_max_memory_mb'] = 400
#
Showing the setting is carried through to the Rails env
cat /opt/gitlab/etc/gitlab-rails/env/PUMA_WORKER_MAX_MEMORY
400
No further messages or restarts are seen after loading various pages in GitLab
# tail -n 3 /var/log/gitlab/puma/puma_stdout.log
{"timestamp":"2023-01-04T00:57:36.887Z","pid":8322,"message":"! Friendly fork preparation complete."}
{"timestamp":"2023-01-04T00:57:37.944Z","pid":8322,"message":"- Worker 0 (PID: 8509) booted in 1.03s, phase: 0"}
{"timestamp":"2023-01-04T00:57:38.455Z","pid":8322,"message":"- Worker 1 (PID: 8511) booted in 1.5s, phase: 0"}
##
## load high memory use pages in the GitLab UI
## 40 mins passed and no Puma restarts
# date
Wed Jan 4 01:40:37 UTC 2023
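As a rough cross-check of the "no restarts" observation, the sketch below estimates how many five-minute checks should have completed between the last worker boot message and the `date` output above. The five-minute interval is the one stated in this report, not a value read from GitLab's code, and `completed_checks` is a hypothetical helper name used only for illustration.

```python
from datetime import datetime, timedelta

# Timestamps copied from the transcript above (both UTC).
worker_booted = datetime.fromisoformat("2023-01-04T00:57:38.455")
checked_at = datetime(2023, 1, 4, 1, 40, 37)

# Assumption taken from this report: one memory check every five minutes.
CHECK_INTERVAL = timedelta(minutes=5)


def completed_checks(start, end, interval=CHECK_INTERVAL):
    """Return how many full check intervals fit between start and end."""
    return (end - start) // interval


elapsed = checked_at - worker_booted
print(f"elapsed: {elapsed}")
print(f"completed checks: {completed_checks(worker_booted, checked_at)}")
```

Under that assumption, roughly 43 minutes of uptime covers eight full check intervals, so at least one restart would be expected if a worker stayed above the limit.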
Showing current Puma memory usage via htop
0[||||||||||| 10.0%] Tasks: 127, 402 thr; 1 running
1[||||||||||||| 13.2%] Load average: 0.30 0.64 1.00
Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||5.71G/7.49G] Uptime: 01:56:39
Swp[ 0K/0K]
PID USER PRI NI VIRT RES SHR S CPU%▽MEM% TIME+ Command
8322 git 20 0 1434M 905M 24384 S 0.0 11.8 1:09.42 puma 5.6.5 (unix:///var/opt/gitlab/gitlab-rails/sockets/gitlab.socket,tcp://127.0.0.1:8080) [gitlab-puma-worker]
8362 git 20 0 1434M 905M 24384 S 0.0 11.8 0:00.00 puma 5.6.5 (unix:///var/opt/gitlab/gitlab-rails/sockets/gitlab.socket,tcp://127.0.0.1:8080) [gitlab-puma-worker]
8506 git 20 0 1434M 905M 24384 S 0.0 11.8 0:00.54 puma 5.6.5 (unix:///var/opt/gitlab/gitlab-rails/sockets/gitlab.socket,tcp://127.0.0.1:8080) [gitlab-puma-worker]
8507 git 20 0 1434M 905M 24384 S 0.0 11.8 0:00.12 puma 5.6.5 (unix:///var/opt/gitlab/gitlab-rails/sockets/gitlab.socket,tcp://127.0.0.1:8080) [gitlab-puma-worker]
8508 git 20 0 1434M 905M 24384 S 0.0 11.8 0:00.26 puma 5.6.5 (unix:///var/opt/gitlab/gitlab-rails/sockets/gitlab.socket,tcp://127.0.0.1:8080) [gitlab-puma-worker]
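To make the comparison explicit, here is a minimal sketch that converts the RES column from the htop output into bytes and compares it with the configured `per_worker_max_memory_mb`. It assumes 1 MB = 1024 * 1024 bytes and that unsuffixed htop values are in KiB; the function names are hypothetical and this is not how GitLab itself measures memory.

```python
import re

# Assumption: 1 MB = 1024 * 1024 bytes for both htop and the configured limit.
BYTES_PER_MB = 1024 * 1024
UNIT_FACTORS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def htop_res_to_bytes(res):
    """Convert an htop memory value such as '905M' or '24384' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMG]?)", res.strip())
    if match is None:
        raise ValueError(f"unrecognised memory value: {res!r}")
    number, unit = match.groups()
    # Assumption: htop prints unsuffixed values in KiB.
    factor = UNIT_FACTORS.get(unit, 1024)
    return int(float(number) * factor)


def over_limit(res, limit_mb):
    """Return True if the htop RES value exceeds the limit given in MB."""
    return htop_res_to_bytes(res) > limit_mb * BYTES_PER_MB


limit_mb = 400   # puma['per_worker_max_memory_mb'] from gitlab.rb
res = "905M"     # RES column from the htop output above
print(f"limit: {limit_mb * BYTES_PER_MB} bytes")
print(f"RES:   {htop_res_to_bytes(res)} bytes")
print(f"over limit: {over_limit(res, limit_mb)}")
```

With the values shown in this report, the reported resident memory is more than twice the configured limit, which is the condition under which a restart was expected.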
Example Project
What is the current bug behavior?
The Puma process is not restarted. In this customer's case, this meant that the Linux OOM killer was invoked.
What is the expected correct behavior?
Relevant logs and/or screenshots
Output of checks
Results of GitLab environment info
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:env:info`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:env:info RAILS_ENV=production`)
Results of GitLab application Check
(For installations with omnibus-gitlab package run and paste the output of: `sudo gitlab-rake gitlab:check SANITIZE=true`) (For installations from source run and paste the output of: `sudo -u git -H bundle exec rake gitlab:check RAILS_ENV=production SANITIZE=true`) (we will only investigate if the tests are passing)
Possible fixes
To re-enable PumaWorkerKiller, add
gitlab_rails['env'] = {
'GITLAB_MEMORY_WATCHDOG_ENABLED' => false
}
to gitlab.rb, then run sudo gitlab-ctl reconfigure