GitLab Memory Watchdog Is Not Restarting Puma

Summary

Reported by a self-managed customer.

The customer upgraded from GitLab 15.4.4 to 15.6.2 and noticed these changes in behaviour:

  • Puma Worker Killer had stopped functioning
  • Linux OOM events were now occurring

After reading through

Puma Worker Killer is expected to be disabled in this release, with the new memory monitoring restarting the Puma process instead (checking every five minutes).

This restart is not occurring.
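A minimal Ruby sketch of the check the reporter expected. It assumes that the monitor compares each worker's resident memory with puma['per_worker_max_memory_mb'] on a five-minute interval. It is not GitLab's actual watchdog code, and the method names and the caller-supplied restart block are hypothetical.

```ruby
# Hypothetical sketch, NOT GitLab's memory watchdog implementation: it
# only illustrates the restart check the reporter expected, assuming a
# per-worker RSS limit in MB (puma['per_worker_max_memory_mb']).

CHECK_INTERVAL_SECONDS = 5 * 60 # "checking every five minutes"

# Returns the PIDs whose resident memory (in kB) exceeds max_mb.
def workers_over_limit(rss_kb_by_pid, max_mb)
  limit_kb = max_mb * 1024
  rss_kb_by_pid.select { |_pid, rss_kb| rss_kb > limit_kb }.keys
end

# One monitoring pass: yields each over-limit PID to the caller's
# (hypothetical) restart logic and returns the list of those PIDs.
def check_once(rss_kb_by_pid, max_mb)
  over = workers_over_limit(rss_kb_by_pid, max_mb)
  over.each { |pid| yield pid }
  over
end
```

With a 400 MB limit, a worker at 926,720 kB (about 905 MB, close to the RES figure in the htop output below) would be passed to the restart block. A real monitor would repeat this pass every CHECK_INTERVAL_SECONDS.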

Steps to reproduce

  • Install GitLab 15.6.2
  • Set a low puma['per_worker_max_memory_mb'] value in gitlab.rb and run sudo gitlab-ctl reconfigure
  • Load memory-heavy pages in the GitLab UI, for example pages listing a large number of commits or branches, or comparisons between distant Git revisions

Example output

The setting in gitlab.rb:

# grep -Ei puma /etc/gitlab/gitlab.rb
## Puma
puma['per_worker_max_memory_mb'] = 400
#

The setting is carried through to the Rails environment:

# cat /opt/gitlab/etc/gitlab-rails/env/PUMA_WORKER_MAX_MEMORY
400
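Omnibus keeps each Rails environment variable as a file in this directory, which the cat command above reads directly. A minimal Ruby sketch that reads such a value programmatically, assuming that file-per-variable layout; the method names are hypothetical.

```ruby
# Sketch: read a value from an env directory, where each file name is a
# variable name and the file content is its value (the layout shown by
# the cat command above). Method names are hypothetical.

ENV_DIR = "/opt/gitlab/etc/gitlab-rails/env"

# Returns the stripped file content, or nil if the variable is not set.
def read_env_value(name, dir = ENV_DIR)
  path = File.join(dir, name)
  return nil unless File.file?(path)

  File.read(path).strip
end

# Returns PUMA_WORKER_MAX_MEMORY as an Integer (MB), or nil if the value
# is missing or not a base-10 integer.
def puma_worker_max_memory_mb(dir = ENV_DIR)
  value = read_env_value("PUMA_WORKER_MAX_MEMORY", dir)
  value && Integer(value, 10, exception: false)
end
```

On the host above, puma_worker_max_memory_mb would return 400. Assuming gitlab_rails['env'] entries are written to the same directory, read_env_value("GITLAB_MEMORY_WATCHDOG_ENABLED") could likewise confirm the workaround under Possible fixes after a reconfigure.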

After loading various pages in GitLab, no further messages or restarts appear in the Puma log:

# tail -n 3 /var/log/gitlab/puma/puma_stdout.log
{"timestamp":"2023-01-04T00:57:36.887Z","pid":8322,"message":"! Friendly fork preparation complete."}
{"timestamp":"2023-01-04T00:57:37.944Z","pid":8322,"message":"- Worker 0 (PID: 8509) booted in 1.03s, phase: 0"}
{"timestamp":"2023-01-04T00:57:38.455Z","pid":8322,"message":"- Worker 1 (PID: 8511) booted in 1.5s, phase: 0"}

##
## load high memory use pages in the GitLab UI
## 40 mins passed and no Puma restarts
# date
Wed Jan  4 01:40:37 UTC 2023

Current Puma memory usage, as shown by htop:


    0[|||||||||||                                                                                10.0%]   Tasks: 127, 402 thr; 1 running
    1[|||||||||||||                                                                              13.2%]   Load average: 0.30 0.64 1.00
  Mem[|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||5.71G/7.49G]   Uptime: 01:56:39
  Swp[                                                                                           0K/0K]

    PID USER      PRI  NI  VIRT   RES   SHR S CPU%▽MEM%   TIME+  Command
   8322 git        20   0 1434M  905M 24384 S  0.0 11.8  1:09.42 puma 5.6.5 (unix:///var/opt/gitlab/gitlab-rails/sockets/gitlab.socket,tcp://127.0.0.1:8080) [gitlab-puma-worker]
   8362 git        20   0 1434M  905M 24384 S  0.0 11.8  0:00.00 puma 5.6.5 (unix:///var/opt/gitlab/gitlab-rails/sockets/gitlab.socket,tcp://127.0.0.1:8080) [gitlab-puma-worker]
   8506 git        20   0 1434M  905M 24384 S  0.0 11.8  0:00.54 puma 5.6.5 (unix:///var/opt/gitlab/gitlab-rails/sockets/gitlab.socket,tcp://127.0.0.1:8080) [gitlab-puma-worker]
   8507 git        20   0 1434M  905M 24384 S  0.0 11.8  0:00.12 puma 5.6.5 (unix:///var/opt/gitlab/gitlab-rails/sockets/gitlab.socket,tcp://127.0.0.1:8080) [gitlab-puma-worker]
   8508 git        20   0 1434M  905M 24384 S  0.0 11.8  0:00.26 puma 5.6.5 (unix:///var/opt/gitlab/gitlab-rails/sockets/gitlab.socket,tcp://127.0.0.1:8080) [gitlab-puma-worker]
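To check the worker processes individually (PIDs 8509 and 8511 in the boot log above), per-process figures can be read from /proc/&lt;pid&gt;/status on Linux. A minimal Ruby sketch, assuming it is saved under a hypothetical name such as rss_check.rb and run as ruby rss_check.rb 8509 8511; the helper names are hypothetical and the 400 MB limit is the gitlab.rb value shown above.

```ruby
# Sketch: report the resident set size (VmRSS) of given PIDs from
# /proc/<pid>/status (Linux only). Helper names are hypothetical.

# Extracts VmRSS in kB from the text of a /proc/<pid>/status file.
def parse_vmrss_kb(status_text)
  line = status_text.each_line.find { |l| l.start_with?("VmRSS:") }
  line && line.split[1].to_i
end

# Returns the RSS of a PID in MB, or nil if the process does not exist.
def rss_mb(pid)
  kb = parse_vmrss_kb(File.read("/proc/#{pid}/status"))
  kb && kb / 1024.0
rescue Errno::ENOENT, Errno::ESRCH
  nil
end

if $PROGRAM_NAME == __FILE__
  limit_mb = 400 # puma['per_worker_max_memory_mb'] from gitlab.rb above
  ARGV.map(&:to_i).each do |pid|
    mb = rss_mb(pid)
    if mb.nil?
      puts "#{pid}: not running"
      next
    end
    state = mb > limit_mb ? "OVER" : "ok"
    puts format("%d: %.0f MB (%s, limit %d MB)", pid, mb, state, limit_mb)
  end
end
```

Note that VmRSS, like htop's RES, also counts pages shared with the Puma master, so it is an upper bound on what a single worker uniquely uses.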

Example Project

What is the current bug behavior?

The Puma process is not restarted. In this customer's case, this led to the Linux OOM killer being invoked.

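What is the expected correct behavior?

The memory monitoring restarts Puma when its memory usage grows too high, as Puma Worker Killer did before the upgrade, so that the Linux OOM killer is not invoked.
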
Relevant logs and/or screenshots

Output of checks

Results of GitLab environment info

Results of GitLab application Check

Possible fixes

As a workaround, Puma Worker Killer can be re-enabled by adding the following to gitlab.rb:

gitlab_rails['env'] = {
  'GITLAB_MEMORY_WATCHDOG_ENABLED' => false
}

Then run sudo gitlab-ctl reconfigure.