Unicorn keeps running (100% CPU load on Puma) in GitLab v13
Summary
Unicorn process kept active which causes very high CPU load (caused by puma) on the server AND runsv unicorn
is still running (and keeps starting after gitlab restart).
Steps to reproduce
- Upgrade to version 13.x.x for GitLab (jeej!), but...
Optionally (didn't help):
gitlab-ctl reconfigure
gitlab-ctl restart
What is the current bug behavior?
Two related things:
- First I seems that Unicorn was still running after the GitLab upgrade and GitLab restart manually didn't help either. This actually caused very high CPU usage (100% on 1 core) of the new puna services for some reason. Workaround was manually killing the unicorn process(es), from that moment on the puma process CPU usage went to normal again luckily. But Unicorn keep starting up again!
- I still see a
runsv unicorn
process running (which keep starting up after agitlab-ctl restart
), plus logging proceses for unicorn. Q: Why is this needed?
What is the expected correct behavior?
- No regression impact during upgrading (no high CPU usages or processes that should no longer be ran, should be terminated automatically)
- Do not run unnecessary processes
Relevant logs and/or screenshots
Unicorn process keep starting up causing 100% CPU load of the puma process:
Htop Screenshot of the runsv process that should not be here - AFAIK:
Some developers said I needed to run this command, so I did:
$ lsof /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bundle 9889 git 14u unix 0x000000003c6b9b8a 0t0 2941588 /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket type=STREAM
bundle 10150 git 14u unix 0x000000003c6b9b8a 0t0 2941588 /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket type=STREAM
bundle 10164 git 14u unix 0x000000003c6b9b8a 0t0 2941588 /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket type=STREAM
bundle 10177 git 14u unix 0x000000003c6b9b8a 0t0 2941588 /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket type=STREAM
Some logging, during the high CPU usage of puma (before I killed Unicorn processes):
==> /var/log/gitlab/puma/current <==
2020-05-25_22:06:18.60941 bundler: failed to load command: puma (/opt/gitlab/embedded/bin/puma)
2020-05-25_22:06:18.60950 RuntimeError: There is already a server bound to: /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket
2020-05-25_22:06:18.60952 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/binder.rb:323:in `add_unix_listener'
2020-05-25_22:06:18.60953 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/binder.rb:151:in `block in parse'
2020-05-25_22:06:18.60954 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/binder.rb:90:in `each'
2020-05-25_22:06:18.60956 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/binder.rb:90:in `parse'
2020-05-25_22:06:18.60957 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/runner.rb:161:in `load_and_bind'
2020-05-25_22:06:18.60958 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/cluster.rb:413:in `run'
2020-05-25_22:06:18.60959 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/launcher.rb:172:in `run'
2020-05-25_22:06:18.60960 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/cli.rb:80:in `run'
2020-05-25_22:06:18.60969 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/bin/puma:10:in `<top (required)>'
2020-05-25_22:06:18.60972 /opt/gitlab/embedded/bin/puma:23:in `load'
2020-05-25_22:06:18.60973 /opt/gitlab/embedded/bin/puma:23:in `<top (required)>'
2020-05-25_22:06:19.87626 {"timestamp":"2020-05-25T22:06:19.876Z","pid":11937,"message":"Puma starting in cluster mode..."}
2020-05-25_22:06:19.87650 {"timestamp":"2020-05-25T22:06:19.876Z","pid":11937,"message":"* Version 4.3.3.gitlab.2 (ruby 2.6.6-p146), codename: Mysterious Traveller"}
2020-05-25_22:06:19.87660 {"timestamp":"2020-05-25T22:06:19.876Z","pid":11937,"message":"* Min threads: 4, max threads: 4"}
2020-05-25_22:06:19.87669 {"timestamp":"2020-05-25T22:06:19.876Z","pid":11937,"message":"* Environment: production"}
2020-05-25_22:06:19.87678 {"timestamp":"2020-05-25T22:06:19.876Z","pid":11937,"message":"* Process workers: 3"}
2020-05-25_22:06:19.87686 {"timestamp":"2020-05-25T22:06:19.876Z","pid":11937,"message":"* Preloading application"}
2020-05-25_22:07:15.31323 bundler: failed to load command: puma (/opt/gitlab/embedded/bin/puma)
2020-05-25_22:07:15.31332 RuntimeError: There is already a server bound to: /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket
2020-05-25_22:07:15.31333 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/binder.rb:323:in `add_unix_listener'
2020-05-25_22:07:15.31335 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/binder.rb:151:in `block in parse'
2020-05-25_22:07:15.31336 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/binder.rb:90:in `each'
2020-05-25_22:07:15.31337 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/binder.rb:90:in `parse'
2020-05-25_22:07:15.31338 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/runner.rb:161:in `load_and_bind'
2020-05-25_22:07:15.31339 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/cluster.rb:413:in `run'
2020-05-25_22:07:15.31340 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/launcher.rb:172:in `run'
2020-05-25_22:07:15.31341 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/lib/puma/cli.rb:80:in `run'
2020-05-25_22:07:15.31342 /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/gitlab-puma-4.3.3.gitlab.2/bin/puma:10:in `<top (required)>'
2020-05-25_22:07:15.31344 /opt/gitlab/embedded/bin/puma:23:in `load'
2020-05-25_22:07:15.31345 /opt/gitlab/embedded/bin/puma:23:in `<top (required)>'
2020-05-25_22:07:16.58223 {"timestamp":"2020-05-25T22:07:16.582Z","pid":12331,"message":"Puma starting in cluster mode..."}
2020-05-25_22:07:16.58232 {"timestamp":"2020-05-25T22:07:16.582Z","pid":12331,"message":"* Version 4.3.3.gitlab.2 (ruby 2.6.6-p146), codename: Mysterious Traveller"}
2020-05-25_22:07:16.58233 {"timestamp":"2020-05-25T22:07:16.582Z","pid":12331,"message":"* Min threads: 4, max threads: 4"}
2020-05-25_22:07:16.58235 {"timestamp":"2020-05-25T22:07:16.582Z","pid":12331,"message":"* Environment: production"}
2020-05-25_22:07:16.58237 {"timestamp":"2020-05-25T22:07:16.582Z","pid":12331,"message":"* Process workers: 3"}
2020-05-25_22:07:16.58238 {"timestamp":"2020-05-25T22:07:16.582Z","pid":12331,"message":"* Preloading application"}
Interesting log about unicorn log:
==> /var/log/gitlab/unicorn/current <==
2020-05-27_23:05:58.12177 Waiting for unicorn to die...
2020-05-27_23:05:59.14072 forwarding CONT to unicorn master 4304
2020-05-27_23:05:59.14076 /opt/gitlab/embedded/bin/gitlab-unicorn-wrapper: line 132: kill: (4304) - No such process
2020-05-27_23:05:59.16733 wrapper for unicorn master 4304 exiting
2020-05-27_23:07:25.03864 starting new unicorn master
2020-05-27_23:07:55.60754 starting new unicorn master
2020-05-27_23:09:00.51976 adopted new unicorn master 4041
2020-05-27_23:18:18.86107 Received TERM from runit, sending to process group (-PID)
2020-05-27_23:18:19.10597 forwarding CONT to unicorn master 4041
2020-05-27_23:22:52.01631 wrapper for unicorn master 4041 exiting
...
2020-05-30_22:20:13.82280 starting new unicorn master
2020-05-30_22:21:21.59050 adopted new unicorn master 18953
And the stderr unicorn log:
==> /var/log/gitlab/unicorn/unicorn_stderr.log <==
I, [2020-05-30T22:20:15.715Z #18953] INFO -- : Refreshing Gem list
I, [2020-05-30T22:21:20.367Z #18953] INFO -- : listening on addr=127.0.0.1:8080 fd=17
I, [2020-05-30T22:21:20.367Z #18953] INFO -- : unlinking existing socket=/var/opt/gitlab/gitlab-rails/sockets/gitlab.socket
I, [2020-05-30T22:21:20.367Z #18953] INFO -- : listening on addr=/var/opt/gitlab/gitlab-rails/sockets/gitlab.socket fd=18
I, [2020-05-30T22:21:20.488Z #19284] INFO -- : worker=0 ready
I, [2020-05-30T22:21:20.508Z #19286] INFO -- : worker=1 ready
I, [2020-05-30T22:21:20.520Z #19288] INFO -- : worker=2 ready
I, [2020-05-30T22:21:20.539Z #19290] INFO -- : worker=3 ready
I, [2020-05-30T22:21:20.552Z #18953] INFO -- : master process ready
I, [2020-05-30T22:21:20.556Z #19292] INFO -- : worker=4 ready
Puma logging:
==> /var/log/gitlab/puma/current <==
2020-05-27_23:33:01.14277 {"timestamp":"2020-05-27T23:33:01.142Z","pid":9889,"message":"* Environment: production"}
2020-05-27_23:33:01.14278 {"timestamp":"2020-05-27T23:33:01.142Z","pid":9889,"message":"* Process workers: 3"}
2020-05-27_23:33:01.14279 {"timestamp":"2020-05-27T23:33:01.142Z","pid":9889,"message":"* Preloading application"}
2020-05-27_23:33:59.98441 {"timestamp":"2020-05-27T23:33:59.984Z","pid":9889,"message":"* Listening on unix:///var/opt/gitlab/gitlab-rails/sockets/gitlab.socket"}
2020-05-27_23:33:59.98470 {"timestamp":"2020-05-27T23:33:59.984Z","pid":9889,"message":"* Listening on tcp://127.0.0.1:8080"}
2020-05-27_23:33:59.98478 {"timestamp":"2020-05-27T23:33:59.984Z","pid":9889,"message":"! WARNING: Detected 3 Thread(s) started in app boot:"}
2020-05-27_23:33:59.98490 {"timestamp":"2020-05-27T23:33:59.984Z","pid":9889,"message":"! #\u003cThread:0x00007faf121fb260@/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/concurrent-ruby-1.1.6/lib/concurrent-ruby/concurrent/atomic/ruby_thread_local_var.rb:38 sleep_forever\u003e - /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/concurrent-ruby-1.1.6/lib/concurrent-ruby/concurrent/atomic/ruby_thread_local_var.rb:40:in `pop'"}
2020-05-27_23:33:59.98500 {"timestamp":"2020-05-27T23:33:59.984Z","pid":9889,"message":"! #\u003cThread:0x00007faf0c6330b8@/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/activerecord-6.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:334 sleep\u003e - /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/activerecord-6.0.3/lib/active_record/connection_adapters/abstract/connection_pool.rb:337:in `sleep'"}
2020-05-27_23:33:59.98511 {"timestamp":"2020-05-27T23:33:59.985Z","pid":9889,"message":"! #\u003cThread:0x00007faef6681058@/opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/rack-timeout-0.5.1/lib/rack/timeout/support/scheduler.rb:73 sleep\u003e - /opt/gitlab/embedded/lib/ruby/gems/2.6.0/gems/rack-timeout-0.5.1/lib/rack/timeout/support/scheduler.rb:91:in `sleep'"}
2020-05-27_23:33:59.98528 {"timestamp":"2020-05-27T23:33:59.985Z","pid":9889,"message":"Use Ctrl-C to stop"}
==> /var/log/gitlab/puma/puma_stdout.log <==
{"timestamp":"2020-05-27T23:58:40.342Z","pid":9889,"message":"PumaWorkerKiller: Consuming 2211.01953125 mb with master and 3 workers."}
{"timestamp":"2020-05-27T23:59:00.343Z","pid":9889,"message":"PumaWorkerKiller: Consuming 2211.2734375 mb with master and 3 workers."}
{"timestamp":"2020-05-27T23:59:20.344Z","pid":9889,"message":"PumaWorkerKiller: Consuming 2209.23046875 mb with master and 3 workers."}
{"timestamp":"2020-05-27T23:59:40.345Z","pid":9889,"message":"PumaWorkerKiller: Consuming 2213.41015625 mb with master and 3 workers."}
{"timestamp":"2020-05-28T00:00:00.346Z","pid":9889,"message":"PumaWorkerKiller: Consuming 2194.89453125 mb with master and 3 workers."}
{"timestamp":"2020-05-28T00:00:20.347Z","pid":9889,"message":"PumaWorkerKiller: Consuming 2229.49609375 mb with master and 3 workers."}
{"timestamp":"2020-05-28T00:00:40.348Z","pid":9889,"message":"PumaWorkerKiller: Consuming 2231.1796875 mb with master and 3 workers."}
{"timestamp":"2020-05-28T00:01:00.349Z","pid":9889,"message":"PumaWorkerKiller: Consuming 2238.18359375 mb with master and 3 workers."}
{"timestamp":"2020-05-28T00:01:20.350Z","pid":9889,"message":"PumaWorkerKiller: Consuming 2239.640625 mb with master and 3 workers."}
{"timestamp":"2020-05-28T00:01:40.350Z","pid":9889,"message":"PumaWorkerKiller: Consuming 2249.60546875 mb with master and 3 workers."}
==> /var/log/gitlab/puma/puma_stderr.log <==
=== puma startup: 2020-05-28 01:33:59 +0200 ===
unknown OID 28: failed to recognize type of 'relfrozenxid'. It will be treated as String.
unknown OID 1034: failed to recognize type of 'relacl'. It will be treated as String.
unknown OID 194: failed to recognize type of 'relpartbound'. It will be treated as String.
unknown OID 28: failed to recognize type of 'relfrozenxid'. It will be treated as String.
unknown OID 1034: failed to recognize type of 'relacl'. It will be treated as String.
unknown OID 194: failed to recognize type of 'relpartbound'. It will be treated as String.
unknown OID 28: failed to recognize type of 'relfrozenxid'. It will be treated as String.
unknown OID 1034: failed to recognize type of 'relacl'. It will be treated as String.
unknown OID 194: failed to recognize type of 'relpartbound'. It will be treated as String.
==> /var/log/gitlab/puma/puma_stdout.log <==
{"timestamp":"2020-05-28T00:02:00.351Z","pid":9889,"message":"PumaWorkerKiller: Consuming 2243.734375 mb with master and 3 workers."}
{"timestamp":"2020-05-28T00:02:20.352Z","pid":9889,"message":"PumaWorkerKiller: Consuming 2256.1328125 mb with master and 3 workers."}
Status gives the following (Unicorn is missing in the list, as it should be, but the process keeps still starting up):
gitlab-ctl status
run: gitaly: (pid 9769) 658s; run: log: (pid 3952) 2159s
run: gitlab-pages: (pid 9779) 658s; run: log: (pid 3942) 2159s
run: gitlab-workhorse: (pid 9809) 657s; run: log: (pid 3947) 2159s
run: logrotate: (pid 9822) 657s; run: log: (pid 3950) 2159s
run: mattermost: (pid 9839) 657s; run: log: (pid 3940) 2160s
run: postgresql: (pid 9853) 657s; run: log: (pid 3941) 2160s
run: puma: (pid 9889) 656s; run: log: (pid 8245) 1255s
run: redis: (pid 9906) 655s; run: log: (pid 3943) 2160s
run: sidekiq: (pid 9988) 652s; run: log: (pid 3944) 2160s
Output of checks
This bug happens with the on-prem GitLab (using Omnibus installation).
Results of GitLab environment info
Expand for output related to GitLab environment info
System information System: Debian 10 Current User: git Using RVM: no Ruby Version: 2.6.6p146 Gem Version: 2.7.10 Bundler Version:1.17.3 Rake Version: 12.3.3 Redis Version: 5.0.9 Git Version: 2.26.2 Sidekiq Version:5.2.7 Go Version: unknown GitLab information Version: 13.0.1 Revision: 74623c80da9 Directory: /opt/gitlab/embedded/service/gitlab-rails DB Adapter: PostgreSQL DB Version: 11.7 URL: https://gitlab.melroy.org HTTP Clone URL: https://gitlab.melroy.org/some-group/some-project.git SSH Clone URL: git@gitlab.melroy.org:some-group/some-project.git Using LDAP: no Using Omniauth: yes Omniauth Providers: GitLab Shell Version: 13.2.0 Repository storage paths: - default: /media/data/gitlab/git-data/repositories GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Expand for output related to the GitLab application check
Checking GitLab subtasks ...
Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 13.2.0 ? ... OK (13.2.0) Running /opt/gitlab/embedded/service/gitlab-shell/bin/check Internal API available: OK Redis available via internal API: OK gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished
Checking Sidekiq ...
Sidekiq: ... Running? ... yes Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished
Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab App ...
Git configured correctly? ... yes Database config exists? ... yes All migrations up? ... yes Database contains orphaned GroupMembers? ... no GitLab config exists? ... yes GitLab config up to date? ... yes Log directory writable? ... yes Tmp directory writable? ... yes Uploads directory exists? ... yes Uploads directory has correct permissions? ... yes Uploads directory tmp has correct permissions? ... yes Init script exists? ... skipped (omnibus-gitlab has no init script) Init script up-to-date? ... skipped (omnibus-gitlab has no init script) Projects have namespace: ... 2/2 ... yes 2/3 ... yes 2/4 ... yes 2/5 ... yes 91/6 ... yes 2/7 ... yes 2/8 ... yes 2/10 ... yes 2/12 ... yes 2/13 ... yes 2/14 ... yes 2/15 ... yes 2/16 ... yes 2/19 ... yes 2/20 ... yes 2/21 ... yes 2/22 ... yes 2/23 ... yes 2/24 ... yes 2/25 ... yes 2/27 ... yes 2/30 ... yes 2/31 ... yes 2/32 ... yes 2/35 ... yes 2/36 ... yes 2/38 ... yes 2/39 ... yes 2/40 ... yes 2/41 ... yes 31/43 ... yes 31/44 ... yes 2/49 ... yes 2/50 ... yes 2/51 ... yes 2/52 ... yes 2/53 ... yes 2/54 ... yes 2/55 ... yes 2/57 ... yes 2/58 ... yes 2/59 ... yes 2/60 ... yes 2/61 ... yes 2/62 ... yes 2/63 ... yes 2/64 ... yes 2/65 ... yes 2/66 ... yes 2/67 ... yes 2/68 ... yes 2/69 ... yes 2/70 ... yes 2/71 ... yes 91/72 ... yes 2/73 ... yes 99/74 ... yes 100/75 ... yes 101/76 ... yes 2/77 ... yes 10/79 ... yes 2/80 ... yes Redis version >= 4.0.0? ... yes Ruby version >= 2.5.3 ? ... yes (2.6.6) Git version >= 2.22.0 ? ... yes (2.26.2) Git user has default SSH configuration? ... yes Active users: ... 50 Is authorized keys file accessible? ... yes GitLab configured to store new projects in hashed storage? ... yes All projects are in hashed storage? ... no Try fixing it: Please migrate all projects to hashed storage as legacy storage is deprecated in 13.0 and support will be removed in 14.0. For more information see: doc/administration/repository_storage_types.md
Checking GitLab App ... Finished
Checking GitLab subtasks ... Finished
Possible fixes
N/A (the only workaround is killing the Unicorn processes manually for now)
Help!