Multiplying gitlab-healthcheck zombies take all resources
Summary
Our GitLab server runs in a Docker container alongside other containers on a Dell server. Occasionally a strange entry appears in syslog. It is usually the first sign that the SSH service is about to become unavailable, or rather that resources are about to be exhausted quickly:
Jun 19 18:45:00 my_servername dockerd[1851]: time="2018-06-19T18:45:00.255132753Z" level=warning msg="Health check for container 3b87da30c0fed392367d6e2b58c0007735e8a9d4ac3ed9aa9ebc5415512c5589 error: context cancelled"
After that, it is impossible to log in to the host machine except via the iDRAC console. Once logged in via iDRAC, hundreds of gitlab-healthcheck --fail processes are still open.
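Since the syslog warning tends to appear before the host becomes unreachable, it can serve as an early signal. The following is a minimal sketch, not part of our setup: the function name `healthcheck_error_ids` is hypothetical, and it assumes the dockerd message format shown in the log line above. It reads log lines on stdin and prints the 12-character short ID of each container named in such a warning.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print the short
# (12-character) container ID from each dockerd warning of the form
# "Health check for container <id> error: ...".
# Lines that do not match this format are ignored.
healthcheck_error_ids() {
    sed -n 's/.*Health check for container \([0-9a-f]\{12\}\)[0-9a-f]* error:.*/\1/p'
}
```

For example, `healthcheck_error_ids < /var/log/syslog | sort | uniq -c` would show how many warnings were logged per container. The path `/var/log/syslog` is an assumption and depends on the distribution.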
Steps to reproduce
Randomly reproducible. It occurs sometimes after one week, sometimes after a few days.
- Run GitLab in a Docker container in a dedicated bridge network
~$ docker network ls
NETWORK ID NAME DRIVER SCOPE
d092459267c1 bridge bridge local
558219b0bfb2 host host local
49a472a8e79b none null local
6d3cc248e2b8 my_network bridge local
~$ docker ps
3b87da30c0fe gitlab/gitlab-ce:latest "/assets/wrapper" 2 weeks ago Up 44 minutes (healthy)
878f0a55a3c7 gitlab/gitlab-runner:latest "/usr/bin/dumb-init …" 7 weeks ago Up 44 minutes gitlab-runner
...
...
What is the current bug behavior?
Machine resources are quickly consumed by a large number of healthcheck processes. The first side effect is that the SSH service on the host machine becomes unavailable; only the iDRAC console can still take control.
What is the expected correct behavior?
No healthcheck zombies.
Relevant logs and/or screenshots
~$ ps auxf | grep healthcheck
...
...
root 23591 0.0 0.0 0 0 ? Zl Jun19 0:00 | | | \_ [curl] <defunct>
root 23790 0.0 0.0 4504 736 ? Ss Jun19 0:00 | | \_ /bin/sh -c /opt/gitlab/bin/gitlab-healthcheck --fail
root 23794 0.0 0.0 0 0 ? Zl Jun19 0:00 | | | \_ [curl] <defunct>
root 24046 0.0 0.0 4504 840 ? Ss Jun19 0:00 | | \_ /bin/sh -c /opt/gitlab/bin/gitlab-healthcheck --fail
root 24052 0.0 0.0 0 0 ? Zl Jun19 0:00 | | | \_ [curl] <defunct>
root 24252 0.0 0.0 4504 744 ? Ss Jun19 0:00 | | \_ /bin/sh -c /opt/gitlab/bin/gitlab-healthcheck --fail
root 24256 0.0 0.0 0 0 ? Zl Jun19 0:00 | | | \_ [curl] <defunct>
root 24455 0.0 0.0 4504 704 ? Ss Jun19 0:00 | | \_ /bin/sh -c /opt/gitlab/bin/gitlab-healthcheck --fail
root 24459 0.0 0.0 0 0 ? Zl Jun19 0:00 | | | \_ [curl] <defunct>
--fail
...
...
~$ ps auxf | grep healthcheck | wc -l
710
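The `grep healthcheck | wc -l` pipeline above gives a rough total and may also count the `grep` process itself. As a hedged sketch, the two hypothetical helpers below (`count_zombies` and `count_healthchecks` are names introduced here for illustration) count defunct processes and leftover healthcheck shells separately from `ps` output passed on stdin.

```shell
#!/bin/sh
# Hypothetical helper: count zombie (defunct) processes.
# Expects `ps -e -o stat= -o comm=` style input on stdin, where the first
# field is the process state; a state starting with "Z" marks a zombie.
count_zombies() {
    awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Hypothetical helper: count healthcheck command lines.
# Expects `ps -e -o args=` style input on stdin and counts the lines that
# contain "gitlab-healthcheck --fail".
count_healthchecks() {
    awk '/gitlab-healthcheck --fail/ { n++ } END { print n + 0 }'
}
```

Usage on the host would look like `ps -e -o stat= -o comm= | count_zombies` and `ps -e -o args= | count_healthchecks`. Because the helpers only read stdin, they can also be run against saved `ps` output collected through the iDRAC console.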
Results of GitLab environment info
System information
System:
Current User: git
Using RVM: no
Ruby Version: 2.3.7p456
Gem Version: 2.6.14
Bundler Version: 1.13.7
Rake Version: 12.3.1
Redis Version: 3.2.11
Git Version: 2.16.4
Sidekiq Version: 5.0.5
Go Version: unknown
GitLab information
Version: 10.8.3
Revision: 564c342
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
URL: http://my_url/gitlab
HTTP Clone URL: http://my_url/gitlab/some-group/some-project.git
SSH Clone URL: git@my_url:some-group/some-project.git
Using LDAP: yes
Using Omniauth: no
GitLab Shell
Version: 7.1.2
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks
Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Checking GitLab Shell ...
GitLab Shell version >= 7.1.2 ? ... OK (7.1.2)
Repo base directory exists?
default... yes
Repo storage directories are symlinks?
default... no
Repo paths owned by git:root, or git:git?
default... yes
Repo paths access is drwxrws---?
default... yes
hooks directories in repos are links: ...
3/1 ... ok
3/2 ... ok
3/3 ... ok
3/5 ... ok
3/6 ... ok
8/7 ... ok
8/8 ... ok
8/9 ... ok
8/10 ... ok
8/11 ... ok
3/12 ... ok
3/13 ... ok
3/14 ... ok
8/15 ... ok
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Check GitLab API access: OK
Redis available via internal API: OK
Access to /var/opt/gitlab/.ssh/authorized_keys: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Sidekiq ...
Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Reply by email is disabled in config/gitlab.yml
Checking LDAP ...
******** secret LDAP info *******
Checking LDAP ... Finished
Checking GitLab ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
3/1 ... yes
3/2 ... yes
3/3 ... yes
3/5 ... yes
3/6 ... yes
8/7 ... yes
8/8 ... yes
8/9 ... yes
8/10 ... yes
8/11 ... yes
3/12 ... yes
3/13 ... yes
3/14 ... yes
8/15 ... yes
Redis version >= 2.8.0? ... yes
Ruby version >= 2.3.5 ? ... yes (2.3.7)
Git version >= 2.9.5 ? ... yes (2.16.4)
Git user has default SSH configuration? ... yes
Active users: ... 10