High CPU usage of gitlab-monitor / puma

Summary

The gitlab-monitor process is using excessive CPU compared to the rest of the instance.

For example, after 2 days of uptime, note the cumulative CPU time column, where gitlab-monitor is more than 10 times higher than the other components:

 9131 git       20   0  729652  66920   8260 S  36.1  0.1   1804:33 puma 3.12.0 (tcp://....:9101) [gitlab-monitor]
29546 git       20   0 8578728 1.031g  19672 S   1.7  1.6 143:56.34 sidekiq 5.2.5 gitlab-rails [0 of 25 busy]
29307 git       20   0 3755864  85532  10548 S   1.3  0.1  94:52.02 /opt/gitlab/embedded/bin/gitaly /var/opt/gitlab/gitaly/config.toml
29455 git       20   0 2559536  47680   8244 S   0.7  0.1  73:08.84 /opt/gitlab/embedded/bin/gitlab-workhorse -listenNetwork unix -listen
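
As a rough check of the "more than 10 times" figure, the cumulative CPU times from the listing above can be converted to seconds and compared. This is only a minimal Ruby sketch: the helper name `cpu_seconds` is made up for illustration, and the values are copied by hand from the listing.

```ruby
# Convert a top-style cumulative CPU time ("MMM:SS" or "MMM:SS.hh") to seconds.
# cpu_seconds is a hypothetical helper, not part of any GitLab tooling.
def cpu_seconds(time_field)
  minutes, seconds = time_field.split(":", 2)
  minutes.to_i * 60 + seconds.to_f
end

# Values copied from the process listing above.
CPU_TIMES = {
  "gitlab-monitor (puma)" => "1804:33",
  "sidekiq"               => "143:56.34",
  "gitaly"                => "94:52.02",
  "gitlab-workhorse"      => "73:08.84",
}.freeze

monitor = cpu_seconds(CPU_TIMES["gitlab-monitor (puma)"])

# Print each component's CPU time and how many times larger gitlab-monitor's is.
CPU_TIMES.each do |name, field|
  secs = cpu_seconds(field)
  puts format("%-22s %10.2f s  monitor is %.1fx", name, secs, monitor / secs)
end
```

With these numbers gitlab-monitor has accumulated roughly 12.5 times the CPU time of sidekiq, the next busiest process in the listing.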

It is a gitlab-ee (Starter) 11.10.2 instance installed from the omnibus package, with about 190 users, running on Debian stretch with Linux 4.19.

The gitlab-monitor configuration in gitlab.rb:

################################################################################
## Gitlab monitor
##! Docs: https://docs.gitlab.com/ce/administration/monitoring/performance/prometheus.html

gitlab_monitor['enable'] = true
gitlab_monitor['listen_address'] = "..."
gitlab_monitor['listen_port'] = 9101

A perf profile of the process points to memory allocation and garbage collection:

  10.90%  libruby.so.2.5.3    [.] match_at
   6.15%  libruby.so.2.5.3    [.] gc_sweep_step
   5.85%  libruby.so.2.5.3    [.] vm_exec_core
   3.48%  [kernel]            [k] smaps_account
   2.19%  libruby.so.2.5.3    [.] rb_enc_get_index
   2.12%  libruby.so.2.5.3    [.] vm_call_cfunc
   1.60%  libruby.so.2.5.3    [.] objspace_malloc_increase.isra.74
   1.58%  libruby.so.2.5.3    [.] rb_enc_from_index
   1.57%  libc-2.24.so        [.] malloc_usable_size

There are some errors in the monitoring log, but their frequency does not match the CPU spikes (100% CPU every 1-3 seconds):

2019-05-02_12:29:21.60266 Error: No such process @ io_fillbuf - fd:22 /proc/25305/smaps
2019-05-02_12:32:19.38915 Error: No such file or directory @ rb_sysopen - /proc/25347/smaps
2019-05-02_12:32:19.47903 Error: No such file or directory @ rb_sysopen - /proc/25347/smaps
2019-05-02_12:32:19.99962 Error: No such file or directory @ rb_sysopen - /proc/25347/smaps
2019-05-02_12:39:06.39761 Error: No such file or directory @ rb_sysopen - /proc/8902/smaps
2019-05-02_12:54:05.89278 Error: No such file or directory @ rb_sysopen - /proc/19388/smaps
2019-05-02_12:54:05.89668 Error: No such file or directory @ rb_sysopen - /proc/19390/smaps
2019-05-02_12:57:35.63579 Error: No such file or directory @ rb_sysopen - /proc/30256/smaps
2019-05-02_13:08:36.45827 Error: No such file or directory @ rb_sysopen - /proc/11526/smaps
2019-05-02_13:15:36.24081 Error: No such file or directory @ rb_sysopen - /proc/2063/smaps
2019-05-02_13:28:36.43884 Error: No such file or directory @ rb_sysopen - /proc/4780/smaps
2019-05-02_13:39:06.57351 Error: No such process @ io_fillbuf - fd:25 /proc/20357/smaps
2019-05-02_13:50:51.37607 Error: No such file or directory @ rb_sysopen - /proc/13576/smaps
2019-05-02_13:52:51.39040 Error: No such process @ io_fillbuf - fd:15 /proc/24202/smaps
2019-05-02_14:11:06.04343 Error: No such file or directory @ rb_sysopen - /proc/19891/smaps
2019-05-02_14:29:05.34702 Error: No such file or directory @ rb_sysopen - /proc/28914/smaps
2019-05-02_14:33:50.76582 Error: No such file or directory @ rb_sysopen - /proc/23277/smaps
2019-05-02_14:40:32.45084 Error: No such file or directory @ rb_sysopen - /proc/12630/smaps
2019-05-02_14:40:32.58701 Error: No such file or directory @ rb_sysopen - /proc/12630/smaps
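
The two error variants above look consistent with a race between a reader of /proc/<pid>/smaps and processes that exit in the meantime: "No such file or directory" (Errno::ENOENT) when the file is already gone at open time, and "No such process" (Errno::ESRCH) when the process exits mid-read. The following Ruby sketch shows one way a reader could treat both as "process vanished". It is an illustration only, not gitlab-monitor's actual code, and the names `pss_kb_from_smaps` and `read_pss_kb` are hypothetical.

```ruby
# Sum the Pss values (in kB) found in the text of a /proc/<pid>/smaps file.
# Uses a plain prefix check per line instead of a regular expression.
def pss_kb_from_smaps(text)
  text.each_line.sum do |line|
    line.start_with?("Pss:") ? line.split[1].to_i : 0
  end
end

# Read the total Pss for one pid. Returns nil if the process disappeared
# before or while its smaps file was being read.
# proc_root is a hypothetical parameter so the path can be overridden.
def read_pss_kb(pid, proc_root: "/proc")
  pss_kb_from_smaps(File.read(File.join(proc_root, pid.to_s, "smaps")))
rescue Errno::ENOENT, Errno::ESRCH
  nil
end
```

On a Linux host, `read_pss_kb(Process.pid)` would return the calling process's own total, and a pid that no longer exists yields nil instead of a logged error.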

I think the issue is new since we updated from 11.8 to 11.10, but I am not confident about that.

Edited by Julian Taylor