Notable increase in CPU load generated by ruby file hooks since GitLab Omnibus v14.9.0
Summary
Ruby based File Hooks appear to generate noticeably higher levels of CPU load since GitLab Omnibus v14.9.0. This appears to still be the case in the current GitLab Omnibus release (v15.0.1 at the time of writing).
Update: Immediate workarounds given the discovery of the root cause per:
- Where feasible, run the ruby file hook with --disable-all set on the shebang:
      #!/opt/gitlab/embedded/bin/ruby --disable-all
- Alternatively, update bundler to v2.3.15:
      sudo /opt/gitlab/embedded/bin/gem install bundler -v2.3.15
      sudo /opt/gitlab/embedded/bin/bundler -v
      sudo gitlab-ctl restart sidekiq
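To gauge how much of the load comes from interpreter startup rather than from the hook body, one rough check is to time an empty script with and without --disable-all. The sketch below is an illustration only and not part of the original investigation: average_startup_seconds is a hypothetical helper, and it times whichever ruby runs the script unless another path (for example /opt/gitlab/embedded/bin/ruby) is passed via the ruby: keyword. When run from an interactive shell it will not necessarily see the same environment that a hook inherits when GitLab executes it.

```ruby
require 'benchmark'
require 'rbconfig'

# Hypothetical helper: average wall-clock seconds needed to launch `ruby`
# with the given flags and run an empty script, measured over `runs` launches.
def average_startup_seconds(flags, runs: 5, ruby: RbConfig.ruby)
  total = Benchmark.realtime do
    runs.times do
      # Discard child output so only process startup and exit are measured.
      system(ruby, *flags, '-e', '', out: File::NULL, err: File::NULL)
    end
  end
  total / runs
end

if __FILE__ == $PROGRAM_NAME
  default_s  = average_startup_seconds([])
  disabled_s = average_startup_seconds(['--disable-all'])
  puts format('default: %.1f ms, --disable-all: %.1f ms',
              default_s * 1000, disabled_s * 1000)
end
```

A clearly smaller figure for the --disable-all run would be consistent with the first workaround; similar figures would suggest the overhead only appears under the environment GitLab provides to the hook.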
Steps to reproduce
Test environment details
- AWS EC2 Instance type: m5zn.2xlarge
  - vCPU: 8
  - RAM: 8GB
  - OS: Amazon Linux 2 AMI 2.0.20220426.0 x86_64 HVM gp2
    - AMI ID: ami-06eecef118bbf9259
- 3 fresh EC2 instances with individual GitLab Omnibus installations with a default gitlab.rb were used to compare v14.8.6, v14.9.0 and v15.0.1 in the manner outlined below.
- Create an executable ruby file hook with the following content at /opt/gitlab/embedded/service/gitlab-rails/file_hooks/test.rb:
      #!/opt/gitlab/embedded/bin/ruby
      x = $stdin.read
      File.write('/tmp/rb-data.txt', x)
  For the purpose of replicating the issue, it doesn't seem to matter what instructions are performed inside the hook itself.
- Start generating perf + flamegraph output via perf_flamegraph_for_all_running_processes.sh, which will run a perf capture for a duration of 60 seconds across all running processes.
- Immediately after triggering the perf capture in the previous step, start seeding data to GitLab per this script/guide. This will rapidly and repeatedly trigger events that cause the file hook to be executed.
A file hook runs on each event. You can filter events or projects in a file hook’s code, and create many file hooks as you need. Each file hook is triggered by GitLab asynchronously in case of an event. For a list of events see the system hooks documentation.
- A noticeable increase in CPU load is observed when using GitLab Omnibus v14.9.0. This is the earliest version where this increase is observed, and it appears to be replicated in releases after v14.9.0.
- 10 second CPU averages within AWS console for v14.8.6 vs v14.9.0 vs v15.0.1
- 10 second CPU maximums within AWS console for v14.8.6 vs v14.9.0 vs v15.0.1
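The documentation excerpt quoted in the steps above mentions filtering events inside the hook's code. A minimal sketch of what that could look like follows, assuming the payload is a JSON object with an event_name key as used by system hooks; select_event is a hypothetical helper and project_create is just one example event. Filtering of this kind runs only after the interpreter has started, so it would not be expected to avoid any startup overhead.

```ruby
require 'json'

# Hypothetical helper: returns the parsed event Hash when its 'event_name'
# is listed in `wanted`; returns nil for other events or unparsable input.
def select_event(raw_json, wanted)
  event = JSON.parse(raw_json)
  return nil unless event.is_a?(Hash)

  wanted.include?(event['event_name']) ? event : nil
rescue JSON::ParserError
  nil
end

# In a hook script the call site could look like this (left as a comment so
# that loading this snippet does not consume stdin):
#   event = select_event($stdin.read, %w[project_create])
#   File.write('/tmp/rb-data.txt', JSON.generate(event)) if event
```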
Additional observations
- Testing with the official Ubuntu based gitlab-ee AMIs made no difference: the issue is still present.
- The issue seems to be specific to ruby based file hooks. If the same situation is tested using a bash script for the file hook, the issue is not present. For example, with /opt/gitlab/embedded/service/gitlab-rails/file_hooks/test.sh:
      #!/bin/bash
      cat /dev/stdin > /tmp/rb-data.txt
- I also tried testing a ruby file hook via RVM based ruby 2.7.5, using the following steps, with the intention of running a test isolated from GitLab's embedded ruby environment. This did not help: the same performance issue was observed.
  - Set up RVM to use ruby 2.7.5 to match the ruby version in use by GitLab:
        gpg --keyserver hkp://pgp.mit.edu --recv-keys 409B6B1796C275462A1703113804BB82D39DC0E3 7D2BAF1CF37B13E2069D6956105BD0E739499BDB
        \curl -sSL https://get.rvm.io | bash -s stable --ruby
        rvm install 2.7.5
        rvm use 2.7.5
        which ruby
        ruby -v
  - Set up the file hook as follows in /opt/gitlab/embedded/service/gitlab-rails/file_hooks/test.rb:
        #!/usr/local/rvm/rubies/ruby-2.7.5/bin/ruby
        f = File.open('/tmp/ruby_load_path.txt', 'w')
        f.puts $LOAD_PATH
        f.close
    The $LOAD_PATH output still showed /opt/gitlab/embedded* paths even after confirming that the RVM based ruby binary was being used to execute the file hook, so perhaps these are inherited somewhere along the way upon execution.
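One way to test the inheritance guess above would be to have the hook record the environment variables that Ruby, RubyGems and Bundler consult at startup, alongside $LOAD_PATH. This is a minimal sketch under assumptions: the list of variables is a selection made for illustration rather than something taken from this report, and startup_env_snapshot and the output path are hypothetical.

```ruby
# Environment variables read by Ruby (RUBYOPT, RUBYLIB), RubyGems
# (GEM_HOME, GEM_PATH) and Bundler (BUNDLE_GEMFILE, BUNDLE_BIN_PATH).
# Assumption: this selection is illustrative and may not be exhaustive.
STARTUP_ENV_KEYS = %w[
  RUBYOPT RUBYLIB GEM_HOME GEM_PATH BUNDLE_GEMFILE BUNDLE_BIN_PATH
].freeze

# Hypothetical helper: returns only the startup-related variables that are
# actually set in `env` (defaults to the current process environment).
def startup_env_snapshot(env = ENV)
  STARTUP_ENV_KEYS.each_with_object({}) do |key, found|
    found[key] = env[key] if env[key]
  end
end

# Inside the hook, the snapshot could be written next to the load path
# (hypothetical output file):
#   File.write('/tmp/ruby_startup_env.txt', startup_env_snapshot.inspect)
```

If variables such as RUBYOPT or RUBYLIB show up pointing into /opt/gitlab/embedded when the hook runs under the RVM ruby, that would support the idea that the paths come from the parent process environment rather than from the interpreter binary.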
What is the current bug behavior?
Ruby based file hooks appear to generate a noticeably increased amount of CPU load as of GitLab Omnibus v14.9.0.
In this particular example test environment, it appears that there is an average CPU load increase from ~10 up to ~26, with CPU max spikes showing an increase from ~14 up to ~40.
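Expressed as ratios, the approximate figures above work out as follows. The snippet only restates the arithmetic on the rounded values from this report; increase_factor is a hypothetical helper name.

```ruby
# Hypothetical helper: ratio of the later reading to the earlier one,
# rounded to two decimal places.
def increase_factor(before, after)
  (after.to_f / before).round(2)
end

average_factor = increase_factor(10, 26) # approximate averages, v14.8.6 vs v14.9.0
maximum_factor = increase_factor(14, 40) # approximate maximums, v14.8.6 vs v14.9.0

puts "average: #{average_factor}x, maximum: #{maximum_factor}x"
# prints "average: 2.6x, maximum: 2.86x"
```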
What is the expected correct behavior?
I am not certain at this point in time whether the observed performance differences are an expected result of changes introduced in GitLab v14.9.0.
If these differences are not expected, then the expected behavior would be for the generated CPU load to be closer to the values observed in v14.8.6.
In either case, it would be ideal to discover exactly what causes this performance difference, so that it can be documented as an item to be aware of when upgrading to v14.9.0 and beyond.