DB restart fails while migrating from 9.1.4-ee to 10.5.1-ee with group != user (fix in runit_service.rb)
Summary
Because of various constraints, we had to configure our gitlab.rb as follows:
user['username'] = "some_USER_other_than_gitlab"
user['group'] = "some_GROUP_other_than_gitlab"
with some_USER_other_than_gitlab != some_GROUP_other_than_gitlab
While migrating from 9.1.4-ee to 10.5.1-ee, we ran into the following error:
* cannot determine group id for 'some_USER_other_than_gitlab', does the group exist on this system?
================================================================================
Error executing action `touch` on resource 'file[/opt/gitlab/sv/postgresql/supervise/ok]'
================================================================================
Chef::Exceptions::GroupIDNotFound
---------------------------------
cannot determine group id for 'some_USER_other_than_gitlab', does the group exist on this system?
Resource Declaration:
---------------------
# In /opt/gitlab/embedded/cookbooks/cache/cookbooks/runit/definitions/runit_service.rb
200: file "#{sv_dir_name}/supervise/#{fl}" do
201: owner supervisor_owner
202: group supervisor_group
203: not_if { params[:supervisor_owner].nil? || params[:supervisor_group].nil? }
204: only_if { !omnibus_helper.expected_owner?(name, supervisor_owner, supervisor_group) }
205: action :touch
206: end
207:
And further down:
There was an error running gitlab-ctl reconfigure:
file[/opt/gitlab/sv/postgresql/supervise/ok] (gitlab::postgresql line 200) had an error: Chef::Exceptions::GroupIDNotFound: cannot determine group id for 'some_USER_other_than_gitlab', does the group exist on this system?
Some research showed that this problem comes from a fix that allows the DB to be restarted as the DB user instead of root.
In other words, in the snippet below from /blablabla/gitlab/embedded/cookbooks/runit/definitions/runit_service.rb, params[:supervisor_group] returns some_USER_other_than_gitlab (which does not exist as a group) instead of some_GROUP_other_than_gitlab:
supervisor_owner = params[:supervisor_owner] || 'root'
supervisor_group = params[:supervisor_group] || 'root'
%w(ok control).each do |fl|
  file "#{sv_dir_name}/supervise/#{fl}" do
    owner supervisor_owner
    group supervisor_group
    not_if { params[:supervisor_owner].nil? || params[:supervisor_group].nil? }
    only_if { !omnibus_helper.expected_owner?(name, supervisor_owner, supervisor_group) }
    action :touch
  end
end
We worked around this by replacing the above code with:
supervisor_owner = params[:supervisor_owner] || 'root'
# Workaround: hard-code the group instead of using params[:supervisor_group]
supervisor_group = 'some_GROUP_other_than_gitlab'
%w(ok control).each do |fl|
  file "#{sv_dir_name}/supervise/#{fl}" do
    owner supervisor_owner
    group supervisor_group
    not_if { params[:supervisor_owner].nil? || params[:supervisor_group].nil? }
    only_if { !omnibus_helper.expected_owner?(name, supervisor_owner, supervisor_group) }
    action :touch
  end
end
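Hard-coding the group name is specific to our installation. A less invasive variant might check whether the requested group actually exists and otherwise fall back to the configured group. The following is only a sketch of that idea in plain Ruby: `group_exists?` and `resolve_supervisor_group` are hypothetical helpers that are not part of the omnibus cookbooks, and how the fallback value would be obtained inside runit_service.rb is left open.

```ruby
require 'etc'

# Hypothetical helper: true if a group with this name exists on the system.
# Etc.getgrnam raises ArgumentError when the group cannot be found.
def group_exists?(name)
  !Etc.getgrnam(name).nil?
rescue ArgumentError
  false
end

# Hypothetical helper: keep the requested group if it exists,
# otherwise fall back to the group configured in gitlab.rb.
# The existence check is injectable so it can be tried without real groups.
def resolve_supervisor_group(requested, fallback, exists = method(:group_exists?))
  return requested if requested && exists.call(requested)
  fallback
end
```

With such a helper, a call like `resolve_supervisor_group(params[:supervisor_group], 'some_GROUP_other_than_gitlab')` would return the configured group whenever the user name is passed in by mistake.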
Steps to reproduce
Configure a group whose name differs from the user's and run runit_service.rb. Of course, the machine on which you run the steps must not have a group named after the user!
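To confirm that a machine meets these preconditions before trying to reproduce, a small check like the one below could be used. It is a sketch only: `system_group_names` and `bug_preconditions_met?` are hypothetical names, and the user and group values are the placeholders from the summary.

```ruby
require 'etc'

# Collect the names of all groups visible through the system group database.
def system_group_names
  names = []
  Etc.group { |g| names << g.name }
  names
end

# The failure needs two things: the user and group names differ,
# and no group named after the user exists.
def bug_preconditions_met?(username, groupname, group_names = system_group_names)
  username != groupname && !group_names.include?(username)
end

puts bug_preconditions_met?('some_USER_other_than_gitlab', 'some_GROUP_other_than_gitlab')
```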
What is the current bug behavior?
gitlab-ctl reconfigure fails with Chef::Exceptions::GroupIDNotFound: the script cannot touch the file /opt/gitlab/sv/postgresql/supervise/ok because the group it is given (the user name) does not exist.
What is the expected correct behavior?
Script runit_service.rb should run fine provided that params[:supervisor_group] returns the value of the group, instead of the value of the user.
Relevant logs
See description
Details of package version
See description
Environment details
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Red Hat Enterprise Linux Server release 6.7 (Santiago)
Not sure this is relevant.
Configuration details
See description