Possible Race Condition in Migration
Summary
When running the omnibus upgrade through the apt repo (i.e. apt-get upgrade
), my gitlab instance would get hung on the gitlab::database_migrations recipe. After 1 hour, the command would time out with no STDOUT and no STDERR data. I could see an idle transaction using ps aux
and also using psql. The transaction was hung after BEGIN
.
This was an upgrade from v8.15.0
to v8.15.4
.
Steps to reproduce
This was reproducible on my gitlab instance until I made a change to one of the post_migration files to allow the upgrade to complete. Due to the nature of race conditions, it may not be reproducible in many other configurations.
Expected behavior
The apt-get upgrade
should have completed successfully.
Actual behavior
apt-get upgrade
hangs for 1 hour on the gitlab::database_migrations recipe, then returns an error.
Relevant logs and/or screenshots
During the 1 hour, waiting for the recipe to time out, you will be able to see this listed in the system's processes:
$ ps aux
...
gitlab-+ 94654 0.0 0.5 612932 10576 ? Ss 2016 0:00 postgres: gitlab gitlabhq_production [local] idle in transaction
...
The resulting failure from the recipe (notice the empty STDOUT and STDERR).
Recipe: gitlab::database_migrations
* bash[migrate gitlab-rails database] action run
- Would execute "bash" "/tmp/chef-script20170114-94066-1nzuk4x"
Recipe: gitlab::postgresql
* execute[enable pg_trgm extension] action run
[execute] CREATE EXTENSION
NOTICE: extension "pg_trgm" already exists, skipping
- execute /opt/gitlab/bin/gitlab-psql -d gitlabhq_production -c "CREATE EXTENSION IF NOT EXISTS pg_trgm;"
Recipe: gitlab::database_migrations
* bash[migrate gitlab-rails database] action run
================================================================================
Error executing action `run` on resource 'bash[migrate gitlab-rails database]'
================================================================================
Mixlib::ShellOut::CommandTimeout
--------------------------------
Command timed out after 3600s:
Command exceeded allowed execution time, process terminated
---- Begin output of "bash" "/tmp/chef-script20170114-94066-nmiara" ----
STDOUT:
STDERR:
---- End output of "bash" "/tmp/chef-script20170114-94066-nmiara" ----
Ran "bash" "/tmp/chef-script20170114-94066-nmiara" returned
Resource Declaration:
---------------------
# In /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/database_migrations.rb
44: bash "migrate gitlab-rails database" do
45: code <<-EOH
46: set -e
47: log_file="#{node['gitlab']['gitlab-rails']['log_directory']}/gitlab-rails-db-migrate-$(date +%Y-%m-%d-%H-%M-%S).log"
48: umask 077
49: /opt/gitlab/bin/gitlab-rake gitlab:db:configure 2>& 1 | tee ${log_file}
50: STATUS=${PIPESTATUS[0]}
51: echo $STATUS > #{db_migrate_status_file}
52: exit $STATUS
53: EOH
54: environment ({'GITLAB_ROOT_PASSWORD' => initial_root_password }) if initial_root_password
55: notifies :run, 'execute[enable pg_trgm extension]', :before unless OmnibusHelper.not_listening?("postgresql") || !node['gitlab']['postgresql']['enable']
56: notifies :run, "execute[clear the gitlab-rails cache]", :immediately unless OmnibusHelper.not_listening?("redis")
57: dependent_services.each do |svc|
58: notifies :restart, svc, :immediately
59: end
60: not_if "(test -f #{db_migrate_status_file}) && (cat #{db_migrate_status_file} | grep -Fx 0)"
61: only_if { node['gitlab']['gitlab-rails']['auto_migrate'] }
62: end
Compiled Resource:
------------------
# Declared in /opt/gitlab/embedded/cookbooks/cache/cookbooks/gitlab/recipes/database_migrations.rb:44:in `from_file'
bash("migrate gitlab-rails database") do
action [:run]
updated true
retries 0
retry_delay 2
default_guard_interpreter :default
command "migrate gitlab-rails database"
backup 5
returns 0
code " set -e\n log_file=\"/var/log/gitlab/gitlab-rails/gitlab-rails-db-migrate-$(date +%Y-%m-%d-%H-%M-%S).log\"\n umask 077\n /opt/gitlab/bin/gitlab-rake gitlab:db:configure 2>& 1 | tee ${log_file}\n STATUS=${PIPESTATUS[0]}\n echo $STATUS > /var/opt/gitlab/gitlab-rails/upgrade-status/db-migrate-873248b1f0d3a7a5535771a3a1635803-a0b1379\n exit $STATUS\n"
interpreter "bash"
declared_type :bash
cookbook_name "gitlab"
recipe_name "database_migrations"
not_if "(test -f /var/opt/gitlab/gitlab-rails/upgrade-status/db-migrate-873248b1f0d3a7a5535771a3a1635803-a0b1379) && (cat /var/opt/gitlab/gitlab-rails/upgrade-status/db-migrate-873248b1f0d3a7a5535771a3a1635803-a0b1379 | grep -Fx 0)"
only_if { #code block }
end
Platform:
---------
x86_64-linux
Output of checks
Results of GitLab application Check
root@code:~# gitlab-rake gitlab:check SANITIZE=true
Checking GitLab Shell ...
GitLab Shell version >= 4.1.1 ? ... OK (4.1.1)
Repo base directory exists?
default... yes
Repo storage directories are symlinks?
default... no
Repo paths owned by git:git?
default... yes
Repo paths access is drwxrws---?
default... yes
hooks directories in repos are links: ...
3/2 ... ok
4/3 ... ok
4/4 ... ok
4/5 ... ok
2/6 ... ok
4/7 ... ok
4/8 ... ok
4/9 ... ok
4/11 ... ok
3/12 ... ok
3/13 ... ok
9/14 ... ok
4/15 ... ok
9/16 ... ok
10/17 ... ok
10/18 ... ok
10/19 ... ok
10/20 ... ok
4/21 ... ok
10/22 ... ok
10/23 ... ok
11/24 ... ok
11/25 ... ok
11/26 ... ok
11/27 ... ok
11/28 ... ok
11/29 ... ok
11/30 ... ok
11/31 ... ok
2/32 ... ok
4/33 ... ok
12/34 ... ok
12/35 ... ok
11/36 ... ok
4/37 ... ok
3/38 ... ok
3/39 ... ok
4/40 ... ok
3/41 ... ok
4/42 ... ok
14/43 ... ok
4/45 ... ok
4/46 ... ok
5/47 ... ok
15/48 ... repository is empty
16/49 ... ok
4/50 ... ok
17/51 ... ok
17/52 ... ok
17/53 ... ok
17/54 ... ok
17/55 ... ok
4/56 ... ok
4/57 ... ok
18/58 ... ok
4/59 ... ok
18/60 ... ok
18/61 ... ok
2/62 ... ok
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Check GitLab API access: OK
Access to /var/opt/gitlab/.ssh/authorized_keys: OK
Send ping to redis server: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished
Checking Sidekiq ...
Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished
Checking Reply by email ...
Reply by email is disabled in config/gitlab.yml
Checking Reply by email ... Finished
Checking LDAP ...
LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished
Checking GitLab ...
Git configured with autocrlf=input? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config outdated? ... no
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory setup correctly? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
projects have namespace: ...
3/2 ... yes
4/3 ... yes
4/4 ... yes
4/5 ... yes
2/6 ... yes
4/7 ... yes
4/8 ... yes
4/9 ... yes
4/11 ... yes
3/12 ... yes
3/13 ... yes
9/14 ... yes
4/15 ... yes
9/16 ... yes
10/17 ... yes
10/18 ... yes
10/19 ... yes
10/20 ... yes
4/21 ... yes
10/22 ... yes
10/23 ... yes
11/24 ... yes
11/25 ... yes
11/26 ... yes
11/27 ... yes
11/28 ... yes
11/29 ... yes
11/30 ... yes
11/31 ... yes
2/32 ... yes
4/33 ... yes
12/34 ... yes
12/35 ... yes
11/36 ... yes
4/37 ... yes
3/38 ... yes
3/39 ... yes
4/40 ... yes
3/41 ... yes
4/42 ... yes
14/43 ... yes
4/45 ... yes
4/46 ... yes
5/47 ... yes
15/48 ... yes
16/49 ... yes
4/50 ... yes
17/51 ... yes
17/52 ... yes
17/53 ... yes
17/54 ... yes
17/55 ... yes
4/56 ... yes
4/57 ... yes
18/58 ... yes
4/59 ... yes
18/60 ... yes
18/61 ... yes
2/62 ... yes
Redis version >= 2.8.0? ... yes
Ruby version >= 2.1.0 ? ... yes (2.3.3)
Your git bin path is "/opt/gitlab/embedded/bin/git"
Git version >= 2.7.3 ? ... yes (2.8.4)
Active users: 13
Checking GitLab ... Finished
Results of GitLab environment info
root@code:~# gitlab-rake gitlab:env:info
System information
System: Ubuntu 14.04
Current User: git
Using RVM: no
Ruby Version: 2.3.3p222
Gem Version: 2.6.6
Bundler Version:1.13.7
Rake Version: 10.5.0
Sidekiq Version:4.2.7
GitLab information
Version: 8.15.4
Revision: a0b1379
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
URL: https://code.8bitduck.net
HTTP Clone URL: https://code.8bitduck.net/some-group/some-project.git
SSH Clone URL: git@code.8bitduck.net:some-group/some-project.git
Using LDAP: no
Using Omniauth: no
GitLab Shell
Version: 4.1.1
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks/
Git: /opt/gitlab/embedded/bin/git
Possible fixes
I was able to fix the problem by setting the number of threads in the post_migration to 1:
File: db/post_migrate/20161221153951_rename_reserved_project_names.rb
...
THREAD_COUNT = 1 # Was 8
...
Granted, this isn't actually a fix, but a workaround. The fix would be to find and resolve the deadlock.
Some combination of database transactions (probably built in to rails or whatever migration library being used) and having multiple threads were somehow blocking any further database commands, which resulted in an postgresql idle transaction.