Moving repository between storages never completes if many tags/branches

Summary

Starting with 10.8, it is not possible to move a repository between storages anymore if the repository has many tags/branches.

This worked without issue up to and including 10.7.

Steps to reproduce

  1. create a new project with an initial commit
  2. create a large number of tags for the initial commit (e.g. for i in {1..2000}; do git tag "mycustomtag/$i"; done)
  3. git push --tags
  4. as admin, using the Projects API, set the project's repository_storage to a different storage shard (as per !533 (merged))
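
For step 4, a minimal sketch of the API call in Ruby. It assumes the v4 "edit project" endpoint (PUT /projects/:id) and an admin personal access token; the host name, project id, token and the helper name build_move_request are placeholders, not values from this report. The block only builds the request; sending it is shown in the trailing comment.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds the PUT request that changes repository_storage.
# base_url, project_id and token are placeholders supplied by the caller.
def build_move_request(base_url, project_id, storage, token)
  uri = URI.join(base_url, "/api/v4/projects/#{project_id}")
  request = Net::HTTP::Put.new(uri)
  request['PRIVATE-TOKEN'] = token                       # admin token (assumption)
  request.set_form_data('repository_storage' => storage) # target storage shard
  request
end

request = build_move_request('https://gitlab.example.com', 42, 'secondary', 'ADMIN_TOKEN')
puts "#{request.method} #{request.path} #{request.body}"

# To actually send it (not executed here):
# Net::HTTP.start('gitlab.example.com', 443, use_ssl: true) { |http| http.request(request) }
```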

Example Project

https://gitlab.com/cernvcs/repo-many-tags

What is the current bug behavior?

The project move never completes. A few hundred tags are copied to the new storage, then the import stalls. The Sidekiq background job in charge of the move operation keeps running forever.

What is the expected correct behavior?

The project is moved to the new storage shard.

Relevant logs and/or screenshots

A git-fetch process is spawned by Sidekiq to import the repository into the target storage. This process stalls after a while. Running strace on the stalled git-fetch process makes the issue clear:

# strace -p 25231
strace: Process 25231 attached
write(2, " * [new tag]               night"..., 88

The process is blocked writing to STDERR.
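
The same stall can be reproduced outside GitLab. This is a minimal sketch, not the GitLab code: a child Ruby process stands in for git-fetch and writes about 1 MB of progress lines to STDERR, while the parent never reads the pipe. Once the OS pipe buffer is full (commonly 64 KiB on Linux) the child blocks in write(2), as in the strace output above. The 2-second wait and the sample tag name are arbitrary choices.

```ruby
require 'open3'
require 'rbconfig'

# Stand-in for a chatty git-fetch: about 1 MB of progress lines on stderr.
chatty = [RbConfig.ruby, '-e', '$stderr.write(" * [new tag] mycustomtag/x\n" * 40_000)']

stalled = Open3.popen3(*chatty) do |stdin, _stdout, _stderr, wait_thr|
  stdin.close
  # Nobody reads _stderr, so the child blocks once the pipe buffer is full.
  still_running = wait_thr.join(2).nil? # nil means the child has not exited yet
  Process.kill('KILL', wait_thr.pid) if still_running
  still_running
end

puts "child stalled: #{stalled}"
```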

Output of checks

Results of GitLab environment info

# gitlab-rake gitlab:env:info

System information
System:
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.3.7p456
Gem Version: 2.6.14
Bundler Version: 1.13.7
Rake Version: 12.3.1
Redis Version: 3.2.11
Git Version: 2.16.4
Sidekiq Version: 5.0.5
Go Version: unknown

GitLab information
Version: 10.8.5-ee
Revision: 8f03e3e
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
DB Version: 9.6.2
URL: https://XXXX
HTTP Clone URL: https://XXXXX/some-group/some-project.git
SSH Clone URL: ssh://git@XXXXX:7999/some-group/some-project.git
Elasticsearch: yes
Geo: no
Using LDAP: yes
Using Omniauth: yes
Omniauth Providers: saml, kerberos_spnego

GitLab Shell
Version: 7.1.2
Repository storage paths:

  • default: /var/opt/gitlab/git-data/repositories
  • secondary: /var/opt/gitlab/git-data2/repositories
  • third: /var/opt/gitlab/git-data3/repositories

Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks
Git: /opt/gitlab/embedded/bin/git

Possible fixes

https://gitlab.com/gitlab-org/gitlab-ee/blob/master/lib/gitlab/git/repository.rb#L1315 does not consume the git process's STDERR. When git-fetch generates a significant amount of output (for example when fetching many tags or branches), the STDERR pipe buffer apparently fills up and the git process blocks until the caller reads from it, which never happens.
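
One way to avoid this kind of stall is to drain both pipes while the child runs. The sketch below is not the actual GitLab patch and does not reproduce the code in repository.rb; run_and_drain is a hypothetical helper, and a child Ruby process stands in for a git-fetch that prints 5000 progress lines to STDERR. Reading STDERR in a separate thread means the child never blocks on a full pipe.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: runs a command and reads stdout and stderr concurrently,
# so the child cannot block on a full pipe buffer.
def run_and_drain(*cmd)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    err_reader = Thread.new { stderr.read } # drain stderr in the background
    out = stdout.read                       # drain stdout in this thread
    [out, err_reader.value, wait_thr.value]
  end
end

# Stand-in for a chatty git-fetch: 5000 progress lines on stderr, one line on stdout.
child = [RbConfig.ruby, '-e',
         '5000.times { |i| warn " * [new tag]         mycustomtag/#{i} -> mycustomtag/#{i}" }; puts "done"']

out, err, status = run_and_drain(*child)
puts "stderr lines: #{err.lines.size}, exit status: #{status.exitstatus}"
```

Open3.capture3 from the Ruby standard library drains both streams in a similar way, and may be enough when the output does not need to be processed incrementally.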

Edited by Alex Lossent