Moving repository between storages never completes if many tags/branches
Summary
Starting with 10.8, it is no longer possible to move a repository between storages if the repository has many tags/branches.
This worked without issue up to and including 10.7.
Steps to reproduce
- create a new project with an initial commit
- create a large number of tags for the initial commit (e.g. for i in {1..2000}; do git tag "mycustomtag/$i"; done)
- git push --tags
- as admin, using the Projects API, set the project's repository_storage to a different storage shard (as per !533 (merged))
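The last step can be scripted. A minimal sketch in Ruby, assuming the v4 Projects API edit endpoint (PUT /projects/:id) accepts repository_storage from an admin token. The host, project ID, token, and shard name are placeholders, not values from this report, and build_move_request is a hypothetical helper:

```ruby
require 'net/http'
require 'uri'

# Placeholder values (assumptions for illustration only).
gitlab_url  = 'https://gitlab.example.com'
project_id  = 42
admin_token = ENV.fetch('GITLAB_ADMIN_TOKEN', 'changeme')
target      = 'secondary' # name of the destination storage shard

# Build the PUT request that asks GitLab to change the repository storage.
def build_move_request(base_url, project_id, token, storage)
  uri = URI("#{base_url}/api/v4/projects/#{project_id}")
  req = Net::HTTP::Put.new(uri)
  req['PRIVATE-TOKEN'] = token
  req.set_form_data('repository_storage' => storage)
  [uri, req]
end

uri, req = build_move_request(gitlab_url, project_id, admin_token, target)
puts "#{req.method} #{uri} #{req.body}"

# Uncomment to actually send the request:
# res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
# puts res.code
```

The request is only built and printed here; sending it is left commented out so the sketch can be inspected without touching a real instance.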
Example Project
https://gitlab.com/cernvcs/repo-many-tags
What is the current bug behavior?
The project move never completes. A few hundred tags are copied to the new storage, then the import stalls. The Sidekiq background job in charge of the move operation keeps running forever.
What is the expected correct behavior?
The project is moved to the new storage shard.
Relevant logs and/or screenshots
A git-fetch process is spawned by Sidekiq to import the repository into the target storage.
This process stalls after a while. Running strace on the stalled git-fetch process makes the issue clear:
# strace -p 25231
strace: Process 25231 attached
write(2, " * [new tag] night"..., 88
The process is blocked writing to STDERR.
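A write to a pipe blocks once the kernel's pipe buffer is full and nobody reads from the other end. The following sketch (not taken from GitLab's code) measures how many bytes an unread pipe accepts on the local machine; on Linux the default capacity is commonly 64 KiB, which a few hundred "[new tag]" lines can plausibly fill:

```ruby
# Fill an unread pipe with non-blocking writes to find out how much it holds.
reader, writer = IO.pipe
chunk = 'x' * 1024
capacity = 0
begin
  # write_nonblock returns the number of bytes actually written.
  loop { capacity += writer.write_nonblock(chunk) }
rescue IO::WaitWritable
  # The pipe is full: a blocking write would now hang until someone reads.
end
puts "pipe accepted #{capacity} bytes before a write would block"
reader.close
writer.close
```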
Output of checks
Results of GitLab environment info
# gitlab-rake gitlab:env:info

System information
System:
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.3.7p456
Gem Version: 2.6.14
Bundler Version: 1.13.7
Rake Version: 12.3.1
Redis Version: 3.2.11
Git Version: 2.16.4
Sidekiq Version: 5.0.5
Go Version: unknown

GitLab information
Version: 10.8.5-ee
Revision: 8f03e3e
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: postgresql
DB Version: 9.6.2
URL: https://XXXX
HTTP Clone URL: https://XXXXX/some-group/some-project.git
SSH Clone URL: ssh://git@XXXXX:7999/some-group/some-project.git
Elasticsearch: yes
Geo: no
Using LDAP: yes
Using Omniauth: yes
Omniauth Providers: saml, kerberos_spnego

GitLab Shell
Version: 7.1.2
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
- secondary: /var/opt/gitlab/git-data2/repositories
- third: /var/opt/gitlab/git-data3/repositories
Hooks: /opt/gitlab/embedded/service/gitlab-shell/hooks
Git: /opt/gitlab/embedded/bin/git
Possible fixes
The code at https://gitlab.com/gitlab-org/gitlab-ee/blob/master/lib/gitlab/git/repository.rb#L1315 does not consume the git process's STDERR. When git-fetch generates a significant amount of output (such as when fetching lots of tags/branches), the STDERR pipe buffer apparently fills up and the git process blocks in write() until the caller reads from the pipe, which it never does.
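One way to avoid this kind of stall is to drain STDOUT and STDERR concurrently while the child runs. The sketch below is an illustration with Ruby's standard Open3 module, not the code in repository.rb; run_and_drain is a hypothetical helper, and the child is a stand-in for a chatty git-fetch that writes about 200 KB to STDERR:

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper (not GitLab's actual code): run a command and read
# stdout and stderr at the same time so the child never blocks on a full pipe.
def run_and_drain(*cmd)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    # Read stderr in a separate thread while this thread reads stdout.
    err_reader = Thread.new { stderr.read }
    out = stdout.read
    err = err_reader.value
    [out, err, wait_thr.value]
  end
end

# Stand-in for git-fetch: 2000 lines of roughly 100 bytes each on stderr.
child_script = '2000.times { |i| $stderr.puts(" * [new tag] mycustomtag/#{i}".ljust(100)) }'
out, err, status = run_and_drain(RbConfig.ruby, '-e', child_script)
puts "exit=#{status.exitstatus} stderr_lines=#{err.lines.size}"
```

When the output does not need to be streamed, Open3.capture3 performs the same concurrent draining internally and may be the simpler choice.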