Clean up old-style temporary repositories

While checking the refdb of gitlab-com/www-gitlab-com for https://gitlab.com/gitlab-org/gitlab-ee/issues/5721#note_69261787, I noticed that the repository still has a large number of old-style temporary directories in place:

root@web-02-sv-gprd.c.gitlab-production.internal:/var/opt/gitlab/git-data-file03/repositories/gitlab-com# du -hs www-gitlab-com*
56K	www-gitlab-com_1eed649c986ce2.git
203M	www-gitlab-com_2e5928b2cd2cfd.git
319M	www-gitlab-com_41c52b9b96078c.git
1.8G	www-gitlab-com_43c3d48e502c0d.git
1.7G	www-gitlab-com_4eb93a5272496a.git
1.4G	www-gitlab-com_618f855d3801ea.git
56K	www-gitlab-com_77043c84cd330f.git
1.7G	www-gitlab-com_8af8612d0fb137.git
317M	www-gitlab-com_8b5c0dae86441f.git
1.4G	www-gitlab-com_93a9212b4e7215.git
860K	www-gitlab-com_cec569ffa4f869.git
318M	www-gitlab-com_d3ae7f14de0096.git
1.9G	www-gitlab-com.git

The code that generates these has since been updated so that only a single temporary directory exists at any one time, under @geo-temporary/*. The old-style directories listed above are still on disk, though, so I think we need to sweep for them and remove them.

In the www-gitlab-com case, the leftover temporary directories mean we use approximately 6x the space necessary: roughly 11G in total, against 1.9G for www-gitlab-com.git itself.
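
As a rough check on the 6x figure, the sketch below sums the `du -hs` sizes listed above and compares the total with the size of `www-gitlab-com.git`. `to_bytes` and `DU_OUTPUT` are illustrative names, not existing GitLab code, and because `du -h` rounds every size, the result is only approximate.

```ruby
# du -h prints 1024-based sizes with a K, M or G suffix.
UNITS = { 'K' => 1024, 'M' => 1024**2, 'G' => 1024**3 }.freeze

# Hypothetical helper: converts a du -h size such as "203M" into bytes.
def to_bytes(size)
  m = /\A(\d+(?:\.\d+)?)([KMG])\z/.match(size)
  raise ArgumentError, "unrecognised size: #{size}" unless m

  (m[1].to_f * UNITS.fetch(m[2])).round
end

# The du -hs output from above, one "size name" pair per line.
DU_OUTPUT = <<~DU
  56K  www-gitlab-com_1eed649c986ce2.git
  203M www-gitlab-com_2e5928b2cd2cfd.git
  319M www-gitlab-com_41c52b9b96078c.git
  1.8G www-gitlab-com_43c3d48e502c0d.git
  1.7G www-gitlab-com_4eb93a5272496a.git
  1.4G www-gitlab-com_618f855d3801ea.git
  56K  www-gitlab-com_77043c84cd330f.git
  1.7G www-gitlab-com_8af8612d0fb137.git
  317M www-gitlab-com_8b5c0dae86441f.git
  1.4G www-gitlab-com_93a9212b4e7215.git
  860K www-gitlab-com_cec569ffa4f869.git
  318M www-gitlab-com_d3ae7f14de0096.git
  1.9G www-gitlab-com.git
DU

# Map each directory name to its size in bytes.
sizes = DU_OUTPUT.lines.map do |line|
  size, name = line.split
  [name, to_bytes(size)]
end.to_h

main  = sizes.fetch('www-gitlab-com.git')
total = sizes.values.sum
ratio = total.to_f / main

puts format('total %.1f GiB, %.1fx the repository itself', total / 1024.0**3, ratio)
# prints "total 11.0 GiB, 5.8x the repository itself"
```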

I propose we scan for all directories matching %r{/[^/]+_[[:xdigit:]]{14}(\.wiki)?\.git\z} (or similar), then verify that none of the returned paths belongs to a project before removing them.
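
A minimal sketch of the scan-then-verify step, assuming the pattern `%r{/[^/]+_[[:xdigit:]]{14}(\.wiki)?\.git\z}`: any name, an underscore, exactly 14 hex digits, an optional `.wiki`, then `.git`. `OLD_TEMP_REPO`, `stale_temp_repos` and the `project_paths` set are hypothetical; in GitLab the set of real project paths would have to be built from the database.

```ruby
require 'set'

# Hypothetical pattern for old-style temporary repositories, e.g.
# "www-gitlab-com_43c3d48e502c0d.git": any name, an underscore, exactly
# 14 hex digits, an optional ".wiki", then ".git" at the end of the path.
OLD_TEMP_REPO = %r{/[^/]+_[[:xdigit:]]{14}(\.wiki)?\.git\z}

# Returns the paths that look like leftover temporary repositories and
# do not belong to a known project. `project_paths` is a hypothetical
# Set of on-disk paths (repository and wiki) for every real project.
def stale_temp_repos(paths, project_paths)
  paths.select { |path| OLD_TEMP_REPO.match?(path) }
       .reject { |path| project_paths.include?(path) }
end

# Example with made-up paths: only the first two are removal candidates.
# The third is the real repository, and the fourth belongs to a project
# that happens to have a temporary-looking name.
paths = [
  '/repos/gitlab-com/www-gitlab-com_43c3d48e502c0d.git',
  '/repos/gitlab-com/www-gitlab-com_43c3d48e502c0d.wiki.git',
  '/repos/gitlab-com/www-gitlab-com.git',
  '/repos/acme/foo_abcabcabcabcab.git'
]
projects = Set['/repos/gitlab-com/www-gitlab-com.git', '/repos/acme/foo_abcabcabcabcab.git']

# Dry run: print the candidates instead of deleting anything.
stale_temp_repos(paths, projects).each { |path| puts "would remove #{path}" }
```

Printing rather than deleting keeps this a dry run; actual removal (for example with `FileUtils.rm_rf`) would only follow a review of the printed list.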

Alternatively, we could remove them unconditionally and rely on repository verification to force a resync of any project unlucky enough to be named foo_abcabcabcabcab, etc.

Edited Apr 19, 2018 by Nick Thomas