From gitlab-com/infrastructure#2904, we appear to have over 8,000 temporary references in the GitLab CE repository that probably don't need to be there anymore. This seems to be slowing pushes by 20+ seconds.
As part of our garbage collection process, we should probably remove these if they are older than some time (e.g. a week) because they can significantly affect performance. These refs aren't folded into the packed-refs file, so each of them will cause an extra stat and open system call unnecessarily.
Note that these refs aren't even valid refs; they are empty directories, so there's even more reason to delete them.
To be clear, these refs don't exist - at least in the cases I've seen - but their directories still exist. My guess is because of the directory structure we use: I'm not sure if git will delete refs/foo when deleting refs/foo/bar, even if refs/foo is now empty.
Maybe we should switch to using refs/tmp/$sha, and leave off the trailing /head?
@stanhu FYI we discovered the exact same thing last Friday, and I think housekeeping should indeed take care of that.
I think https://gitlab.com/gitlab-org/gitlab-ce/issues/38498 is more generic, since it's about removing empty directories under refs/ during housekeeping. We don't even need to check their mtime or filter for only refs/tmp or refs/merge-requests; we can just delete all empty directories in refs/**/*.
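To make the "delete all empty directories in refs/**/*" idea concrete, here is a minimal sketch. It is not the housekeeping implementation; the function name `remove_empty_ref_dirs` is made up for illustration, and keeping refs/heads and refs/tags in place is a conservative assumption on my part. It also does nothing about the race conditions discussed elsewhere in this thread.

```python
import os


def remove_empty_ref_dirs(git_dir):
    """Remove empty directories below <git_dir>/refs, deepest first.

    Hypothetical helper for illustration only. Returns the removed paths.
    """
    refs_root = os.path.join(git_dir, "refs")
    # Assumption: leave the standard top-level directories alone.
    keep = {
        refs_root,
        os.path.join(refs_root, "heads"),
        os.path.join(refs_root, "tags"),
    }
    removed = []
    # topdown=False visits children before parents, so a parent that only
    # contained empty directories is itself empty by the time we reach it.
    for dirpath, _dirnames, _filenames in os.walk(refs_root, topdown=False):
        if dirpath in keep:
            continue
        try:
            os.rmdir(dirpath)  # only succeeds when the directory is empty
        except OSError:
            continue
        removed.append(dirpath)
    return removed
```

Because `os.rmdir` refuses to remove a non-empty directory, a directory that still holds a loose ref is left untouched.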
@DouweM I think there are probably three related topics here:
1. Empty directories in general, as described in #38498
2. Empty directories in refs/tmp
3. Unused refs in refs/tmp (Do we have this issue?)
And:
1. For empty directories in general, we need to consider race conditions, so it could be harder to do it right.
2. For empty directories in refs/tmp, I think we could safely just remove them, as they shouldn't be in use at the same time, and we should do what @smcgivern suggested in https://gitlab.com/gitlab-org/gitlab-ce/issues/38689#note_42242988 so that we just use refs/tmp/HEX and don't create potentially empty directories. As for when we should do this cleanup, maybe also in housekeeping.
3. For unused refs in refs/tmp, we should probably clean them up as @stanhu suggested: remove refs which are old enough, assuming that they're just left as garbage. We could also do this during housekeeping, assuming that people are NOT using refs/tmp in their workflow.
Item 2 we can fix by using refs/tmp/HEX instead of refs/tmp/HEX/head, and the existing empty directories will be cleaned up as part of item 1.
Item 3 we don't know is actually a problem. We can run git for-each-ref 'refs/tmp' to see how many tmp refs we currently have alive. I'd expect a handful for currently running background tasks like merges, but not too many.
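For anyone who wants to script that check, a rough sketch follows. The helper names `parse_refnames` and `list_refs` are invented here; the only Git feature relied on is `git for-each-ref` with `--format=%(refname)`. Note that for-each-ref only reports valid refs, so the empty directories discussed above would not show up in this count.

```python
import subprocess


def parse_refnames(output):
    """Split `git for-each-ref --format=%(refname)` output into ref names."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_refs(repo_path, pattern="refs/tmp"):
    """List refs matching `pattern` in the repository at `repo_path`.

    Hypothetical helper; requires the git executable on PATH.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "for-each-ref", "--format=%(refname)", pattern],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_refnames(result.stdout)
```

Calling `len(list_refs("/path/to/repo.git"))` would then give the number of live temporary refs; the path is a placeholder.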
We ran into a serious performance issue with a customer. Using strace we found that loading /dashboard/projects the first time was calling stat on 18,000 files, most of which were in refs/tmp/. Cleaning up those files mostly resolved the performance issue.
Is it safe to clean up these temp refs? I guess the only concern would be any in-flight merges or rebases may fail?
Are the sizes of these refs empty? You can delete them if they are. You can probably delete them too if they are older than an hour.
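As a sketch of that rule (empty, or older than an hour), something like the following could list candidates without deleting anything. The function name and the one-hour default are illustrative only; skipping `*.lock` files is my own precaution, since Git uses lock files of that form while updating a ref.

```python
import os
import time


def stale_tmp_ref_files(git_dir, max_age_seconds=3600, now=None):
    """Return loose files under refs/tmp that are empty or older than the cutoff.

    Hypothetical helper for illustration; it only reports paths.
    """
    tmp_root = os.path.join(git_dir, "refs", "tmp")
    now = time.time() if now is None else now
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(tmp_root):
        for name in filenames:
            if name.endswith(".lock"):
                continue  # a ref update may be in progress
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except FileNotFoundError:
                continue  # removed by someone else in the meantime
            is_empty = info.st_size == 0
            is_old = now - info.st_mtime > max_age_seconds
            if is_empty or is_old:
                candidates.append(path)
    return sorted(candidates)
```

Reporting first and deleting in a separate step keeps the in-flight merge or rebase concern visible: anything younger than the cutoff and non-empty is left alone.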
@DouweM It looks like we create a temp ref and delete it every time we do a branch comparison. I can see needing to do that for forks, but is it really necessary when the comparison happens within the same project?
@stanhu It definitely isn't. This looks like a performance regression introduced by https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/19700. Before, compare_source_branch used with_repo_branch_commit which would only create a temp ref when the repositories are different and the commit is not already in the current repository, but the new implementation lacks that logic and will always create a temp ref.
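The condition described above can be written down as a small predicate. This is only a sketch of the decision as stated in this comment, not the actual code from with_repo_branch_commit; the function and parameter names are made up.

```python
def needs_temp_ref(source_repo_id, target_repo_id, commit_in_target):
    """Decide whether a temporary ref is needed for a comparison.

    Hypothetical sketch: a temp ref is only needed when the source is a
    different repository AND the commit is not already in the target.
    """
    if source_repo_id == target_repo_id:
        return False  # same project: the commit is already reachable
    return not commit_in_target
```

With this check in place, a comparison within the same project would never create a temp ref, and a comparison across forks would only do so when the commit is actually missing.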
I'm not sure what problem we're talking about now because this issue is a year old and I see multiple problems coming by.
Empty files under refs/ are by definition broken and should be deleted. We recently added cleanup code for such empty files (gitaly!992) but I'm not sure if that covers refs/tmp. cc @reprazent