Keep minimum refs in the repository
We need to keep the number of refs to a minimum; otherwise, with a lot of refs, the repository eventually becomes very slow. The only refs we really need to keep at the moment are:
- Branches (`refs/heads/**/*`)
- Tags (`refs/tags/**/*`)
- Merge requests (`refs/merge-requests/[MERGE_REQUEST_IID]/head`) (it's possible to replace it with `refs/keep-around`, but this is also a feature)
- Commits which have notes (`refs/keep-around/[SHA]`)
- Commits which have pipelines/jobs (`refs/keep-around/[SHA]`)
- Commits for deployments (`refs/environments/[NAME]`) (it's possible to replace it with `refs/keep-around`)
Everything else is not really needed. We could keep a reserved list, i.e. `%w[heads tags merge-requests keep-around environments]`, and remove all other refs during housekeeping, or just after importing or mirroring.
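As a rough illustration of the reserved-list idea, the check could look something like the sketch below. The constant and method names (`RESERVED_REF_NAMESPACES`, `reserved_ref?`, `partition_refs`) are hypothetical and do not come from the GitLab codebase; the sketch only classifies ref names and leaves the actual deletion to whatever housekeeping step would use it.

```ruby
# Hypothetical sketch: these names are illustrative, not GitLab's actual code.
RESERVED_REF_NAMESPACES = %w[heads tags merge-requests keep-around environments].freeze

# Returns true when the ref has the form refs/<reserved namespace>/<something>.
def reserved_ref?(ref)
  prefix, namespace, rest = ref.split('/', 3)
  prefix == 'refs' && RESERVED_REF_NAMESPACES.include?(namespace) && !rest.to_s.empty?
end

# Splits a list of ref names into [refs to keep, refs that could be removed].
def partition_refs(refs)
  refs.partition { |ref| reserved_ref?(ref) }
end

refs = %w[
  refs/heads/master
  refs/tags/v1.0
  refs/pull/12/head
  refs/keep-around/0123abc
  refs/tmp/import
]
kept, removable = partition_refs(refs)
# kept      => refs/heads/master, refs/tags/v1.0, refs/keep-around/0123abc
# removable => refs/pull/12/head, refs/tmp/import
```

Matching on the namespace directly under `refs/` keeps the check cheap, since it needs only the ref name and no object lookups.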
On the other hand, we could also speed up importing by not creating those refs in the first place. This could be trickier because we need to create merge requests, notes, and so on during importing, so we still need some way to access them.
Some action items:
- https://gitlab.com/gitlab-org/gitlab-ce/issues/36807 Remove non-reserved refs during housekeeping, or after importing or mirroring
- https://gitlab.com/gitlab-org/gitlab-ce/issues/36863 Only use `refs/keep-around/*` and stop using `refs/merge-requests/*/head`
- https://gitlab.com/gitlab-org/gitlab-ce/issues/36292 Do not fetch all the refs during importing or mirroring, or find a smart way to match the remote refs to our designated refs (e.g. refs/pull/* -> refs/merge-requests/*)
- https://gitlab.com/gitlab-org/gitlab-ce/issues/36865 Clean up redundant `refs/keep-around/*` refs which would already be kept around by another ref
- Remove the ref from `refs/keep-around/*` once all corresponding notes, pipelines, jobs, or merge requests are removed.
- Remove the ref from `refs/merge-requests/**/head` once the merge request is merged and we're sure that we have the HEAD SHA from that merge request stored somewhere (e.g. the database). We don't need to keep the ref because the SHA should already be in the tree. However, we should be careful not to break anything. See https://gitlab.com/gitlab-org/gitlab-ce/issues/36516#note_37926497 for a related issue. We could also consider reusing the refs in keep-around if possible.
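
To make the "match the remote refs to our designated refs" item a bit more concrete, here is a minimal sketch of such a mapping. Only the `refs/pull/*` -> `refs/merge-requests/*` rule comes from the item above; the `REF_MAPPINGS` table, the `map_remote_ref` helper, the exact `refs/pull/<number>/head` layout, and the idea of reusing the remote pull request number as the merge request IID are assumptions made for illustration.

```ruby
# Hypothetical mapping table. Each entry is [pattern for the remote ref,
# format template for the local ref]. Not GitLab's actual import code.
REF_MAPPINGS = [
  [%r{\Arefs/heads/(.+)\z}, 'refs/heads/%s'],
  [%r{\Arefs/tags/(.+)\z}, 'refs/tags/%s'],
  # Assumption: the remote pull request number is reused as the merge request IID.
  [%r{\Arefs/pull/(\d+)/head\z}, 'refs/merge-requests/%s/head']
].freeze

# Returns the designated local ref for a remote ref,
# or nil when the remote ref should not be fetched at all.
def map_remote_ref(remote_ref)
  REF_MAPPINGS.each do |pattern, template|
    match = pattern.match(remote_ref)
    return format(template, match[1]) if match
  end
  nil
end

map_remote_ref('refs/pull/42/head')  # => "refs/merge-requests/42/head"
map_remote_ref('refs/pull/42/merge') # => nil
```

Refs that map to `nil` would simply be left out of the fetch, so they never need to be cleaned up afterwards.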
Other things we need to consider:
- What if a force push happened in a branch? With or without a merge request?
- What if a branch/tag is deleted?
- What if a note is deleted on a particular commit?
- Could we remove the ref/SHA for pipelines/jobs we don't care about?
- Could we remove the ref/SHA for deployments we don't care about?
- How do we deal with a deleted ref/SHA for notes, pipelines, etc. that we no longer care about?
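
For the questions about notes, pipelines, and jobs that go away, one possible approach is to compare the existing keep-around refs against the SHAs that something still references. The sketch below assumes both lists are already available as plain arrays of SHAs; in practice the second one would probably come from the database. The method name `stale_keep_around_refs` and its inputs are hypothetical.

```ruby
require 'set'

# keep_around_shas: SHAs that currently have a refs/keep-around/<sha> ref.
# referenced_shas:  SHAs still referenced by notes, pipelines, jobs, or
#                   merge requests (hypothetical input, e.g. from the database).
# Returns the keep-around refs that nothing references any more.
def stale_keep_around_refs(keep_around_shas, referenced_shas)
  referenced = referenced_shas.to_set
  keep_around_shas
    .reject { |sha| referenced.include?(sha) }
    .map { |sha| "refs/keep-around/#{sha}" }
end

stale_keep_around_refs(%w[aaa bbb ccc], %w[bbb])
# => ["refs/keep-around/aaa", "refs/keep-around/ccc"]
```

This only answers which refs are candidates for removal. Whether removing them is safe still depends on the open questions above, such as force pushes and deleted branches.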
See related issues regarding reducing the refs.
Edited by Lin Jen-Shin