Residual temp packfiles are wasting terabytes of storage
Problem
Residual temp packfiles are still wasting significant disk space on Gitaly nodes.
Background
A year ago, in gitlab-com/gl-infra/scalability#2547, we discovered that git receive-pack lacks signal handlers to clean up temp packfiles from an aborted git push. Consequently, when Gitaly's PostReceivePack or SSHReceivePack gRPC methods get Canceled (for any of several possible reasons), they tend to leave their gRPC payload (a partial packfile) on disk in a residual temp directory ($GIT_DIR/objects/tmp_objdir-incoming-XXXXXX).
These residual temp packfiles should eventually be cleaned up by Gitaly's background housekeeping jobs, but we found examples of repos with months of slowly accumulated cruft.
This same pathology seems likely to affect self-managed customers, and it may be hard to discover and quantify this wasted storage space.
For context, we are now close to switching gitlab.com's Gitaly nodes to transactions backed by a write-ahead log (WAL), which should completely avoid accumulating this type of hidden bloat.
However, some corrective actions may still be prudent; hence this issue.
Scope
In this issue, let's decide if and how we should act on the following:
- Solve the existing accumulation of bloat on our SaaS platform's Gitaly nodes (for gitlab.com and Dedicated).
- Document advice for self-managed customers on how they can estimate this kind of bloat and optionally resolve it.
- Determine why automated background housekeeping is not effective for at least some repos (example: gitlab-com/gl-infra/scalability#2547 (comment 2266439741)).
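For the second scope item (helping self-managed customers estimate this bloat), a minimal sketch of an estimate could walk a Gitaly storage root and sum the size of residual temp directories older than a cutoff. It assumes bash, GNU find and du, and the Omnibus default storage path that appears in the du output in the example below; the function name estimate_tmp_objdir_bloat and its 1-day default threshold (chosen to mirror --expire='1 day ago') are hypothetical, not part of Gitaly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Gitaly): estimate the disk space held by
# residual tmp_objdir-incoming-* directories under a Gitaly storage root.
# Requires GNU find and du. Prints "<dir count> <total KiB>".
estimate_tmp_objdir_bloat() {
  # Assumption: Omnibus default storage path when no argument is given.
  local root="${1:-/var/opt/gitlab/git-data/repositories}"
  # Assumption: skip directories modified within the last N days (default 1).
  local min_age_days="${2:-1}"
  local count=0 total_kib=0 dir kib

  # Match temp dirs sitting directly in a repo's objects/ directory, -prune so
  # find does not descend into them, then keep only those older than the cutoff.
  while IFS= read -r -d '' dir; do
    kib=$(du -sk -- "$dir" | cut -f1)
    total_kib=$((total_kib + kib))
    count=$((count + 1))
  done < <(find "$root" -type d -name 'tmp_objdir-incoming-*' \
             -path '*/objects/tmp_objdir-incoming-*' -prune \
             -mmin +"$((min_age_days * 1440))" -print0)

  printf '%d %d\n' "$count" "$total_kib"
}
```

For example, estimate_tmp_objdir_bloat /var/opt/gitlab/git-data/repositories 7 would report only directories untouched for more than a week. Using each directory's own modification time as the age signal appears consistent with how git prune applies --expire to temporary entries, but that is worth confirming against the Git version in use.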
Illustrative example
As of today (2024-12-18), a single project (project ID 58425881) wastes 3.2 TB of storage on residual temp packfiles. That project is not unique, but it is the fastest-growing project this week. (See gitlab-com/gl-infra/scalability#2547 (comment 2266439741) for discovery notes.)
This project has over 1500 residual temp packfiles, accumulated steadily over the last 6 months, and they sum to 3.2 TB:
msmiley@gitaly-04-stor-gprd.c.gitlab-gitaly-gprd-d1a2.internal:~$ sudo find "$GIT_DIR/objects" -type f -path "*/tmp_objdir-incoming-*/*" | wc -l
1568
msmiley@gitaly-04-stor-gprd.c.gitlab-gitaly-gprd-d1a2.internal:~$ sudo bash -c "du -shxc $GIT_DIR/objects/tmp_objdir-incoming*" | sort -hr | head -n5
3.2T total
4.9G /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-X8SGRJ
4.9G /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-tu7c6K
4.9G /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-Qa42F8
4.9G /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-O3KrUF
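The count and total above do not show when the directories appeared. To check that they accumulated steadily over months, one option is to bucket them by the month of each directory's modification time. A minimal sketch, assuming bash, GNU find (for -printf), and a GIT_DIR pointing at a bare repository as in the commands above; the function name tmp_objdir_histogram is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count residual tmp_objdir-incoming-* directories in one
# repository, grouped by the month (YYYY-MM) of each directory's mtime.
# Requires GNU find for -printf. Prints "<count> <YYYY-MM>" lines, oldest first.
tmp_objdir_histogram() {
  local git_dir="${1:?usage: tmp_objdir_histogram GIT_DIR}"
  # Temp dirs live directly under objects/, so do not recurse further.
  find "$git_dir/objects" -mindepth 1 -maxdepth 1 -type d \
       -name 'tmp_objdir-incoming-*' -printf '%TY-%Tm\n' |
    sort |                  # YYYY-MM sorts chronologically as plain text
    uniq -c |               # count directories per month
    awk '{ print $1, $2 }'  # normalize uniq's padded output
}
```

Running tmp_objdir_histogram "$GIT_DIR" on an affected repo would print one line per month, which makes a steady accumulation (versus a one-off burst) easy to spot.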
Running git prune would clean these up, as shown in this dry run:
msmiley@gitaly-04-stor-gprd.c.gitlab-gitaly-gprd-d1a2.internal:~$ sudo -i -u git /opt/gitlab/embedded/bin/git --git-dir $GIT_DIR prune --verbose --expire='1 day ago' --dry-run | wc -l
1547
msmiley@gitaly-04-stor-gprd.c.gitlab-gitaly-gprd-d1a2.internal:~$ sudo -i -u git /opt/gitlab/embedded/bin/git --git-dir $GIT_DIR prune --verbose --expire='1 day ago' --dry-run | head -n5
Removing stale temporary directory /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-subiSU
Removing stale temporary directory /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-GMffXG
Removing stale temporary directory /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-b1oaD6
Removing stale temporary directory /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-VZZ5H5
Removing stale temporary directory /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-lF8wfv