
Residual temp packfiles are wasting terabytes of storage

Problem

Residual temp packfiles are still wasting significant disk space on Gitaly nodes.

Background

A year ago in gitlab-com/gl-infra/scalability#2547 we discovered that git receive-pack lacks signal handlers to appropriately clean up temp packfiles from an aborted git push. Consequently, when Gitaly's PostReceivePack or SSHReceivePack gRPC methods get Canceled (for any of several possible reasons), this tends to leave their gRPC payload (a partial packfile) on-disk in a residual temp directory ($GIT_DIR/objects/tmp_objdir-incoming-XXXXXX).

Those residual temp packfiles ought to eventually be cleaned up by Gitaly's housekeeping background jobs, but we found some examples of repos that have months of slowly accumulated cruft.
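To check which residual directories are already old enough that housekeeping should have removed them, a minimal sketch like the following lists tmp_objdir-incoming-* directories directly under $GIT_DIR/objects whose own modification time is older than a given number of days. It rests on assumptions. The function name list_stale_tmp_objdirs is hypothetical and not part of Gitaly or Git. Also, using the directory's mtime only approximates the age check that git prune --expire applies to stale temporary directories.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gitaly or Git): print residual
# tmp_objdir-incoming-* directories directly under GIT_DIR/objects whose own
# mtime is more than DAYS days old. This only approximates the age test that
# "git prune --expire" applies to stale temporary directories.
list_stale_tmp_objdirs() {
  git_dir="$1"
  days="${2:-1}"   # default threshold: 1 day, like --expire='1 day ago'
  # -mindepth 1 -maxdepth 1: only direct children of objects/
  # -mmin +N: last modified more than N minutes ago (1440 minutes per day)
  find "$git_dir/objects" -mindepth 1 -maxdepth 1 -type d \
    -name 'tmp_objdir-incoming-*' -mmin +"$((days * 1440))"
}
```

For example, list_stale_tmp_objdirs "$GIT_DIR" 7 | wc -l would count residual directories untouched for more than a week in one repository.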

This same pathology seems likely to affect self-managed customers, and it may be hard to discover and quantify this wasted storage space.

For context, we are now close to switching gitlab.com's Gitaly nodes to write-ahead-logged (WAL) transactions, which should entirely prevent this type of hidden bloat from accumulating.

However, some corrective actions may still be prudent; hence this issue.

Scope

In this issue, let's decide if and how we should act on the following:

  • Solve the existing accumulation of bloat on our SaaS platform's Gitaly nodes (for gitlab.com and Dedicated).
  • Document advice for self-managed customers on how they can estimate this kind of bloat and optionally resolve it.
  • Determine why automated background housekeeping is not effective for at least some repos (example: gitlab-com/gl-infra/scalability#2547 (comment 2266439741)).
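For the self-managed advice, one possible way to estimate this bloat is a sketch like the following. It reports the size in KiB of residual tmp_objdir-incoming-* directories for each repository, plus a grand total, similar to the du -c output in the illustrative example. It assumes bare *.git repositories below a storage root (on Omnibus installs, /var/opt/gitlab/git-data/repositories, as in the example paths) and enough privileges to read them. The function name estimate_tmp_objdir_bloat is hypothetical. Because it walks every repository's objects directory, it may be slow on large storages.

```shell
#!/bin/sh
# Hypothetical helper (not part of Gitaly or Git): report the total size in KiB
# of residual tmp_objdir-incoming-* directories per repository, plus a grand
# total, for every bare *.git repository below STORAGE_ROOT.
estimate_tmp_objdir_bloat() {
  storage_root="$1"
  # Match temp dirs that sit directly in a repository's objects/ directory;
  # -prune stops find from descending into each matched temp dir.
  find "$storage_root" -type d -path '*.git/objects/tmp_objdir-incoming-*' \
    -prune -exec du -sk {} + |
    awk '
      {
        kib = $1
        repo = $0
        sub("^[0-9]+[ \t]+", "", repo)                        # drop the size column
        sub("/objects/tmp_objdir-incoming-[^/]*$", "", repo)  # keep the repo path
        per_repo[repo] += kib
        total += kib
      }
      END {
        for (r in per_repo) printf "%d\t%s\n", per_repo[r], r
        printf "%d\ttotal\n", total
      }'
}
```

For example, estimate_tmp_objdir_bloat /var/opt/gitlab/git-data/repositories | sort -nr | head -n 10 would show the grand total followed by the most affected repositories.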

Illustrative example

As of today (2024-12-18), a single project (project id 58425881) wastes 3.2 TB of storage on residual temp packfiles. That project is not unique, but it is the fastest-growing one this week. (See gitlab-com/gl-infra/scalability#2547 (comment 2266439741) for discovery notes.)

This project has over 1500 residual temp packfiles, accumulated steadily over the last 6 months, and they sum to 3.2 TB:

msmiley@gitaly-04-stor-gprd.c.gitlab-gitaly-gprd-d1a2.internal:~$ sudo find "$GIT_DIR/objects" -type f -path "*/tmp_objdir-incoming-*/*" | wc -l
1568

msmiley@gitaly-04-stor-gprd.c.gitlab-gitaly-gprd-d1a2.internal:~$ sudo bash -c "du -shxc $GIT_DIR/objects/tmp_objdir-incoming*" | sort -hr | head -n5
3.2T	total
4.9G	/var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-X8SGRJ
4.9G	/var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-tu7c6K
4.9G	/var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-Qa42F8
4.9G	/var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-O3KrUF

Running git prune would clean these up, as shown in this dry-run:

msmiley@gitaly-04-stor-gprd.c.gitlab-gitaly-gprd-d1a2.internal:~$ sudo -i -u git /opt/gitlab/embedded/bin/git --git-dir $GIT_DIR prune --verbose --expire='1 day ago' --dry-run | wc -l
1547

msmiley@gitaly-04-stor-gprd.c.gitlab-gitaly-gprd-d1a2.internal:~$ sudo -i -u git /opt/gitlab/embedded/bin/git --git-dir $GIT_DIR prune --verbose --expire='1 day ago' --dry-run | head -n5
Removing stale temporary directory /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-subiSU
Removing stale temporary directory /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-GMffXG
Removing stale temporary directory /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-b1oaD6
Removing stale temporary directory /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-VZZ5H5
Removing stale temporary directory /var/opt/gitlab/git-data/repositories/@hashed/e6/48/e648182fffe17e74990fcd80f2a2cf1268d47dae1a09e6f006c6b2af04923c68.git/objects/tmp_objdir-incoming-lF8wfv
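Piping the dry-run into wc -l counts every line git prune prints. Besides stale temporary directories, git prune may also report other things it would remove, such as unreachable loose objects. A per-message breakdown can therefore be more informative than a single line count. The sketch below assumes the "Removing stale temporary directory ..." prefix seen in the output, plus a matching "Removing stale temporary file ..." prefix for plain files. The function name summarize_prune_dry_run is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): read "git prune --verbose --dry-run"
# output on stdin and count lines by message type, instead of a single wc -l.
summarize_prune_dry_run() {
  awk '
    /^Removing stale temporary directory / { dirs++; next }
    /^Removing stale temporary file /      { files++; next }
                                           { other++ }   # e.g. unreachable loose objects
    END {
      printf "stale temp dirs: %d\nstale temp files: %d\nother lines: %d\n", dirs, files, other
    }'
}
```

Usage, with the same dry-run command: sudo -i -u git /opt/gitlab/embedded/bin/git --git-dir "$GIT_DIR" prune --verbose --expire='1 day ago' --dry-run | summarize_prune_dry_run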
Edited by Matt Smiley