Local git clone sizes started growing massively after update to GitLab 16.5.0
Support Request for the Gitaly Team
Summary
- The size of the .git folder in repos cloned from GitLab has started growing massively.
- The .git/objects/pack folder was littered with dozens of pack files sized around 30 GB.
- A git fetch pack should just be the size of the delta between the remote and the local. It seems like the GitLab server is sending a massive amount of redundant objects.
- These excessively large fetches sometimes even lead to timeouts, as shown in the example below:
Output of one such git fetch
$ git fetch git@git.motivesys.com:m-files/m-files.git "+topic/zero-downtime-server-upgrade-phase2:topic/zero-downtime-server-upgrade-phase2" -v -v -v
Enter passphrase for key '/c/Users/oski.kervinen/.ssh/id_rsa':
Marking 30441e5543b3b8b1a0050e991ba3b1f42acabc44 as complete
Marking 2c1d65c66e0bc19b57e86a321636139fc2206ab1 as complete
Marking d5f31561a2d5c63750b53feb9221d61a58a3514b as complete
Marking 1f9c6d9a8f5021e051ff6de2e8b213ac2fa2654e as complete
Marking 51d6619fc371b6697f9a173eeb1a7d2f48c3a23a as complete
already have 053827667df8f0bcf0bbe8108ca420227fb52380 (refs/heads/topic/zero-downtime-server-upgrade-phase2)
want 07923c323b4849b5ac9a12aaa11bcf0e1ac9805a (refs/tags/builds/topic/T-168446-BeforeLoginToVault_event_not_triggering/23.12.13230.1)
want 6b3eb77d54ba5de7b12a796613f2a1b2bf66a0c4 (refs/tags/builds/topic/mfserver-telemetry-poc-1/23.12.13211.5)
want 4e0c0021648a48474e9a8cee1c18e1ce7d8a0d6f (refs/tags/builds/vNext/topic/86285-bug-fix/23.12.13202.43)
want b3aafd55b51c6d4690788d68b4778aaf9af16fda (refs/tags/builds/vNext/topic/ui-tests-update-integration-tests-framework-30-10-2023/23.12.13202.42)
remote: Enumerating objects: 1398865, done.
remote: Counting objects: 100% (609888/609888), done.
remote: Compressing objects: 100% (122290/122290), done.
remote: g objects: 38% (544322/1398865), 13.56 GiB | 10.62 MiB/s
remote: ========================================================================
remote:
remote: rpc error: code = DeadlineExceeded desc = running upload-pack: waiting for negotiation: context canceled
remote:
remote: ========================================================================
- Occurs in full clones as well as in shallow clones and partial (--filter=blob) clones.
- Occurs constantly in their CI environments, but less consistently on developer machines.
- Affects at least 2 repositories, which are notably larger than the rest.
- The project has 800 MB+ blobs and does not use LFS.
- Can be consistently reproduced on a paused Runner
- Began after the update of GitLab server to 16.5.0 (Couldn't find a large pack file older than Oct 27. Their update was on Oct 26)
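To double-check the pack sizes and dates mentioned above on an affected clone, something like the following could list the pack files from largest to smallest. This is only a sketch: `list_packs` is a hypothetical helper name, and it assumes a regular (non-bare) clone where the packs live under .git/objects/pack.

```python
from datetime import datetime
from pathlib import Path


def list_packs(repo_dir):
    """Return (name, size_bytes, mtime) for each *.pack file, largest first.

    Hypothetical helper: it only reads file metadata and never modifies
    the repository.
    """
    pack_dir = Path(repo_dir) / ".git" / "objects" / "pack"
    packs = []
    for path in pack_dir.glob("*.pack"):
        stat = path.stat()
        packs.append((path.name, stat.st_size, datetime.fromtimestamp(stat.st_mtime)))
    # Sort by size so the suspicious multi-gigabyte packs come first.
    return sorted(packs, key=lambda entry: entry[1], reverse=True)
```

Printing each entry's size and modification time next to the upgrade date (Oct 26) should show whether every oversized pack was created after the upgrade.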
Customer Information
Salesforce Link:
Zendesk Ticket:
Installation Size:
- Max users: 400
Architecture Information:
- Single Omnibus installation
- Runners are Windows-based
Support Request
To start with:
- Is there a way to get the precise git commands gitlab-runner is issuing for these fetches and clones? We'd like to intercept the incoming .pack files to see what's being transmitted in them. It looks like the script is now piped into the PowerShell process via stdin, so there is no temp file to spy on.
- How does the fetch process work on the GitLab side? It is implemented using Gitaly, not a native git process, correct? Can I access/activate extra logs? Could you point me to the relevant spot in the Gitaly sources?
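While waiting for an answer on the runner side, the git client's own tracing might already cover part of the first question when the fetch is reproduced manually. The sketch below assumes git is on PATH. GIT_TRACE, GIT_TRACE_PACKET and GIT_TRACE_PACKFILE are documented git client environment variables, while the helper names and output file names are made up for illustration. Whether gitlab-runner passes these variables through to its own git invocations has not been confirmed here.

```python
import os
import subprocess


def traced_fetch_env(trace_dir, base_env=None):
    """Build an environment that makes the git client log what it does.

    The variable names are git's own; the file names are arbitrary choices.
    """
    trace_dir = os.path.abspath(trace_dir)  # git expects absolute paths for trace files
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_TRACE"] = os.path.join(trace_dir, "trace.log")  # commands git runs
    env["GIT_TRACE_PACKET"] = os.path.join(trace_dir, "packet.log")  # want/have negotiation
    env["GIT_TRACE_PACKFILE"] = os.path.join(trace_dir, "received.pack")  # raw incoming pack
    return env


def traced_fetch(repo_dir, remote, refspec, trace_dir):
    """Run `git fetch` with tracing enabled (hypothetical wrapper)."""
    os.makedirs(trace_dir, exist_ok=True)
    return subprocess.run(
        ["git", "-C", repo_dir, "fetch", remote, refspec],
        env=traced_fetch_env(trace_dir),
        check=False,  # keep the traces even if the fetch times out
    )
```

Note that the dumped pack can be as large as the transfer itself, so the trace directory needs tens of gigabytes of free space for the fetches described in this issue.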
Severity
Severity 3 - High priority
Problem Description
The .git folder in repos cloned from GitLab has started growing massively after an upgrade to 16.5.0. It affects both Runners and local development machines. The disk space is being taken up by dozens of pack files as large as 30 GB, while the repository itself is about 44 GB after garbage collection.
The size of such a pack should just be the size of the delta between the remote and the local. It seems like GitLab is sending a massive amount of redundant objects, which git then dutifully stores in the pack file, but we need to investigate further to know for sure.
Troubleshooting Performed
- Checked that the problem can be reproduced on various git versions.
- Reproduced the issue using a paused runner and interrupting the fetch operation. Running a subsequent fetch causes GitLab to send over gigabytes of data, instead of just the delta shown in the output of git rev-list.
- Checked that the multi-pack-index files are present in the repository folder:
@pools/e7/f6/e7f6c011776e8db7cd330b54174fd76f7d0216b612387a5ffcfb81e6f0919683.git/objects/pack$ ls -lSh
total 46G
-r--r--r-- 1 git git 45G Nov 15 10:25 pack-74cb64c513736000b42fbd62da7ce652c1f7c368.pack
-rw-r--r-- 1 git git 112M Nov 10 16:39 multi-pack-index
-r--r--r-- 1 git git 101M Nov 10 16:39 pack-74cb64c513736000b42fbd62da7ce652c1f7c368.idx
-r--r--r-- 1 git git 15M Nov 15 08:46 pack-d8a1dd4ac8aa0f51001c46cfe3a1f53b8fec07a6.pack
-r--r--r-- 1 git git 15M Nov 10 16:40 multi-pack-index-cc970134468d880dafa287ec6b5070833dd986f8.bitmap
-r--r--r-- 1 git git 12M Nov 10 16:39 pack-74cb64c513736000b42fbd62da7ce652c1f7c368.rev
-r--r--r-- 1 git git 102K Nov 14 12:24 pack-d8a1dd4ac8aa0f51001c46cfe3a1f53b8fec07a6.idx
-r--r--r-- 1 git git 15K Nov 14 12:24 pack-d8a1dd4ac8aa0f51001c46cfe3a1f53b8fec07a6.rev
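The rev-list comparison from the second troubleshooting step could be automated roughly as follows. This is a sketch under assumptions: both function names are hypothetical, git is assumed to be on PATH, and `expected_object_count` has to run in a repository that already contains both the `have` and the `want` commit (for example after a completed fetch).

```python
import re
import subprocess


def parse_enumerated(line):
    """Extract N from a 'remote: Enumerating objects: N, done.' progress line."""
    match = re.search(r"Enumerating objects:\s*(\d+)", line)
    return int(match.group(1)) if match else None


def expected_object_count(repo_dir, have, want):
    """Count objects reachable from `want` but not from `have`.

    Runs `git rev-list --objects have..want` and counts the output lines.
    """
    out = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--objects", f"{have}..{want}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(out.splitlines())
```

Comparing the two numbers for the same fetch gives a rough measure of how many redundant objects the server enumerated. For the fetch output in the summary, `parse_enumerated` yields 1398865.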
What specifically do you need from the Gitaly team
We can start by providing this user with answers to their questions above, which should help them investigate in parallel.
Author Checklist
- Customer information provided
- Severity realistically set
- Clearly articulated what is needed from the Gitaly team to support your request by filling out the "What specifically do you need from the Gitaly team" section
/cc @mjwood @andrashorvath @jcaigitlab @john.mcdonnell @gerardo