
git: Lower big file threshold to improve memory use and diff latency

Patrick Steinhardt requested to merge pks-lower-big-file-threshold into master

Git uses a threshold at which it treats blobs as "large", which causes it to alter behaviour when handling them in multiple contexts:

- It will stop deltifying them.

- It will stop loading them into memory and instead use streaming
  interfaces.

- It will stop computing diffs for them.

The default value for this threshold is quite high at 512MB. This may cause us to happily allocate large buffers, cause Git to try to create deltas for such files, or compute diffs for two huge blobs. All of this behaviour can be very taxing, driving up memory consumption and wasting compute.

While we have already addressed the issue of trying to deltify objects this large by setting pack.windowMemory=100m to limit the packfile window memory, we still happily soak such blobs into memory or try to compute diffs. And indeed, we have recently heard about multiple cases where customers run into timeouts when computing diffs that involve large files.

Lower the threshold to 50MB such that we effectively limit the maximum size of blobs that we will try to diff with each other. While this has the downside that it is no longer possible to diff blobs above that size, we can operate under the assumption that in general, people will not have text files that are this huge. There will be exceptions like large generated structured text files, but ignoring these feels acceptable in the larger scheme, where diffing huge blobs is causing performance problems in the more general case. Furthermore, as mentioned already, this change should help us avoid allocating large chunks of memory and thus decrease our memory consumption somewhat.

The value of 50MB was chosen to match our pack.windowMemory=100m setting. In order to deltify an object we need to have two such objects in the window, and two times 50MB matches the maximum allowed window memory. This change should thus not cause larger packfiles in general.
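
The arithmetic behind that choice can be checked with a small sketch. It assumes the threshold in question is Git's core.bigFileThreshold option and that size suffixes such as "m" are multiples of 1024, as in Git's configuration parsing. The helper names parse_git_size and pair_fits_in_window are hypothetical and exist only for illustration; they are not part of Git or of this change.

```python
# Illustrative only: helper names are hypothetical, not part of Git or this change.
# Git treats the k/m/g suffixes of size-valued options as powers of 1024.
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_git_size(value: str) -> int:
    """Parse a Git-style size such as '50m' into a number of bytes."""
    text = value.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def pair_fits_in_window(big_file_threshold: str, window_memory: str) -> bool:
    """Return True if two blobs of threshold size fit into the window together."""
    return 2 * parse_git_size(big_file_threshold) <= parse_git_size(window_memory)


if __name__ == "__main__":
    # New threshold against the existing pack.windowMemory=100m setting.
    print(pair_fits_in_window("50m", "100m"))
    # Git's default threshold against the same window memory.
    print(pair_fits_in_window("512m", "100m"))
```

With the new value, the largest pair of blobs still eligible for deltification exactly fills the 100m window, whereas a pair at the 512MB default would exceed it several times over.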

Closes #4609 (closed).
