Limit memory usage by git-pack-objects processes on Gitaly nodes
As one of the two main corrective actions for incident production#2457 (closed), we should try to reduce the memory usage of the `git pack-objects` child process that gets spawned by `git upload-pack`.
Background
When Gitaly is handling a client's `git fetch` (or `clone`, `pull`, etc.), it spawns a `git upload-pack` process to find the relevant git objects, compose them into a git pack file, and stream them via STDOUT to Gitaly, which passes the response data back up the service stack. The pack composition itself is delegated to a `git pack-objects` child process, which by default does not attempt to constrain its memory usage. For large git repos, the `git pack-objects` process's memory usage often grows to multiple gigabytes, which can put significant memory pressure on the Gitaly host. During the incident, the Linux kernel's out-of-memory killer had to step in to kill some of these memory-intensive processes, and even under normal conditions these processes frequently cause large swings in memory usage across GitLab.com's Gitaly fleet.
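To see this behavior in isolation, the pack generation that a full clone triggers can be approximated directly on a Gitaly node. This is a rough sketch with a hypothetical repository path; `/usr/bin/time -v` is GNU time, which reports peak RSS on Linux:

```
# Approximate the pack generation a full clone would trigger, and
# report the peak resident set size of the pack-objects process.
# (Repository path is hypothetical; point it at a large repo.)
cd /var/opt/gitlab/git-data/repositories/path/to/large-repo.git
echo HEAD | /usr/bin/time -v git pack-objects --revs --stdout > /dev/null
# Look for "Maximum resident set size" in the time(1) report on stderr.
```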
Tuning memory usage
UPDATE: Adjusting `pack.windowMemory` did not help.
Git supports a configuration setting, `pack.windowMemory`, to cap the memory used by the `pack-objects` subcommand. Per the Git documentation, it limits the memory each thread of a `git pack-objects` process may consume for its delta-compression window, so it is a per-thread limit rather than a per-process one.
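A minimal sketch of applying it system-wide on a Gitaly node; the `100m` value is purely illustrative, not a recommendation:

```
# Cap delta window memory at 100 MiB per pack-objects thread.
# (Value is illustrative; the right limit should come out of testing.)
git config --system pack.windowMemory 100m
```

Because the limit is per-thread, the effective per-process cap scales with `pack.threads`, which by default auto-detects the number of CPUs.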
We will need to test the effects of this setting before applying the config change in production. In particular, we should observe its behavior against a repo that has pre-existing large pack files; see the sketch below.
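One possible way to run that comparison, again with a hypothetical repo path and an illustrative limit:

```
# Inside the test repo: confirm it already has large pack files.
ls -lh objects/pack/*.pack

# Re-run the pack generation with the limit applied via -c, then
# compare peak RSS against an unlimited run.
echo HEAD | /usr/bin/time -v \
    git -c pack.windowMemory=100m pack-objects --revs --stdout > /dev/null
```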
Get feedback from the Gitaly engineers about this. If this works well in staging and production for GitLab.com, it may make sense to make this setting configurable via Omnibus, so self-hosted GitLab users can benefit too. A rough sketch of what that could look like follows.
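Assuming the existing `omnibus_gitconfig` hook in `/etc/gitlab/gitlab.rb` is the right vehicle (an assumption, not a settled design), it might look something like:

```
# /etc/gitlab/gitlab.rb -- assumption: reuse the system-level
# gitconfig hook rather than adding a dedicated Gitaly setting.
omnibus_gitconfig['system'] = {
  "pack" => ["windowMemory = 100m"]   # value illustrative
}
```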