Limit memory usage by git-pack-objects processes on Gitaly nodes
As one of the two main corrective actions for incident production#2457 (closed), we should try to reduce the memory usage of the `git pack-objects` child process that gets spawned by `git upload-pack`.
Background
When Gitaly is handling a client's `git fetch` (or `clone`, `pull`, etc.), it spawns a `git upload-pack` process to find the relevant git objects, compose them into a git pack file, and stream them via STDOUT to Gitaly, which passes the response data back up the service stack. The pack composition itself is delegated to a `git pack-objects` child process, which by default does not attempt to constrain its memory usage. For large git repos, the `git pack-objects` process's memory usage often grows to multiple gigabytes, which can put significant memory pressure on the Gitaly host. During the incident, the Linux kernel's out-of-memory killer had to step in to kill some of these memory-intensive processes, and even under normal conditions these processes frequently cause large swings in memory usage across GitLab.com's Gitaly fleet.
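To see this behavior in isolation, the pack generation that a full clone triggers can be approximated directly on a Gitaly node. This is a rough sketch with a hypothetical repository path; `/usr/bin/time -v` is GNU time, which reports peak RSS on Linux:

```
# Approximate the pack generation a full clone would trigger, and
# report the peak resident set size of the pack-objects process.
# (Repository path is hypothetical; point it at a large repo.)
cd /var/opt/gitlab/git-data/repositories/path/to/large-repo.git
echo HEAD | /usr/bin/time -v git pack-objects --revs --stdout > /dev/null
# Look for "Maximum resident set size" in the time(1) report on stderr.
```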
Tuning memory usage
UPDATE: Adjusting `pack.windowMemory` did not help.
Git supports a configuration setting, `pack.windowMemory`, to cap the memory used by the `pack-objects` subcommand. Per the Git documentation, it limits the memory each thread of a `git pack-objects` process may consume for its delta-compression window, so it is a per-thread limit rather than a per-process one.
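A minimal sketch of applying it system-wide on a Gitaly node; the `100m` value is purely illustrative, not a recommendation:

```
# Cap delta window memory at 100 MiB per pack-objects thread.
# (Value is illustrative; the right limit should come out of testing.)
git config --system pack.windowMemory 100m
```

Because the limit is per-thread, the effective per-process cap scales with `pack.threads`, which by default auto-detects the number of CPUs.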
We will need to test the effects of this setting before applying the config change in production. In particular, we should observe its behavior against a repo that has pre-existing large pack files; see the sketch below.
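One possible way to run that comparison, again with a hypothetical repo path and an illustrative limit:

```
# Inside the test repo: confirm it already has large pack files.
ls -lh objects/pack/*.pack

# Re-run the pack generation with the limit applied via -c, then
# compare peak RSS against an unlimited run.
echo HEAD | /usr/bin/time -v \
    git -c pack.windowMemory=100m pack-objects --revs --stdout > /dev/null
```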
Get feedback from the Gitaly engineers about this. If this works well in staging and production for GitLab.com, it may make sense to make this setting configurable via Omnibus, so self-hosted GitLab users can benefit too. A rough sketch of what that could look like follows.
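Assuming the existing `omnibus_gitconfig` hook in `/etc/gitlab/gitlab.rb` is the right vehicle (an assumption, not a settled design), it might look something like:

```
# /etc/gitlab/gitlab.rb -- assumption: reuse the system-level
# gitconfig hook rather than adding a dedicated Gitaly setting.
omnibus_gitconfig['system'] = {
  "pack" => ["windowMemory = 100m"]   # value illustrative
}
```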