Decouple the lifespan of the git-pack-objects process from the speed of the client network connection
Problem summary
When Gitaly spawns a git upload-pack process, its git pack-objects child process can hold a large amount of memory for many minutes. Such processes collectively contribute significantly to the overall memory pressure on Gitaly's cgroup, in extreme cases leading to memory starvation.
Details
When a client runs git clone or git fetch (or git pull), Gitaly responds by spawning a git upload-pack process to emit a git packfile containing whatever objects the client wants. git upload-pack spawns git pack-objects as a helper child process, which, with default settings, can accumulate a potentially large amount of memory.
Here's the problem: That memory is held for as long as it takes to send the packfile data to the remote client.
When clients request a large dataset over a slow network connection, the git pack-objects process holds a large amount of memory for a long time.
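To make the coupling concrete, here is a minimal sketch of the pattern, not Gitaly's actual implementation (the handler, port, and repository path are invented for illustration): the child process's stdout is wired directly to the client connection, so the process cannot exit until the client has accepted the last byte.

```go
// Minimal sketch (not Gitaly's actual code): upload-pack's stdout is wired
// straight to the HTTP response, so the process lives until the client has
// drained the entire packfile.
package main

import (
	"net/http"
	"os/exec"
)

func uploadPackHandler(w http.ResponseWriter, r *http.Request) {
	// Repository path is hypothetical, for illustration only.
	cmd := exec.Command("git", "upload-pack", "--stateless-rpc", "/var/opt/repos/example.git")
	cmd.Stdin = r.Body // the client's "want"/"have" negotiation
	cmd.Stdout = w     // every write blocks until the client drains enough data

	// Run() cannot return until the last byte of packfile data has been handed
	// to the (possibly slow) client connection, so upload-pack and its
	// pack-objects child keep their memory for that entire time.
	if err := cmd.Run(); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

func main() {
	http.HandleFunc("/upload-pack", uploadPackHandler)
	http.ListenAndServe(":8080", nil)
}
```

Gitaly actually serves this traffic over gRPC rather than plain HTTP, but the shape of the problem is the same: the writer at the origin only advances as fast as the slowest hop downstream.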
To reduce the risk of memory starvation on each Gitaly node, we can:
- Reduce the memory footprint of the git pack-objects process. This is being explored in https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11019.
- Minimize the lifespan of the git pack-objects process by decoupling it from the client, thus reducing the number of concurrent large processes competing for the limited memory budget. This is the subject of the current issue.
Ideally we would do both of the above. Point 1 addresses the size component of the problem, and point 2 addresses the time component; together they significantly reduce this class of risk.
Why does the git pack-objects process live so long?
When the response payload (i.e. the packfile data) is sufficiently large, none of the layers in the service call chain (gitaly, workhorse, nginx, etc.) can buffer the complete response, so the origin of the data stream (the git pack-objects process) ends up stalling when the buffers between it and the client fill up. The larger the total size of the packfile data stream, the longer this effect lasts.
This pattern of buffer saturation essentially transforms buffered network I/O into implicitly synchronized network I/O.
In more concrete terms:
- The git pack-objects process writes packfile data to its STDOUT.
- That is consumed by git upload-pack and re-emitted to its STDOUT, which gitaly consumes.
- The response payload then passes from Gitaly up the network call chain, to gitlab-workhorse or gitlab-shell, through any proxying layers (nginx, haproxy, CDN, etc.), and finally to the client-side git fetch or git clone process.
When none of those layers can fully buffer the response payload, git pack-objects must wait for the client to drain some data and free some space in one of the buffering layers before it can wake up and continue sending more packfile data.
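The effect is easy to reproduce in isolation. The toy program below (standalone, not GitLab code) pushes data through an OS pipe: with a fast reader the writer finishes almost immediately, but with a deliberately slow reader the writer spends nearly all of its wall clock time blocked, just like the sleeping pack-objects processes described next.

```go
// Toy demonstration of pipe backpressure: a fast producer ends up sleeping at
// the pace of a slow consumer once the fixed-size pipe buffer fills up.
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	r, w, err := os.Pipe() // kernel pipe buffer is typically 64 KiB on Linux
	if err != nil {
		panic(err)
	}

	// Fast producer: tries to push 5 MiB as quickly as it can.
	go func() {
		chunk := make([]byte, 1<<20)
		for i := 0; i < 5; i++ {
			start := time.Now()
			if _, err := w.Write(chunk); err != nil { // blocks once the pipe buffer is full
				return
			}
			fmt.Printf("producer: 1 MiB write took %v\n", time.Since(start))
		}
		w.Close()
	}()

	// Slow consumer: drains only 256 KiB every 100 ms, simulating a slow client.
	buf := make([]byte, 256<<10)
	for {
		if _, err := r.Read(buf); err != nil {
			break // io.EOF once the producer closes its end
		}
		time.Sleep(100 * time.Millisecond)
	}
}
```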
A clear indication that this is occurring is the presence of git pack-objects processes that spend more time sleeping than running on CPU. In the following example, PID 4406 has been running for 56 minutes and currently holds 3% of physical memory, but it has used only 2 minutes of CPU time because it spends most of its wall clock time sleeping, waiting to write more output to STDOUT. Fewer than 30 such processes would saturate the whole Gitaly node's memory cgroup, causing an outage.
$ pgrep -f 'git.*pack-objects' | xargs -r ps -o pid,%cpu,time,etime,%mem,s,args | cat
PID %CPU TIME ELAPSED %MEM S COMMAND
4406 4.2 00:02:22 56:27 3.4 S /opt/gitlab/embedded/libexec/git-core/git pack-objects --revs --thin --stdout --progress --delta-base-offset --include-tag
...
Is this a design gap or an infrastructure problem?
In general, git pack-objects is expected to emit packfile data faster than remote clients can consume it. (Disk I/O generally offers higher throughput than most clients' Internet bandwidth.) So this is a general design consideration, not an environment-specific constraint.
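For a rough sense of scale (illustrative numbers, not production measurements): a 2 GiB packfile sent to a client on a 10 Mbit/s link needs about 2 GiB x 8 / 10 Mbit/s, roughly 1,700 seconds or close to half an hour of transmission time, even though pack-objects can typically produce that stream from local disk in a small fraction of that. The process therefore spends most of its life waiting, not working.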
How can we reduce the memory usage profile?
To decouple the lifespan of the memory-intensive server-side process from the potentially slow client-side network bandwidth, we need a cheaper place to buffer the response payload.
Some options to consider:
Change nothing
Our current design choices implicitly allow git pack-objects to acquire and hold an unlimited amount of memory for an unlimited duration. That carries the risk of accidental or malicious runaway resource consumption. Incident gitlab-com/gl-infra/production#2457 (closed) shows a recent example of that occurring in production.
To improve that outcome, this issue aims to reduce the duration of that memory usage (and hence the likelihood of overlapping with concurrent consumers). The complementary issue https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/11019 aims to reduce the size of that memory usage per active client.
Buffer to temp files
This option effectively trades memory usage for disk usage.
Spooling a large response payload to a temp file would let the memory-hungry git pack-objects process avoid blocking on the client and exit as quickly as possible. Under normal conditions the temp file will still be resident in the kernel's page cache, so clients should not see a performance difference; under the adverse conditions described above, Gitaly will be less susceptible to memory starvation than it is today, improving its availability.
Such a buffer-to-disk policy could be implemented at any of several points in the call chain. Gitaly is closest to the problem, and implementing it there seems like a reasonable self-defense tactic. Another alternative is Workhorse, which is generally a more horizontally scalable service layer but may have too little spare local storage in some environments.
Buffering to a temp file may only be worthwhile for large response payloads. Also, such buffering costs disk I/O and storage space. That is generally cheaper than memory, but some environments may not have spare capacity (space and IOPS), and others may prefer to use a separate filesystem for scratch space (e.g. cheaper ephemeral storage). So, if we implement this feature, we may want the following config options (a rough sketch follows the list):
- Enable/disable buffering to disk. Default to false for backward compatibility.
- Threshold to start spooling response data to disk. Maybe default to however big the in-memory buffer currently is, so the temp file acts as an overflow?
- Directory to use for scratch space. Default to a new subdirectory on the same filesystem as the git repo storage?
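A minimal sketch of how those options could fit together, with hypothetical names and fields (this is not an existing Gitaly API): responses below the threshold stay in memory, larger ones are spooled to a temp file at disk speed so the producer can exit, and the spooled data is then replayed to the client at its own pace.

```go
// Hypothetical sketch, not an existing Gitaly API: spool large packfile
// payloads to a temp file so the producer is drained at disk speed.
package spool

import (
	"io"
	"os"
)

// Config mirrors the options proposed above.
type Config struct {
	Enabled      bool   // off by default, preserving today's behavior
	ThresholdMiB int64  // spill to disk once the payload exceeds this size
	Dir          string // scratch directory for temp files
}

// RelayPackfile copies the pack-objects output (packStdout) to the client,
// decoupling the two sides when spooling is enabled.
func RelayPackfile(cfg Config, packStdout io.Reader, client io.Writer) error {
	if !cfg.Enabled {
		// Current behavior: the producer advances only as fast as the client.
		_, err := io.Copy(client, packStdout)
		return err
	}

	// Small payloads: keep up to the threshold in memory, no temp file needed.
	head := make([]byte, cfg.ThresholdMiB<<20)
	n, err := io.ReadFull(packStdout, head)
	if err == io.EOF || err == io.ErrUnexpectedEOF {
		_, werr := client.Write(head[:n])
		return werr
	}
	if err != nil {
		return err
	}

	// Large payloads: drain the rest to disk so pack-objects can exit promptly.
	tmp, err := os.CreateTemp(cfg.Dir, "packfile-spool-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close()

	if _, err := io.Copy(tmp, packStdout); err != nil {
		return err
	}

	// Replay the in-memory head and the (likely page-cache-resident) temp file
	// to the client at whatever pace it can sustain.
	if _, err := tmp.Seek(0, io.SeekStart); err != nil {
		return err
	}
	if _, err := client.Write(head); err != nil {
		return err
	}
	_, err = io.Copy(client, tmp)
	return err
}
```

In Gitaly this would presumably sit where the pack-objects output is currently streamed into the RPC response; a Workhorse-side equivalent would be possible if buffering is pushed further down the chain (see the next option).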
Delegate buffering to a more cheaply scalable service layer
Another option is to allow in-memory buffering on a cheaper, more easily horizontally scalable service tier.
Any of the service layers mentioned above between the client and the git process could implement buffering (whether in-memory or to-disk). At least a few options may make good sense, but again, such behavior must be configurable, since different environments will have different capacity constraints, and no one solution will ideally suit all of them.
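As one illustration of that placement, the toy reverse proxy below (hypothetical, not Workhorse code; the ports and upstream URL are invented) fully buffers each upstream response in memory before relaying it, which releases the upstream connection, and the process behind it, at memory speed rather than client speed. A production version would need size limits and ideally the disk-overflow behavior sketched earlier.

```go
// Toy buffering proxy (hypothetical, not Workhorse code): the upstream body is
// drained into memory immediately, then served to the client from that buffer.
package main

import (
	"bytes"
	"io"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical upstream serving git HTTP traffic.
	upstream, err := url.Parse("http://localhost:8080")
	if err != nil {
		panic(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	// Drain the whole upstream body before the client sees the first byte.
	// The upstream process is freed as soon as this copy completes.
	proxy.ModifyResponse = func(resp *http.Response) error {
		var buf bytes.Buffer
		if _, err := io.Copy(&buf, resp.Body); err != nil {
			return err
		}
		resp.Body.Close()
		resp.Body = io.NopCloser(&buf)
		return nil
	}

	http.ListenAndServe(":8081", proxy)
}
```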
Some non-solutions
Caching packfiles does not address the concern: even a very small number of cache misses can trigger pathologically bad behavior.
Some CDN services offer request buffering and response buffering as a protection against slowloris abuse. Those always come with size limits, and many GitLab users do not have the luxury of such a service, so we should offer an in-product solution.