Skip to content

fix(driver): write to in memory file in place

Suleimi Ahmed requested to merge 834-inmemory-mem-pre-alloc into master

Related to #834 (closed) and #819 (closed)

Problem 🚒

Related Logs

When running all tests in the package /registry/storage/driver/inmemory with a timeout of 30m (which is a very generous timeout). Occasionally, the test run appears to stall when running the particular test: TestWriteReadLargeStreams, utilizing up to 95% of the 30m allocated for the entire tests in the package! In particular the offending line that causes the test to stall is:

data := make([]byte, len(f.data), off+len(p)) here

This is suggestive of a memory allocation freeze, due to the test program being unable to allocate additional memory as a result of lack of available resources on the host of the test run.

Different operating systems react differently when handling massive memory allocation requests from a process, but one possibility (and the most likely cause for our flaky-ness in this case) is that the excessive allocation request may cause the process to hang, (for example when the host OS deems memory resource is scarce).

How much memory are we talking about here and why? 💾

On further investigation, the offending test runs the risk of requesting up to 15GB of memory at a time!

The test tries to assert that it can read and write up to 5GB of data, and it does this by:

  1. Generating 5GB worth of bytes, and then using an io reader to buffer/back the bytes --> here and here
  2. When creating the io reader (mentioned in 1.) the test also creates an io writer to use for calculating the sha of what is read from the reader on the same line --> here using io.TeeReader
  3. Next the test creates a dedicated inmemory driver (complete with its own io writer implementation) and calls io.Copy to move all the data from the 5GB "stash" to the drivers writer. This is essentially what the test is testing for and the crux of the problem --> still here
  4. The native io.Copy implementation removes data from the io.reader and writes it into the driver's io writer in chunks. Essentially, it will call the driver's Write method for every byte chunk it tries to write --> here
  5. For each byte slice that is requested to be written, the driver's Write method will the call the it's WriteAt method here, leading us to the line in question (i.e data := make([]byte, len(f.data), off+len(p))) through the function:
1  func (f *file) WriteAt(p []byte, offset int64) (n int, err error) {
2	off := int(offset)
3	if cap(f.data) < off+len(p) {
4		data := make([]byte, len(f.data), off+len(p))
5		copy(data, f.data)
6		f.data = data
7	}
8
9	f.mod = time.Now()
10	f.data = f.data[:off+len(p)]
11
12	return copy(f.data[off:off+len(p)], p), nil
13  }

Now if you take a look at line 4. above, by doing a make on the offending line we are pre allocating a byte slice of zeroes with size equal to what we already read (and stored in memory) + a few extra for the chunks that are being read at the moment from the reader (i.e the test tries to allocate 2 X the current data stored in memory + some extra bytes)

So, through this process if we were perhaps to get to a point where only the last 100 bytes of the 5GB content was left to be written by the drive to memory, we would have:

  • 5Gib - 100 bytes in the hash writer (as mentioned in point 2. above)
  • 2 * 5gib + 100 bytes by our offending line

giving the test run a total of ~15Gb by the time it reaches https://gitlab.com/gitlab-org/container-registry/-/blob/master/registry/storage/driver/inmemory/mfs.go#L298 (only to append a file of 5GB and calculate the hash).

Considerations 🤔

The discussion above affects current registry inmemory write operations and are not only scoped to the flaky test.

In fact at a given time (t) when running the registry with in memory storage and an image's blob is pushed we will have at least 2 * (the size of the image that is begin pushed - the last chunk that is being read out from the buffered input) + the last chunk that is being read out from the buffered input. allocated

*Where t denotes the time at which there is only one more buffered byte left to complete a blob push.

The rarity of the events that need to align to trigger this issue in real/non-test/non-dev operations are close to 0. Why?

I don't think anyone runs the registry in inmemory storage mode except to test or develop and even when run in the inmemory storage mode, you'd really need to push a massive image layer (i.e 5GB) and also be running low on memory on the host to trigger this issue. Running the registry as an inmemory process to handle massive uploads is really the wild west 🌵 .

Solutions 🎲

Taking into account the consideration above we have two options for a solution:

  1. Do nothing, this is a test issue and 5Gb is actually too much to handle at once anyway: I'm unsure if there's a hard requirement for the registry to be able to service up to 5Gb per write for in inmemory storage mode, but if there isn't maybe we should consider making this test less fussy about uploading such a huge payload (i.e maybe we can drop from 5GB to 1GB). This will only affect the test and nothing changes in the actual implementation.

  2. Refactor the write logic to be less memory intrusive: If we revisit the WriteAt method called specifically from Write we will recognize that all it does is an append. We can fix up the Write/WriteAt logic to be less memory intrusive, so we are not always allocating 2x whatever we already read at the line in question. This would make the test significantly less flaky and also benefit any gunslingers 🔫 running the registry with inmemory storage.

What does this MR do? 🔮

Implements a solution from 2. of the Solutions section

Benchmark

go test -benchmem -benchtime=1m -bench=. (on N cores use -cpu=n flag )

goos: darwin
goarch: arm64
BenchMark Ran Machine Cores Run count Average time Average Mem Average Alloc
BenchmarkWriteAt/size=5368709120-10 10 20 3546628015 ns/op 5369085972 B/op 1 allocs/op
BenchmarkAppends/size=5368709120-10 10 904388 110316 ns/op 203966 B/op 0 allocs/op
BenchmarkWriteAt/size=5368709120-2 2 21 3371731002 ns/op 5369102342 B/op 1 allocs/op
BenchmarkAppends/size=5368709120-2 2 790364 93995 ns/op 170129 B/op 0 allocs/op

Benchmark.go: https://go.dev/play/p/0k_DoBMz0TE

Edited by Jaime Martinez

Merge request reports