Draft: Fail cache downloads when required, retry on memory alloc fail

Ryan Castro requested to merge rcastro6-fail-cache-downloads-when-required into main

What does this MR do?

One of our clients opened a ticket (internal link) requesting changes that improve how GitLab handles cache extraction failures.

This MR was originally created on GitHub; I'm recreating it here so the customer can review it and act on it.

This fix:

- Detects memory errors
- Retries memory allocation errors 3 times
- Waits 1s and triggers a GC on memory error

Why was this MR needed?

The extract code unzips every file by iterating over the archive in a loop. From inspecting Go's flate library, there does not appear to be any pooling between decompressors, i.e. each one allocates its own memory. They seem to use fairly small buffers, but it's not clear how much they might hold at a given time. In any case, for an archive that may contain 100K files, this loop creates decompressors quickly and the GC may fall behind.
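
To make the shape of that loop concrete, here is a small sketch using the standard `archive/zip` package. It is not the runner's actual extract code: `buildArchive` and `extractAll` are hypothetical helpers that work on an in-memory archive. The point is only that every entry is opened as its own decompressing reader inside the loop, so an archive with many entries opens many readers in quick succession. How much memory each one holds is not something this sketch establishes.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// buildArchive writes name -> content pairs into an in-memory zip archive.
func buildArchive(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, content := range files {
		w, err := zw.Create(name) // Create stores entries with Deflate
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(w, content); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// extractAll iterates the archive and decompresses each entry in turn,
// closing every reader before moving on to the next entry.
func extractAll(archive []byte) (map[string]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	out := make(map[string]string, len(zr.File))
	for _, f := range zr.File {
		rc, err := f.Open() // one decompressing reader per entry
		if err != nil {
			return nil, fmt.Errorf("open %s: %w", f.Name, err)
		}
		data, readErr := io.ReadAll(rc)
		closeErr := rc.Close() // close inside the loop, not via defer
		if readErr != nil {
			return nil, fmt.Errorf("read %s: %w", f.Name, readErr)
		}
		if closeErr != nil {
			return nil, fmt.Errorf("close %s: %w", f.Name, closeErr)
		}
		out[f.Name] = string(data)
	}
	return out, nil
}
```

Closing each reader inside the loop rather than deferring keeps at most one reader open at a time, but anything a reader allocated still has to be reclaimed by the GC, which is the concern raised above.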