Immediately unlink potentially-large temporary files
When a large request comes in, two different Ruby libraries may write that to disk:
- Rack::Multipart will write a file from a multipart/form-data request once the request is ready to be handled.
- Puma will always write the request body to a file if it exceeds a preset limit (around 100 KiB).
Both of these have cleanup mechanisms in place, but those typically don't survive SIGKILL (which isn't surprising, as you can't catch that). This means that for a large multipart request, we'll write the request body to disk twice!
There are two commits here.
Immediately unlink Rack::Multipart temporary files
Rack writes files from multipart/form-data requests to disk in a temporary file. Rack includes a middleware to clean these up - Rack::TempfileReaper - but that won't withstand a process being sent SIGKILL. To handle that case, we can immediately unlink the created temporary file, which means it will be removed once we're done with it or the current process goes away.
For development mode and test mode, we have to ensure that this new middleware is before Gitlab::Middleware::Static, otherwise we might not get the chance to set our own middleware.
With direct upload configured, GitLab mostly doesn't accept multipart/form-data requests in a way where they reach Rack directly - they typically go via Workhorse which accelerates them - but there are cases where it can happen, and direct upload is still only an option.
To test this manually, we can set
$GITLAB_API_TOKEN_LOCALto a personal access token for the API in the local environment,$PATH_TO_FILEto be a path to a (preferably large) file to be uploaded, and break the actual saving of uploads (in the default case with GDK, stop Minio):curl -H "Private-Token: $GITLAB_API_TOKEN_LOCAL" \ -F "file=@$PATH_TO_FILE" \ http://localhost:3000/api/v4/projects/1/uploadsOnce the upload is finished and the request fails, we'll see the file we uploaded in
$TMPDIR:$ ls -l $TMPDIR/RackMultipart* | awk '{ print $5, $8 }' 952107008 17:40With this change, that won't happen: we'll see the file created and immediately unlinked, so no matter what happens, it won't stick around on disk. (This specific test case is handled by Rack::TempfileReaper in later versions of Rack, but it still depends on manual cleanup.)
And:
Immediately unlink Puma temporary files
Puma has a limit (
Puma::Const::MAX_BODY- around 110 KiB) over which it will write request bodies to disk for handing off to the application. When it does this, the request body can be left on disk if the Puma process receives SIGKILL. Consider an extremely minimalconfig.ru:run(proc { [204, {}, []] })If we then:
- Start
puma, noting the process ID.- Start a slow file transfer, using
curl --limit-rate 100k(for example) and-T $PATH_TO_LARGE_FILE.- Watch
$TMPDIR/puma*.We will see Puma start to write this temporary file. If we then send SIGKILL to Puma, the file won't be cleaned up. With this patch, it will.
The patch itself is pretty unpleasant: as Puma has two quite long methods that set up the temporary files (
Puma::Client#setup_bodyandPuma::Client#setup_chunked_body), we have to copy those methods and call#unlinkin the correct spots in both. Also, as these are private methods, it's hard to write a test for them. We can test manually. Runningfswatch -t -x $TMPDIR | grep pumawhile posting a large file shows this with this patch:Fri Mar 26 20:34:10 2021 ... Created Removed IsFile Fri Mar 26 20:34:21 2021 ... Updated IsFileWhereas without this patch we get:
Fri Mar 26 20:32:57 2021 ... Created IsFile Fri Mar 26 20:33:05 2021 ... Created Removed Updated IsFile