EOFError: multipart data over retained size limit when committing large files via Repository Files API with multipart/form-data
### Summary

Users encounter an `EOFError: multipart data over retained size limit` error when attempting to create or update files larger than 16MB via the Repository Files API (`POST/PUT /api/v4/projects/:id/repository/files/:file_path`) using the `multipart/form-data` content type.

### Root Cause

The error originates in Rack's multipart parser, which enforces a hardcoded 16MB in-memory buffer limit (`BUFFERED_UPLOAD_BYTESIZE_LIMIT`) for non-file form fields. When processing `multipart/form-data` requests:

1. Workhorse intercepts the request and saves the entire request body to a temporary file (up to 300MB).
2. Workhorse forwards metadata about the saved file to Rails.
3. Rails' `file_params_from_body_upload` method re-parses the saved file using `Rack::Multipart.parse_multipart`.
4. The `content` field (containing the file data to be committed) is sent as a regular form field **without a filename**.
5. Rack's parser decides how to handle each field based on the presence of a filename:
   - With a filename → `TempfilePart` → streams to disk (no memory limit)
   - Without a filename → `BufferPart` → buffers entirely in memory
6. Since `content` has no filename, Rack buffers it in memory.
7. When the `content` field exceeds 16MB, Rack's `update_retained_size` method raises `EOFError: multipart data over retained size limit`.

This creates a mismatch: `CommitsUploader` allows requests up to 300MB (`DEFAULT_MAX_REQUEST_SIZE`), but Rack's internal buffer limit is only 16MB.
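The filename-based distinction can be illustrated with the two `Content-Disposition` shapes involved. This is a sketch, not GitLab code: the boundary and filename are illustrative, and only the `name="content"` field comes from the issue.

```ruby
# Sketch of the two part shapes Rack's multipart parser distinguishes.
# The boundary and filename below are illustrative.
boundary = '----demo'

# How the Files API sends `content` today: a plain form field, no filename.
# Rack handles this as a BufferPart and keeps it entirely in memory,
# subject to the 16MB BUFFERED_UPLOAD_BYTESIZE_LIMIT.
buffered_part = <<~PART
  --#{boundary}\r
  Content-Disposition: form-data; name="content"\r
  \r
  <file data>\r
PART

# The same field with a filename: Rack handles this as a TempfilePart
# and streams it to disk, so no in-memory size limit applies.
streamed_part = <<~PART
  --#{boundary}\r
  Content-Disposition: form-data; name="content"; filename="big.bin"\r
  Content-Type: application/octet-stream\r
  \r
  <file data>\r
PART

puts buffered_part.include?('filename=')  # false -> BufferPart -> 16MB cap
puts streamed_part.include?('filename=')  # true  -> TempfilePart -> no cap
```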
**Relevant code path:**

- `lib/api/helpers/commits_body_uploader_helper.rb:38` calls `Rack::Multipart.parse_multipart(env)`
- Rack's `multipart/parser.rb:349-351` enforces the 16MB limit

### Sentry Error

https://new-sentry.gitlab.net/organizations/gitlab/issues/3332849/

Backtrace:

```
EOFError: multipart data over retained size limit (EOFError)
from rack/multipart/parser.rb:350:in `update_retained_size'
from rack/multipart/parser.rb:336:in `handle_mime_body'
from rack/multipart/parser.rb:250:in `block in run_parser'
from <internal:kernel>:187:in `loop'
from rack/multipart/parser.rb:241:in `run_parser'
from rack/multipart/parser.rb:225:in `on_read'
from rack/multipart/parser.rb:101:in `block in parse'
from <internal:kernel>:187:in `loop'
from rack/multipart/parser.rb:99:in `parse'
from rack/multipart.rb:53:in `extract_multipart'
from config/initializers/rack_multipart_patch.rb:10:in `extract_multipart'
from rack/multipart.rb:41:in `parse_multipart'
from lib/api/helpers/commits_body_uploader_helper.rb:38:in `file_params_from_body_upload'
from lib/api/files.rb:336:in `block (2 levels) in <class:Files>'
from grape/endpoint.rb:58:in `call'
from grape/endpoint.rb:58:in `block (2 levels) in generate_api_method'
from active_support/notifications.rb:212:in `instrument'
from grape/endpoint.rb:57:in `block in generate_api_method'
from grape/endpoint.rb:328:in `execute'
from grape/endpoint.rb:260:in `block in run'
```

### Possible Solutions

#### Option 1: Pre-process multipart in Workhorse (Recommended)

Extend Workhorse's `body_uploader.go` to detect the `multipart/form-data` content type and pre-process it similarly to how `rewrite.go` handles regular multipart uploads. Workhorse would:

1. Parse the multipart body.
2. Extract large fields (such as `content`) to separate temporary files.
3. Forward metadata about the extracted fields to Rails (similar to how file uploads are handled).
4. Rails would then read large fields from files instead of parsing them from the multipart body.

This approach leverages Workhorse's existing multipart parsing capabilities and keeps memory usage low.

- **Pros:** Memory-efficient; consistent with existing Workhorse patterns; no Rack limitations
- **Cons:** Requires changes to both Workhorse and Rails; more complex implementation

#### Option 2: Implement a streaming multipart parser in Rails

Replace `Rack::Multipart.parse_multipart` in `file_params_from_body_upload` with a custom streaming parser that writes large fields to temporary files instead of buffering them in memory.

- **Pros:** Memory-efficient; changes contained to Rails
- **Cons:** Requires implementing and maintaining a custom multipart parser; tempfiles must be reliably cleaned up

#### Option 3: Recommend `application/json` for large payloads

The JSON content type path already uses `Oj.load_file(file_path)`, which streams from the file without loading everything into memory. We could:

- Document that `application/json` should be used for files >16MB
- Return a helpful error message suggesting the JSON format when multipart parsing fails due to size

- **Pros:** No code changes needed for the happy path; leverages the existing efficient code path
- **Cons:** May break existing client integrations; requires client-side changes

#### Option 4: Handle `EOFError` gracefully with an informative error message

Catch the `EOFError` and return a 400 Bad Request with a message explaining the limitation and suggesting alternatives (e.g., use the `application/json` content type).

- **Pros:** Better user experience; guides users to working solutions; quick to implement
- **Cons:** Doesn't fix the underlying limitation for multipart users

### Recommendation

**Short-term:** Implement **Option 4** to provide immediate relief with a helpful error message.
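The short-term rescue could look roughly like the sketch below. This is a hypothetical shape, not the actual GitLab helper: `parse` stands in for the `Rack::Multipart.parse_multipart` call inside `file_params_from_body_upload`, and the real API would use its own error-rendering helper rather than returning a tuple.

```ruby
# Hypothetical sketch of the Option 4 rescue; names and return shape are
# illustrative, not the real GitLab implementation.
def parse_body_upload(&parse)
  parse.call
rescue EOFError => e
  # Only translate the Rack buffer-limit error; re-raise anything else.
  raise unless e.message.include?('retained size limit')

  [400, 'Request payload too large for multipart/form-data. ' \
        'Use the application/json content type for files larger than 16MB.']
end

# Simulate Rack raising the buffer-limit error while parsing.
status, message = parse_body_upload do
  raise EOFError, 'multipart data over retained size limit'
end
puts status  # 400
```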
**Long-term:** Implement **Option 1** (Workhorse pre-processing) for a proper fix that maintains memory efficiency and supports the full 300MB request size with `multipart/form-data`.
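In the meantime, affected clients can work around the limit by switching to the JSON code path. A rough client-side sketch, assuming the standard Repository Files API fields (`branch`, `commit_message`, `content`, `encoding`); the file being read here is illustrative:

```ruby
require 'json'
require 'base64'

# Client-side workaround sketch: send the commit as application/json instead
# of multipart/form-data. Field names follow the Repository Files API; the
# file used as input is a stand-in for a >16MB upload.
file_data = File.binread(__FILE__)

payload = {
  branch: 'main',
  commit_message: 'Add large file',
  encoding: 'base64',  # base64 keeps binary content valid inside JSON
  content: Base64.strict_encode64(file_data)
}.to_json

# A client would POST `payload` with `Content-Type: application/json` to
# /api/v4/projects/:id/repository/files/:file_path. On the Rails side this
# body is streamed via Oj.load_file rather than buffered by Rack's
# multipart parser, so the 16MB limit does not apply.
parsed = JSON.parse(payload)
puts parsed['encoding']  # base64
```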