WIP: Partial Uploads

Christopher Schinnerl requested to merge partial-uploads into master

Terminology

  • partial chunk = a chunk that is smaller than a full chunkSize
  • combined chunk = a chunk consisting of multiple partial chunks belonging to different SiaFiles
  • partialsSiaFile = a SiaFile that contains all combined chunks for a specific erasure coder

Overview

Partial uploads currently work through partialsSiaFiles, which regular SiaFiles hold a pointer to. For every unique erasure code setting used by the user, a new partialsSiaFile with the extension .csia is created. These files are not tracked by the repair loops directly. Instead, regular methods like AddPiece pass calls through to the corresponding partialsSiaFile when called on the regular SiaFile.
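
To make the pass-through idea concrete, here is a minimal sketch. The siaFile type, its fields, and the simplified AddPiece signature are all hypothetical and much smaller than the real SiaFile API; the only point is that a call addressed to the trailing chunk gets forwarded to the partialsSiaFile.

```go
package main

import "fmt"

// siaFile is a heavily simplified, hypothetical stand-in for a SiaFile.
type siaFile struct {
	numChunks uint64
	pieces    map[uint64][]string // chunk index -> piece roots

	// partials is non-nil only if the last chunk of this file lives in a
	// combined chunk of a partialsSiaFile (assumption for this sketch).
	partials      *siaFile
	combinedIndex uint64 // index of that combined chunk within partials
}

func newSiaFile(numChunks uint64) *siaFile {
	return &siaFile{numChunks: numChunks, pieces: make(map[uint64][]string)}
}

// AddPiece stores a piece root for a chunk. If the chunk is the trailing
// partial chunk, the call is passed through to the partialsSiaFile.
func (sf *siaFile) AddPiece(chunkIndex uint64, root string) error {
	if chunkIndex >= sf.numChunks {
		return fmt.Errorf("chunk index %d out of bounds", chunkIndex)
	}
	if sf.partials != nil && chunkIndex == sf.numChunks-1 {
		return sf.partials.AddPiece(sf.combinedIndex, root)
	}
	sf.pieces[chunkIndex] = append(sf.pieces[chunkIndex], root)
	return nil
}

func main() {
	partials := newSiaFile(1) // holds a single combined chunk
	file := newSiaFile(3)     // two full chunks plus one partial chunk
	file.partials = partials
	file.combinedIndex = 0

	if err := file.AddPiece(0, "rootA"); err != nil {
		panic(err)
	}
	if err := file.AddPiece(2, "rootB"); err != nil {
		panic(err)
	}
	// The piece for chunk 2 ends up in the partialsSiaFile, not in file.
	fmt.Println(len(file.pieces[0]), len(file.pieces[2]), len(partials.pieces[0]))
}
```

Callers keep using the regular SiaFile and never need to know whether a chunk is backed by a combined chunk.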

Another new file type is the .partial file. If a file ends in a partial chunk, that chunk is stored in a .partial file alongside the .sia file. Once the partial chunk has been included in a combined chunk, the .partial file is deleted and the combined chunk is persisted instead.
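
As a rough illustration of when a file has a trailing partial chunk, the sketch below splits a file size into full chunks and a remainder. The helper name splitFile and the 40 MiB chunk size are assumptions for the example; the actual chunk size depends on the erasure code settings.

```go
package main

import "fmt"

// splitFile returns the number of full chunks in a file and the size in
// bytes of the trailing partial chunk (0 if the file ends on a chunk
// boundary). chunkSize must be non-zero.
func splitFile(fileSize, chunkSize uint64) (fullChunks, partialSize uint64) {
	return fileSize / chunkSize, fileSize % chunkSize
}

func main() {
	const chunkSize = 40 << 20 // hypothetical 40 MiB chunk

	full, partial := splitFile(100<<20, chunkSize)
	// A 100 MiB file has 2 full chunks and a 20 MiB partial chunk, which
	// is what would be written to the .partial file.
	fmt.Println(full, partial>>20)
}
```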

Open Questions / Design choices

  • Where to put partial chunks? Right now they are saved as binary blobs next to the .sia file
  • Where to put combined chunks? Right now it's a .combined_chunks folder in the renter dir
  • How to minimize worst-case "waste"? E.g. if the user only uploads partial chunks that are 89% of a full chunk.
  • How to figure out which combined chunks are no longer useful?
  • How to prune not-useful combined chunks from the partialsSiaFile?
  • Considering that 99.99% of all files have partial chunks and share the same partialsSiaFile, are we worried about disk i/o bottlenecks on that file?
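
To put a number on the "waste" question, the following sketch packs partial chunks into combined chunks and reports the unused space. This MR does not fix a packing strategy, so first-fit is used purely as an assumption, as is the rule that a partial chunk is never split across combined chunks. The function name wastedBytes is hypothetical, and sizes are given in percent of a chunk for readability.

```go
package main

import "fmt"

// wastedBytes packs partial chunks into combined chunks using first-fit
// and returns how many combined chunks were needed and how much space in
// them stays unused. It assumes every size is in (0, chunkSize] and that
// partial chunks are never split.
func wastedBytes(partials []uint64, chunkSize uint64) (combined int, wasted uint64) {
	var free []uint64 // remaining space per combined chunk
	for _, p := range partials {
		placed := false
		for i := range free {
			if free[i] >= p {
				free[i] -= p
				placed = true
				break
			}
		}
		if !placed {
			// No existing combined chunk has room, so open a new one.
			free = append(free, chunkSize-p)
		}
	}
	for _, f := range free {
		wasted += f
	}
	return len(free), wasted
}

func main() {
	// Three 89% partial chunks: none can share a combined chunk, so each
	// combined chunk wastes 11%.
	fmt.Println(wastedBytes([]uint64{89, 89, 89}, 100))
	// Sizes that complement each other pack without waste.
	fmt.Println(wastedBytes([]uint64{50, 50, 30, 70}, 100))
}
```

Under these assumptions the 89% case costs one combined chunk per partial chunk, which is the same as not combining at all.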

To discuss the design choices, let's open a new discussion on this MR for each choice so we can keep track of it. It's not as convenient as Discord, but since discussions have to be resolved before the MR can be merged, we won't lose track of anything.

Edited by Matthew Sevey
