WIP: Partial Uploads
Terminology
- partial chunk = a chunk that is smaller than a full chunkSize
- combined chunk = a chunk consisting of multiple partial chunks belonging to different siafiles
- partialsSiaFile = a SiaFile that contains all combined chunks for a specific erasure coder
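To make the first definition concrete, here is a minimal sketch of how a file splits into full chunks plus one trailing partial chunk. The function name splitFile and the 40 MiB chunk size are assumptions made for illustration; the real chunk size depends on the erasure code settings.

```go
package main

import "fmt"

// splitFile returns the number of full chunks in a file and the size of the
// trailing partial chunk. The partial size is 0 if the file size is an exact
// multiple of chunkSize. chunkSize must be greater than 0.
func splitFile(fileSize, chunkSize uint64) (fullChunks, partialSize uint64) {
	return fileSize / chunkSize, fileSize % chunkSize
}

func main() {
	const chunkSize = 40 << 20 // hypothetical chunk size of 40 MiB
	full, partial := splitFile(100<<20, chunkSize)
	fmt.Printf("full chunks: %d, partial chunk: %d bytes\n", full, partial)
}
```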
Overview
Partial uploads currently work by having partialsSiaFiles, which regular SiaFiles hold a pointer to. For every unique erasure code setting used by the user, a new partialsSiaFile with the extension .csia will be created.
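The relationship described above could be sketched roughly as follows. All type names, fields and the .csia naming scheme in this example are hypothetical stand-ins, not the actual types of this MR; the point is only that files with the same erasure code settings end up pointing at the same partialsSiaFile.

```go
package main

import "fmt"

// ecSettings is a hypothetical stand-in for an erasure code setting.
type ecSettings struct {
	DataPieces   int
	ParityPieces int
}

// partialsSiaFile is a hypothetical stand-in for the file that holds all
// combined chunks for one erasure code setting.
type partialsSiaFile struct {
	Path string
}

// siaFile is a hypothetical stand-in for a regular SiaFile. It only holds a
// pointer to the shared partialsSiaFile.
type siaFile struct {
	Name     string
	Partials *partialsSiaFile
}

// registry hands out exactly one partialsSiaFile per unique ecSettings.
type registry struct {
	files map[ecSettings]*partialsSiaFile
}

func newRegistry() *registry {
	return &registry{files: make(map[ecSettings]*partialsSiaFile)}
}

// partialsFor returns the partialsSiaFile for ec, creating it on first use.
// The file name format is an assumption made for this sketch.
func (r *registry) partialsFor(ec ecSettings) *partialsSiaFile {
	if pf, ok := r.files[ec]; ok {
		return pf
	}
	total := ec.DataPieces + ec.ParityPieces
	pf := &partialsSiaFile{Path: fmt.Sprintf("%d-of-%d.csia", ec.DataPieces, total)}
	r.files[ec] = pf
	return pf
}

// newSiaFile creates a regular file that points to the shared partialsSiaFile.
func (r *registry) newSiaFile(name string, ec ecSettings) *siaFile {
	return &siaFile{Name: name, Partials: r.partialsFor(ec)}
}

func main() {
	r := newRegistry()
	a := r.newSiaFile("a.sia", ecSettings{10, 20})
	b := r.newSiaFile("b.sia", ecSettings{10, 20})
	c := r.newSiaFile("c.sia", ecSettings{1, 2})
	fmt.Println(a.Partials == b.Partials, a.Partials == c.Partials)
}
```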
These files won't be tracked by the repair loops directly. Instead, regular methods like AddPiece will pass calls through to the corresponding partialsSiaFile when called on the regular SiaFile.
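A minimal sketch of this pass-through, under simplifying assumptions: the AddPiece signature here is reduced to a host name and a chunk index, and the fields fullChunks and combinedIndex are hypothetical. The real method takes more arguments and the real bookkeeping may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// partialsSiaFile is a hypothetical stand-in that tracks pieces per combined
// chunk index.
type partialsSiaFile struct {
	pieces map[uint64][]string
}

func newPartialsSiaFile() *partialsSiaFile {
	return &partialsSiaFile{pieces: make(map[uint64][]string)}
}

// AddPiece records that host stores a piece of the given combined chunk.
func (p *partialsSiaFile) AddPiece(host string, chunkIndex uint64) error {
	p.pieces[chunkIndex] = append(p.pieces[chunkIndex], host)
	return nil
}

// siaFile is a hypothetical stand-in for a regular SiaFile with fullChunks
// full chunks followed by one partial chunk that lives in a combined chunk.
type siaFile struct {
	fullChunks    uint64
	pieces        map[uint64][]string
	partials      *partialsSiaFile
	combinedIndex uint64 // index of the combined chunk in partials
}

func newSiaFile(fullChunks uint64, p *partialsSiaFile, combinedIndex uint64) *siaFile {
	return &siaFile{
		fullChunks:    fullChunks,
		pieces:        make(map[uint64][]string),
		partials:      p,
		combinedIndex: combinedIndex,
	}
}

// AddPiece handles full chunks locally and forwards calls for the trailing
// partial chunk to the partialsSiaFile.
func (sf *siaFile) AddPiece(host string, chunkIndex uint64) error {
	if chunkIndex < sf.fullChunks {
		sf.pieces[chunkIndex] = append(sf.pieces[chunkIndex], host)
		return nil
	}
	if chunkIndex == sf.fullChunks && sf.partials != nil {
		return sf.partials.AddPiece(host, sf.combinedIndex)
	}
	return errors.New("chunk index out of bounds")
}

func main() {
	p := newPartialsSiaFile()
	sf := newSiaFile(2, p, 7)
	fmt.Println(sf.AddPiece("host-a", 1)) // stored on the regular file
	fmt.Println(sf.AddPiece("host-b", 2)) // forwarded to the partialsSiaFile
	fmt.Println(len(p.pieces[7]))
}
```

The caller only ever talks to the regular file, which is why the repair loops would not need to track the partialsSiaFile directly in this sketch.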
Another new file type is the .partial file. If a file has a partial chunk at the end, this partial chunk will be stored in a .partial file next to the .sia file. Once the partial chunk is included in a combinedChunk, the .partial file will be deleted and the combinedChunk is persisted instead.
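The lifecycle of a .partial file might look like the sketch below. The helper names savePartial, combine and runDemo, as well as the combined chunk file name, are assumptions for illustration. The sketch persists the combined chunk before deleting the .partial files, so a crash in between leaves redundant data rather than lost data; whether the MR uses the same ordering is not stated above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// savePartial stores the trailing partial chunk of a file in a .partial file
// inside dir.
func savePartial(dir, name string, data []byte) error {
	return os.WriteFile(filepath.Join(dir, name+".partial"), data, 0600)
}

// combine concatenates the named partial chunks into one combined chunk,
// persists it at combinedPath and only then deletes the .partial files.
func combine(dir, combinedPath string, names ...string) error {
	var combined []byte
	for _, name := range names {
		data, err := os.ReadFile(filepath.Join(dir, name+".partial"))
		if err != nil {
			return err
		}
		combined = append(combined, data...)
	}
	if err := os.WriteFile(combinedPath, combined, 0600); err != nil {
		return err
	}
	for _, name := range names {
		if err := os.Remove(filepath.Join(dir, name+".partial")); err != nil {
			return err
		}
	}
	return nil
}

// runDemo exercises the lifecycle in a temporary directory. It returns the
// combined chunk's content and the number of .partial files left behind.
func runDemo() (string, int, error) {
	dir, err := os.MkdirTemp("", "partial-demo")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)
	if err := savePartial(dir, "a", []byte("AAA")); err != nil {
		return "", 0, err
	}
	if err := savePartial(dir, "b", []byte("BB")); err != nil {
		return "", 0, err
	}
	combinedPath := filepath.Join(dir, "combined_0")
	if err := combine(dir, combinedPath, "a", "b"); err != nil {
		return "", 0, err
	}
	data, err := os.ReadFile(combinedPath)
	if err != nil {
		return "", 0, err
	}
	left, err := filepath.Glob(filepath.Join(dir, "*.partial"))
	if err != nil {
		return "", 0, err
	}
	return string(data), len(left), nil
}

func main() {
	combined, left, err := runDemo()
	fmt.Println(combined, left, err)
}
```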
Open Questions / Design choices
- Where to put partial chunks? Right now they are saved as binary blobs next to the .sia file
- Where to put combined chunks? Right now it's a .combined_chunks folder in the renter dir
- How to minimize worst-case scenario "waste"? e.g. if the user only uploads 89% chunks.
- How to figure out which combined chunks are no longer useful?
- How to prune not-useful combined chunks from the partialsSiaFile?
- Considering that 99.99% of all files have partial chunks and share the same partialsSiaFile, are we worried about disk i/o bottlenecks on that file?
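To put a number on the "waste" question, here is a rough sketch that packs partial chunks into combined chunks and reports the unused space. It assumes that a partial chunk is never split across two combined chunks and that a simple first-fit strategy is used; neither is confirmed by this MR, and the function name packFirstFit is hypothetical. Sizes are given as a percentage of chunkSize to keep the arithmetic readable.

```go
package main

import (
	"errors"
	"fmt"
)

// packFirstFit places each partial chunk into the first combined chunk that
// still has room for it and opens a new combined chunk otherwise. It returns
// the free space left in every combined chunk.
func packFirstFit(sizes []uint64, chunkSize uint64) ([]uint64, error) {
	var free []uint64
	for _, s := range sizes {
		if s == 0 || s >= chunkSize {
			return nil, errors.New("a partial chunk must be smaller than chunkSize and non-empty")
		}
		placed := false
		for i := range free {
			if free[i] >= s {
				free[i] -= s
				placed = true
				break
			}
		}
		if !placed {
			free = append(free, chunkSize-s)
		}
	}
	return free, nil
}

// totalFree sums up the unused space across all combined chunks.
func totalFree(free []uint64) uint64 {
	var sum uint64
	for _, f := range free {
		sum += f
	}
	return sum
}

func main() {
	// Three partial chunks of 89% each: no two of them fit into one
	// combined chunk, so every combined chunk leaves 11% unused.
	free, err := packFirstFit([]uint64{89, 89, 89}, 100)
	fmt.Println(len(free), totalFree(free), err)
}
```

Under these assumptions, the 89% case needs one combined chunk per partial chunk, so combining saves nothing over padding each partial chunk to a full chunk.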
To discuss the design choices, let's open a new discussion on this MR for each choice to keep track of it. It's not as convenient as Discord, but since discussions have to be resolved before the MR can be merged, we won't lose track of anything.
Edited by Matthew Sevey