Design: Partial uploads
Partial Uploads / Partial Chunk Support
This issue is meant as a starting point to discuss the design for partial uploads and will be modified over time. It will also be extended with more details as the implementation begins and new issues/questions arise.
What are partial uploads?
Partial uploads are uploads to the Sia network which are smaller than a single chunk. Currently we add padding to the end of a file if its size is less than the size of a full chunk, which is 40 MiB for an erasure coding with 10 data pieces (10 pieces * 4 MiB sector size).
This is not ideal since the padding is data the user has to pay for. For example, uploading 40 files of 1 MiB each would cost the user as much as uploading 1600 MiB of data (at default redundancy), because every file is padded to a full 40 MiB chunk, even though only 40 MiB were actually uploaded.
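The arithmetic behind that example can be checked with a minimal sketch. The names `paddedSize`, `chunkSize` and `dataPieces` are made up for illustration and are not taken from the Sia codebase; the numbers are the ones stated above (4 MiB sectors, 10 data pieces). Redundancy is left out, since it scales both figures by the same factor.

```go
package main

import "fmt"

const (
	mib        uint64 = 1 << 20
	sectorSize        = 4 * mib                 // sector size stated in the text
	dataPieces        = 10                      // data pieces of the example erasure coding
	chunkSize         = dataPieces * sectorSize // 40 MiB per full chunk
)

// paddedSize returns how many bytes of data a file accounts for once it is
// padded up to a whole number of chunks (hypothetical helper).
func paddedSize(fileSize uint64) uint64 {
	if fileSize == 0 {
		return 0
	}
	chunks := (fileSize + chunkSize - 1) / chunkSize // round up
	return chunks * chunkSize
}

func main() {
	var actual, padded uint64
	for i := 0; i < 40; i++ {
		actual += 1 * mib
		padded += paddedSize(1 * mib)
	}
	// Prints: actual: 40 MiB, padded: 1600 MiB
	fmt.Printf("actual: %d MiB, padded: %d MiB\n", actual/mib, padded/mib)
}
```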
Implementation
Add support for partial chunks to the siafile
The siafile needs to be extended to allow for special pieces which contain an index and offset instead of a merkle root. Ideally we would be able to reuse the existing implementation with minor tweaks. One suggestion is to simply assume that the last chunk of a siafile is a partial one since a file should never need more than one. This might not be true if we allow for appending data to a file.
e.g. a 10 MiB file would have a single partial chunk. If the user adds 70 more MiB to the file, it would consist of one 10 MiB partial chunk, one 30 MiB partial chunk and one 40 MiB full chunk, unless we reupload the partial chunks as a full chunk.
EDIT: After a talk with @DavidVorick we think it would be better to go with Luke's suggestion in the comments. Siafiles will be updated from
piece struct {
	HostTableOffset uint32      // offset of the host's key within the pubKeyTable
	MerkleRoot      crypto.Hash // merkle root of the piece
}
to
// piece represents a single piece of a chunk on disk
piece struct {
	HostTableOffset uint32      // offset of the host's key within the pubKeyTable
	MerkleRoot      crypto.Hash // merkle root of the piece
	Offset          uint32      // offset within the 4 MiB sector
	Length          uint32      // length of the piece within the sector
}
This means existing siafiles will have their pieces updated to Offset = 0 and Length = modules.SectorSize.
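A minimal sketch of that upgrade step, assuming the old and new layouts shown above. `legacyPiece`, `upgradePiece`, `hash` and the local `sectorSize` constant are hypothetical stand-ins so the example compiles on its own; the real code would use `crypto.Hash` and `modules.SectorSize`, and the actual siafile migration may look quite different.

```go
package main

import "fmt"

// sectorSize stands in for modules.SectorSize (assumed to be 4 MiB here).
const sectorSize = 1 << 22

// hash stands in for crypto.Hash.
type hash [32]byte

// legacyPiece mirrors the piece layout before the change.
type legacyPiece struct {
	HostTableOffset uint32
	MerkleRoot      hash
}

// piece mirrors the proposed layout with Offset and Length.
type piece struct {
	HostTableOffset uint32
	MerkleRoot      hash
	Offset          uint32
	Length          uint32
}

// upgradePiece converts a legacy piece. A legacy piece always covers a whole
// sector, so it starts at offset 0 and spans the full sector size.
func upgradePiece(old legacyPiece) piece {
	return piece{
		HostTableOffset: old.HostTableOffset,
		MerkleRoot:      old.MerkleRoot,
		Offset:          0,
		Length:          sectorSize,
	}
}

func main() {
	p := upgradePiece(legacyPiece{HostTableOffset: 3})
	fmt.Println(p.HostTableOffset, p.Offset, p.Length) // 3 0 4194304
}
```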
Uploading the partial piece
Uploading a partial piece should be simple: it is appended to the partial piece section of the host. It becomes tricky once we delete partial files, since we will end up with fragmentation on the host. As a result we probably also need a defragmentation thread which moves data around on the host and frees up space.
EDIT: Since we can't modify sectors online without changing their merkle root, we need to combine partial uploads into a single chunk before uploading them. This means we reserve a chunk in the data directory of Sia and append partial uploads to it instead of uploading them right away. Once the chunk is full, it will be uploaded to the network.
Of course this leaves two questions open:
- How to handle pieces with different redundancy? (Probably multiple partial chunks on disk.)
- When exactly to upload the chunk? e.g. what if two 30 MiB partial chunks are supposed to be uploaded?
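The combined chunk described above could be sketched roughly as follows. This is only an illustration under several assumptions: the buffer lives in memory rather than in the Sia data directory, and `combinedChunk`, `location`, `add` and `errTooLarge` are hypothetical names. The sketch also does not answer the second open question; it simply returns an error when a partial upload does not fit into the remaining space.

```go
package main

import (
	"errors"
	"fmt"
)

// errTooLarge is returned when data does not fit into the remaining space.
var errTooLarge = errors.New("partial upload does not fit into the combined chunk")

// location records where a partial upload ended up inside the combined chunk.
type location struct {
	Offset uint64
	Length uint64
}

// combinedChunk collects partial uploads until a full chunk is reached.
type combinedChunk struct {
	capacity int    // size of a full chunk in bytes, e.g. 40 MiB
	buf      []byte // data appended so far
}

// add appends data to the chunk. It returns the location of the data within
// the chunk and whether the chunk is now full and ready to be uploaded.
func (c *combinedChunk) add(data []byte) (location, bool, error) {
	if len(c.buf)+len(data) > c.capacity {
		return location{}, false, errTooLarge
	}
	loc := location{Offset: uint64(len(c.buf)), Length: uint64(len(data))}
	c.buf = append(c.buf, data...)
	return loc, len(c.buf) == c.capacity, nil
}

func main() {
	c := &combinedChunk{capacity: 40 << 20}

	loc, full, _ := c.add(make([]byte, 10<<20))
	fmt.Println(loc.Offset, loc.Length, full) // 0 10485760 false

	loc, full, _ = c.add(make([]byte, 30<<20))
	fmt.Println(loc.Offset, loc.Length, full) // 10485760 31457280 true
}
```

Handling different redundancy settings would presumably mean keeping one such buffer per erasure coding configuration, which matches the "multiple partial chunks on disk" idea in the first question.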