Proposal: Upload Streaming
This is a first draft of how upload streaming will be implemented and how we can split the work into multiple MRs to simplify the review process.
Goal
The goal is for the API to have an endpoint through which a user can stream a file of arbitrary size to Sia, which then uploads the file chunk-by-chunk in real time without buffering any additional data on disk.
For that to work, the repair loop needs to be able to load chunks for repairs from arbitrary readers. We also need to make sure that streamed uploads have a higher priority than regular repairs; maybe not within the workers themselves, but at least within the upload loop. Finally, we need to extend the `SiaFile` so that it can grow in size.
Roadmap
The following is a suggestion on how to split up the work that needs to be done.
- Change the existing repair code to use `Reader`s to repair files
- Add an `AppendChunk` method to the `SiaFile` to grow the `.sia` file by a chunk
- Add the actual upload streaming endpoint
- Add handling for interrupted streams (optional)
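For the `AppendChunk` step, the method might look something like the sketch below. This is a greatly simplified in-memory stand-in under assumed names; the real `SiaFile` would persist the new chunk's metadata to the `.sia` file on disk.

```go
package main

import (
	"fmt"
	"sync"
)

// siaFile is a simplified stand-in for the real SiaFile; only the
// fields needed to illustrate growing the file are modeled here.
type siaFile struct {
	mu        sync.Mutex
	chunkSize uint64
	numChunks uint64
	fileSize  uint64
}

// AppendChunk grows the file by one chunk of n bytes. n may be smaller
// than chunkSize for the final chunk of a stream. It returns the index
// of the newly appended chunk.
func (sf *siaFile) AppendChunk(n uint64) (uint64, error) {
	sf.mu.Lock()
	defer sf.mu.Unlock()
	if n == 0 || n > sf.chunkSize {
		return 0, fmt.Errorf("invalid chunk size %d", n)
	}
	chunkIndex := sf.numChunks
	sf.numChunks++
	sf.fileSize += n
	return chunkIndex, nil
}

func main() {
	sf := &siaFile{chunkSize: 1 << 22} // hypothetical 4 MiB chunks
	i, _ := sf.AppendChunk(1 << 22)
	j, _ := sf.AppendChunk(100) // short final chunk
	fmt.Println(i, j, sf.fileSize)
}
```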
Open questions
How do we handle interrupted streams?
A stream could suddenly be interrupted for various reasons. That will almost always leave the file in a state where the last couple of chunks have a redundancy below `MinRedundancy`. We don't strictly need to handle that and could leave it to the user to delete the file and try again later, at least for the initial release of the feature.
A more sophisticated solution would be a `repair` endpoint. It takes a siapath, a file offset and a stream of data as arguments. The node would then skip over the parts of the stream that correspond to chunks which are already fully repaired, repair the bad ones, and append the rest of the data at the end of the file.
The nice thing about that is that it would also work for files which weren't uploaded using the upload-streaming endpoint.