[WIP] Proposal: new RPCs and efficient tiny files storage
TODO: complete, edit. @starius
Sia has historically handled tiny files inefficiently. To address this problem, and to support what we call Deferred Contracts Updates Storage, we need to implement two new RPCs.
To resolve the tiny files issue, we divide each 4 MiB sector into 256 microsectors of 16 KiB each. A tiny file then occupies one or more of these microsectors.
There is an issue, though: when some microsectors are released (for several possible reasons, for example the deletion of a tiny file), we need to defragment, i.e. reuse the freed space efficiently. One way is to wait for a new tiny file that is small enough to fit into the freed space. Another way is to define a condition (for example, more than 50% of a sector's microsectors are free) and defragment when it holds. In that case we take two such half-free sectors and one new empty sector and copy the data into the new sector, so that the data now fills up to 100% of a single sector and we save one sector of space.
Unfortunately, Sia currently has no efficient way to perform this copy: we would have to download and re-upload the data. To solve this, we need the two new RPCs mentioned above, and the goal is to implement them:
HashMicrosectors([]struct{SectorID, microsectorSize int}) ([]struct{[]hashMicrosectors})
This RPC is used to verify microsectors. It takes a slice of sector IDs, each with its microsector size, and returns a slice of slices of microsector hashes. microsectorSize specifies the fixed size of the microsectors (we don't want to hardcode 16 KiB); it must be a power of 2 between 64 bytes (crypto.SegmentSize) and 4 MiB (modules.SectorSize). The returned hashes are the same hashes that appear in the sector's Merkle tree at the level corresponding to the given microsector size.
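The relationship between microsector hashes and the sector's Merkle tree can be illustrated with a minimal sketch. It is not the host implementation: SHA-256 from the standard library stands in for the hash Sia actually uses, the 0x00/0x01 leaf and node prefixes are an assumption in the style of RFC 6962, and the names `subtreeRoot`, `hashMicrosectors` and `rootFromLevel` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// segmentSize mirrors crypto.SegmentSize (64 bytes) from the proposal.
const segmentSize = 64

type hash [32]byte

// leafHash and nodeHash use assumed domain-separation prefixes;
// SHA-256 is only a stand-in for the real hash function.
func leafHash(seg []byte) hash {
	return sha256.Sum256(append([]byte{0x00}, seg...))
}

func nodeHash(l, r hash) hash {
	buf := make([]byte, 0, 1+2*len(l))
	buf = append(buf, 0x01)
	buf = append(buf, l[:]...)
	buf = append(buf, r[:]...)
	return sha256.Sum256(buf)
}

func isPowerOfTwo(n int) bool { return n > 0 && n&(n-1) == 0 }

// subtreeRoot returns the Merkle root of data, whose length must be
// a power-of-two multiple of segmentSize.
func subtreeRoot(data []byte) hash {
	if len(data) == segmentSize {
		return leafHash(data)
	}
	mid := len(data) / 2
	return nodeHash(subtreeRoot(data[:mid]), subtreeRoot(data[mid:]))
}

// hashMicrosectors returns one subtree root per microsector, i.e. the
// nodes of the sector's Merkle tree at the level of microsectorSize.
func hashMicrosectors(sector []byte, microsectorSize int) ([]hash, error) {
	if !isPowerOfTwo(len(sector)) || !isPowerOfTwo(microsectorSize) ||
		microsectorSize < segmentSize || microsectorSize > len(sector) {
		return nil, errors.New("sizes must be powers of 2 with segmentSize <= microsectorSize <= sector size")
	}
	hashes := make([]hash, 0, len(sector)/microsectorSize)
	for off := 0; off < len(sector); off += microsectorSize {
		hashes = append(hashes, subtreeRoot(sector[off:off+microsectorSize]))
	}
	return hashes, nil
}

// rootFromLevel folds one level of the tree up to the sector root.
func rootFromLevel(level []hash) hash {
	for len(level) > 1 {
		next := make([]hash, 0, len(level)/2)
		for i := 0; i < len(level); i += 2 {
			next = append(next, nodeHash(level[i], level[i+1]))
		}
		level = next
	}
	return level[0]
}

func main() {
	sector := make([]byte, 4<<20) // 4 MiB sector
	for i := range sector {
		sector[i] = byte(i * 31)
	}
	hashes, err := hashMicrosectors(sector, 16<<10) // 16 KiB microsectors
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("microsector hashes:", len(hashes))
	fmt.Println("folds to sector root:", rootFromLevel(hashes) == subtreeRoot(sector))
}
```

Because the returned hashes form a complete level of the tree, a renter that already knows the sector root can fold them upward and compare the result, without downloading any sector data.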
CopyFrom(*ModWriteRequest) (LoopWriteResponse)
*ModWriteRequest is a WriteRequest with a different Action struct. Our Action is the same, but supports only Update and Append. Instead of always carrying data, it carries either data, a LoopReadRequestSection, or a key with an (offset, length) pair into our key-value store for the Deferred Contracts Updates Storage. If CopyFrom receives a request to return the hashes of microsectors, it returns them before reading the signature from the renter. CopyFrom is used to copy sectors without having to download and re-upload them.
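As a rough illustration of the three possible sources of an action, here is a minimal sketch. All type and field names (`copyAction`, `sectionRef`, `storageRef` and their fields) are hypothetical stand-ins and are not taken from the Sia code base; the authoritative definitions are the ones in the linked playground code.

```go
package main

import (
	"errors"
	"fmt"
)

// sectionRef is a hypothetical stand-in for LoopReadRequestSection:
// a range of an existing sector already stored on the host.
type sectionRef struct {
	MerkleRoot [32]byte
	Offset     uint32
	Length     uint32
}

// storageRef is a hypothetical reference into the key-value store of
// the Deferred Contracts Updates Storage.
type storageRef struct {
	Key    string
	Offset uint64
	Length uint64
}

// copyAction carries exactly one of the three sources.
type copyAction struct {
	Type    string // "Update" or "Append" only
	Data    []byte
	Section *sectionRef
	Stored  *storageRef
}

// validate checks the action type and that exactly one source is set.
func (a copyAction) validate() error {
	if a.Type != "Update" && a.Type != "Append" {
		return errors.New("only Update and Append actions are allowed")
	}
	sources := 0
	if a.Data != nil {
		sources++
	}
	if a.Section != nil {
		sources++
	}
	if a.Stored != nil {
		sources++
	}
	if sources != 1 {
		return errors.New("exactly one of Data, Section, Stored must be set")
	}
	return nil
}

func main() {
	fromSector := copyAction{Type: "Append", Section: &sectionRef{Length: 16 << 10}}
	fmt.Println("copy from existing sector:", fromSector.validate())

	ambiguous := copyAction{Type: "Update", Data: []byte("x"), Stored: &storageRef{Key: "k"}}
	fmt.Println("two sources at once:", ambiguous.validate())
}
```

Only the first variant sends payload bytes over the wire; the other two refer to data the host already holds, which is what lets CopyFrom avoid the download and re-upload.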
Here is Go code that describes the new RPCs precisely: https://play.golang.org/p/u1fYMyd2Ran
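The defragmentation step described at the top of this proposal (merging two half-free sectors into one new sector) can be sketched as follows. This is only an illustration under assumptions: the `sector` bookkeeping type, the `needsDefrag` threshold of "at least half free", and the `planMerge` helper are hypothetical, and the sketch only plans which microsectors move where; the copy itself would be performed with CopyFrom.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	sectorSize            = 4 << 20                      // 4 MiB
	microsectorSize       = 16 << 10                     // 16 KiB
	microsectorsPerSector = sectorSize / microsectorSize // 256
)

// sector is a hypothetical record of which microsectors are occupied.
type sector struct {
	used [microsectorsPerSector]bool
}

func (s *sector) usedCount() int {
	n := 0
	for _, u := range s.used {
		if u {
			n++
		}
	}
	return n
}

// needsDefrag reports whether at least half of the microsectors are
// free (the threshold is an assumption, not a fixed protocol rule).
func needsDefrag(s *sector) bool {
	free := microsectorsPerSector - s.usedCount()
	return free*2 >= microsectorsPerSector
}

// move says: copy microsector srcIndex of source sector srcSector
// (0 or 1) to microsector dstIndex of the new sector.
type move struct {
	srcSector, srcIndex, dstIndex int
}

// planMerge packs the used microsectors of a and b into one new sector.
func planMerge(a, b *sector) ([]move, error) {
	if a.usedCount()+b.usedCount() > microsectorsPerSector {
		return nil, errors.New("data of both sectors does not fit into one sector")
	}
	var moves []move
	dst := 0
	for si, s := range []*sector{a, b} {
		for i, u := range s.used {
			if u {
				moves = append(moves, move{srcSector: si, srcIndex: i, dstIndex: dst})
				dst++
			}
		}
	}
	return moves, nil
}

func main() {
	a, b := &sector{}, &sector{}
	for i := 0; i < microsectorsPerSector; i += 2 {
		a.used[i] = true   // every other microsector of a is in use
		b.used[i/2] = true // the first half of b is in use
	}
	if needsDefrag(a) && needsDefrag(b) {
		moves, err := planMerge(a, b)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("planned %d moves, freeing one sector of %d bytes\n", len(moves), sectorSize)
	}
}
```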