Proposal: deferred contract updates

Introduction

Currently, data uploaded to a host must be added to a contract immediately. The contract stays locked for the whole time the data is being uploaded. This means it is not possible to upload multiple sectors to the same contract in parallel, or to make any other contract modification while an upload is in progress.

The goal of this proposal is to optimize the renter-host protocol by separating in time the upload of data from the update of the contract.

Description of Request

Introduce temporary key-value storage on the host, outside of the contract. The renter can allocate key-value storage capacity on the host and upload key-value pairs to it, then refer to a value (or a part of it) by its key from the CopyFrom RPC (as a third option, see the KVReference structure in the proposed RPCs for CopyFrom). The renter has full CRUD functionality on the storage: (1) create a key, (2) read the value of a key, (3) update the value of a key, and (4) delete a key. The key space is shared among all renters, so one renter can read a key created by another renter (it is not clear that this is needed right now, but contract sectors already work this way, so let's do the same here). The renter can list all of their keys associated with a token and check whether a key exists in the store.

Storage in the key-value store is charged based on the amount of storage used and the time it is held. The price is the same as for normal storage (e.g. 10 coins per GB/month). The renter needs to top up a token (the same approach as in prepaid downloads), putting a certain number of byte-seconds (storage multiplied by time) on it, and then specifies the token when uploading, updating or removing records in the key-value store. The host keeps track of the remaining storage balance on the token and removes all associated key-value pairs if it reaches zero. Reading from the key-value store requires the token for prepaid downloads and is charged in the same way (downloading X bytes decreases the token balance by X). The same charging model is used for listing keys. There is also another token for uploading (or another "currency" on the same token, if we go with the approach of having one token for everything with multiple currencies: upload, store, download). Its unit is the byte, and it also needs to be topped up based on the host's upload price. Its balance is decreased when a record is created or updated.
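The storage accounting can be made concrete with a little arithmetic. The sketch below is only an illustration: it assumes the storage balance is kept in byte-seconds, a 30-day month and a decimal gigabyte, and the function names are made up for this example rather than taken from the proposal.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day billing month
BYTES_PER_GB = 10**9                # assumption: decimal gigabyte


def byte_seconds_for(coins, price_per_gb_month):
    """Byte-seconds of storage bought with `coins` at the given price."""
    return coins * BYTES_PER_GB * SECONDS_PER_MONTH // price_per_gb_month


def seconds_until_empty(balance_byte_seconds, used_bytes):
    """How long the balance lasts if usage stays at `used_bytes`."""
    if used_bytes == 0:
        return None  # nothing is stored, so the balance is not being spent
    return balance_byte_seconds // used_bytes


# 1 coin at 10 coins per GB/month pays for 0.1 GB held for one month.
balance = byte_seconds_for(1, 10)
# Lifetime of that balance when 4 MiB of data is kept in the store.
lifetime = seconds_until_empty(balance, 4 * 2**20)
```

Storing less data for longer, or more data for a shorter time, costs the same as long as the product of bytes and seconds is equal.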

Each key in the key-value store is owned by a token. Only that token is allowed to remove or update the record. Any token can read the record, provided its holder knows the key.
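The CRUD operations and the ownership rule described in this section could look roughly like the following in-memory sketch. The class and method names are hypothetical, billing is left out, and a real host would expose these operations over RPC rather than as local calls.

```python
class KVStore:
    """In-memory stand-in for the host's key-value store (illustrative only)."""

    def __init__(self):
        self._records = {}  # key -> (owning token, value)

    def create(self, token, key, value):
        if key in self._records:
            raise KeyError(f"key already exists: {key!r}")
        self._records[key] = (token, value)

    def read(self, key, offset=0, length=None):
        # Any token may read; only knowledge of the key is required.
        _, value = self._records[key]
        end = len(value) if length is None else offset + length
        return value[offset:end]

    def update(self, token, key, value):
        self._check_owner(token, key)
        self._records[key] = (token, value)

    def delete(self, token, key):
        self._check_owner(token, key)
        del self._records[key]

    def exists(self, key):
        return key in self._records

    def list_keys(self, token):
        # Only keys owned by the given token are listed.
        return sorted(k for k, (owner, _) in self._records.items() if owner == token)

    def _check_owner(self, token, key):
        owner, _ = self._records[key]
        if owner != token:
            raise PermissionError("only the owning token may modify this record")
```

Reads accept an offset and a length because a value may be referenced in part, not only as a whole.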

Implementation

Host

The host can use any local key-value store (e.g. leveldb) to store the key-value pairs. Each value must also include the owning token. A second store (probably also leveldb) maps each token to the list of all its keys, the total storage used in bytes, the timestamp of the last update and the balance at that time. The expiration time is easy to derive from this information. An in-memory priority queue is used to trigger removal of keys when the expiration time is reached.
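A minimal sketch of the per-token record and the expiration queue follows. It assumes the balance is measured in byte-seconds and uses Python's heapq as the priority queue; the field and class names are illustrative, and persistence in leveldb is omitted.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class TokenRecord:
    keys: set = field(default_factory=set)
    used_bytes: int = 0
    last_update: float = 0.0  # timestamp of the last settlement
    balance: int = 0          # byte-seconds left at last_update (assumed unit)

    def settle(self, now):
        """Charge for the storage held since last_update."""
        elapsed = now - self.last_update
        self.balance = max(0, self.balance - int(self.used_bytes * elapsed))
        self.last_update = now

    def expires_at(self):
        """Time at which the balance reaches zero at the current usage."""
        if self.used_bytes == 0:
            return None
        return self.last_update + self.balance / self.used_bytes


class ExpiryQueue:
    def __init__(self):
        self._heap = []  # entries are (expiration time, token id)

    def schedule(self, token_id, record):
        when = record.expires_at()
        if when is not None:
            heapq.heappush(self._heap, (when, token_id))

    def pop_expired(self, now, records):
        """Return the ids of tokens whose balance has run out by `now`."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            when, token_id = heapq.heappop(self._heap)
            record = records.get(token_id)
            # Skip stale entries: the record changed after this entry was queued.
            if record is not None and record.expires_at() == when and token_id not in expired:
                expired.append(token_id)
        return expired
```

Entries are never removed from the heap when a token is topped up or its usage changes. The record is settled and scheduled again, and the outdated entry is discarded when it surfaces.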

Renter (relayer)

When new data comes in, it is immediately uploaded into the key-value store of a host and also stored locally (as a backup). Once enough data to form a sector has been uploaded, the sector is created using CopyFrom and the data is removed from the key-value store.
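The renter-side flow can be sketched as below. The host object is a hypothetical client with put, copy_from and delete methods; these names, and the (key, offset, length) shape of a reference, are assumptions for illustration and do not reproduce the proposed KVReference structure.

```python
SECTOR_SIZE = 4 * 2**20  # 4M sector


class SectorAssembler:
    """Buffers uploads in a host's key-value store until a sector is full."""

    def __init__(self, host):
        self.host = host    # hypothetical client: put / copy_from / delete
        self.pending = []   # [key, offset, remaining length] not yet in a sector
        self.backup = {}    # local copies, kept until the data is in a sector
        self.buffered = 0   # bytes uploaded but not yet copied into a sector

    def add(self, key, data):
        self.host.put(key, data)   # upload right away, no contract lock needed
        self.backup[key] = data    # local backup
        self.pending.append([key, 0, len(data)])
        self.buffered += len(data)
        while self.buffered >= SECTOR_SIZE:
            self._flush_sector()

    def _flush_sector(self):
        refs, need = [], SECTOR_SIZE
        while need > 0:
            key, offset, length = self.pending[0]
            take = min(length, need)
            refs.append((key, offset, take))
            need -= take
            if take == length:
                self.pending.pop(0)
            else:
                # Only part of this value fits; the rest goes to the next sector.
                self.pending[0] = [key, offset + take, length - take]
        self.host.copy_from(refs)  # one contract update per sector
        self.buffered -= SECTOR_SIZE
        still_needed = {entry[0] for entry in self.pending}
        for key, _, _ in refs:
            if key not in still_needed and key in self.backup:
                self.host.delete(key)
                del self.backup[key]
```

A value that straddles a sector boundary stays in the key-value store until its remaining bytes have been copied into the next sector.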

Microsectors

Small files are stored in one or more consecutive 16K microsectors, aligned to 16K, inside normal 4M sectors. When such files are uploaded, they are put into key-value stores (possibly on multiple hosts, as a backup). When the total size of such microsectors reaches 4M, a sector is created.
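The alignment arithmetic is simple enough to show directly. This sketch assumes 16K means 16384 bytes and 4M means 4194304 bytes, and that files are placed one after another; the helper names are illustrative.

```python
MICROSECTOR = 16 * 1024                           # 16K
SECTOR = 4 * 1024 * 1024                          # 4M
MICROSECTORS_PER_SECTOR = SECTOR // MICROSECTOR   # 256


def padded_size(size):
    """Size of a file once rounded up to whole microsectors."""
    return -(-size // MICROSECTOR) * MICROSECTOR  # ceiling division


def place(file_sizes):
    """Give each file a 16K-aligned offset and report whether
    enough microsectors have accumulated to create a sector."""
    offsets, cursor = [], 0
    for size in file_sizes:
        offsets.append(cursor)
        cursor += padded_size(size)
    return offsets, cursor >= SECTOR
```

For example, files of 1, 20000 and 16384 bytes occupy 1, 2 and 1 microsectors and start at offsets 0, 16384 and 49152.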

High throughput upload

Another scenario where the temporary storage is useful is uploading large volumes of data. Data is uploaded from multiple machines (possibly even from multiple locations) and put into key-value storage. All uploaders send the keys of the uploaded data to a coordinator machine, which updates the contract with CopyFrom calls (passing the keys as data sources) and removes the keys from the key-value store.
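The division of labour between uploaders and the coordinator can be simulated in a single process. In this sketch threads stand in for separate machines, FakeHost stands in for a real host, and the content-derived keys are an assumption made for the example; none of these names come from the proposal.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib
import threading


class FakeHost:
    """Stand-in for a host; a real one would be reached over RPC."""

    def __init__(self):
        self._lock = threading.Lock()
        self.kv = {}
        self.contract = []  # data added to the contract, in order

    def put(self, key, value):
        with self._lock:
            self.kv[key] = value

    def copy_from(self, keys):
        with self._lock:
            self.contract.extend(self.kv[k] for k in keys)

    def delete(self, key):
        with self._lock:
            self.kv.pop(key, None)


def upload_chunk(host, chunk):
    """Uploader: store the chunk and report its key to the coordinator."""
    key = hashlib.sha256(chunk).hexdigest()  # assumption: content-derived key
    host.put(key, chunk)
    return key


def coordinate(host, chunks, workers=4):
    """Coordinator: collect keys, update the contract, then clean up."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the keys in the order of the input chunks.
        keys = list(pool.map(lambda c: upload_chunk(host, c), chunks))
    host.copy_from(keys)  # the only step that touches the contract
    for key in keys:
        host.delete(key)
    return keys
```

The uploads run in parallel because none of them touches the contract. Only the final CopyFrom step needs the contract lock.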

Edited Jul 16, 2019 by Boris Nagaev