Proposal: Seed-based Snapshot Recovery
Overview
v1.4.0 added "snapshot" backups, allowing a set of .sia files to be grouped and archived as a unit. In v1.4.1, we will allow renters to upload these snapshots to hosts and later retrieve them. This will enable a renter to recover a full snapshot of their .sia files using only their seed.
Introduction
The new renter-host protocol gives renters the ability to download sectors by their index within a contract, not just by their Merkle root. In addition, the new protocol allows the renter and host to trustlessly negotiate modifications to contract data, so the renter can securely modify arbitrary contract data.
Combined, these abilities enable us (the renter) to treat host storage much like a generic block device supporting random-access reads and writes. We can leverage this to impose a schema on our contract data, whereby the position of a sector within the contract is significant. For the purposes of snapshots specifically, we will store a "snapshot table" in the sector at index 0 within the contract. We can then read this sector to access existing snapshots and overwrite it to add new snapshots.
Storage Format
The sector at index 0 will store a lookup table for the snapshots residing on the host. This table is a slice of structs, defined as:
```go
type snapshotEntry struct {
	Name         [80]byte
	UID          [16]byte
	CreationDate uint64      // Unix timestamp
	Size         uint64      // size of snapshot file
	DataSectors  [4][32]byte // pointers to sectors containing snapshot .sia file
}
```
Snapshots are uniquely identified by their UID, which is chosen randomly. Each snapshot also provides its creation date, which allows renters to sort snapshots by age. For personalization, snapshots also permit a name, which can be up to 80 bytes.
The most important field, though, is `DataSectors`. This field references up to four other sectors on the host that, when concatenated (and trimmed according to `Size`), form a .sia file. This .sia file, in turn, can be downloaded to retrieve the actual snapshot file. The snapshot is then unpacked and processed as usual.
Each `snapshotEntry` is 240 bytes; this means that the index-0 sector can store about 17,500 entries. Each entry's snapshot .sia file can be up to 16 MiB, which implies a bound of ~80 GB for the snapshot file itself. This file, in turn, can reference about 800 TB of actual data.
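For concreteness, here is a minimal sketch of how the table might be serialized into and out of the index-0 sector, reusing the `snapshotEntry` struct above. The 4 MiB sector size is implied by the ~17,500-entry figure; the little-endian field encoding and the helper names are assumptions for illustration, not part of the spec.

```go
package snapshot

import (
	"encoding/binary"
	"errors"
)

const (
	sectorSize = 1 << 22                // assumed 4 MiB sector size
	entrySize  = 80 + 16 + 8 + 8 + 4*32 // 240 bytes per snapshotEntry
)

// marshalTable packs the entries into a full sector; unused space is left zeroed.
func marshalTable(entries []snapshotEntry) ([]byte, error) {
	if len(entries) > sectorSize/entrySize {
		return nil, errors.New("too many entries for one sector")
	}
	sector := make([]byte, sectorSize)
	buf := sector
	for _, e := range entries {
		copy(buf[0:80], e.Name[:])
		copy(buf[80:96], e.UID[:])
		binary.LittleEndian.PutUint64(buf[96:104], e.CreationDate)
		binary.LittleEndian.PutUint64(buf[104:112], e.Size)
		for i := range e.DataSectors {
			copy(buf[112+i*32:112+(i+1)*32], e.DataSectors[i][:])
		}
		buf = buf[entrySize:]
	}
	return sector, nil
}

// unmarshalTable parses a sector, skipping all-zero (unused) entries.
func unmarshalTable(sector []byte) []snapshotEntry {
	var entries []snapshotEntry
	for off := 0; off+entrySize <= len(sector); off += entrySize {
		var e snapshotEntry
		copy(e.Name[:], sector[off:off+80])
		copy(e.UID[:], sector[off+80:off+96])
		e.CreationDate = binary.LittleEndian.Uint64(sector[off+96 : off+104])
		e.Size = binary.LittleEndian.Uint64(sector[off+104 : off+112])
		for i := range e.DataSectors {
			copy(e.DataSectors[i][:], sector[off+112+i*32:off+112+(i+1)*32])
		}
		if e.UID == ([16]byte{}) {
			continue // unused slot
		}
		entries = append(entries, e)
	}
	return entries
}
```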
Snapshot Retrieval
To access the snapshots, the renter first downloads the index-0 sector and parses it to learn the metadata of each snapshot. It then selects the desired snapshot and downloads the sectors referenced by `DataSectors`. These sectors are then concatenated and trimmed according to `Size`, producing the snapshot .sia file. This file can then be downloaded like any other .sia file to retrieve the actual snapshot.
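A sketch of that flow, reusing `unmarshalTable` from the Storage Format sketch; the `hostSession` interface and its method names are hypothetical stand-ins for the actual renter-host RPCs, not the real API:

```go
// hostSession abstracts the two downloads this step needs; the method names are
// assumptions for illustration only.
type hostSession interface {
	DownloadSectorByIndex(index uint64) ([]byte, error)
	DownloadSectorByRoot(root [32]byte) ([]byte, error)
}

// downloadSnapshotSiaFile fetches the .sia file for the snapshot with the given UID.
func downloadSnapshotSiaFile(s hostSession, uid [16]byte) ([]byte, error) {
	tableSector, err := s.DownloadSectorByIndex(0)
	if err != nil {
		return nil, err
	}
	for _, e := range unmarshalTable(tableSector) {
		if e.UID != uid {
			continue
		}
		// Concatenate the referenced sectors, then trim to Size.
		var siaFile []byte
		for _, root := range e.DataSectors {
			if root == ([32]byte{}) {
				break // unused pointer
			}
			sector, err := s.DownloadSectorByRoot(root)
			if err != nil {
				return nil, err
			}
			siaFile = append(siaFile, sector...)
		}
		return siaFile[:e.Size], nil
	}
	return nil, errors.New("snapshot not found")
}
```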
Snapshot Storage
We first create the snapshot and write it to disk within the Sia data directory. This file is then uploaded like any other file, producing a .sia metadata file. Unlike other .sia files, this file is not tracked within the renter; instead, it is in turn uploaded to each host. The final step is to update the snapshot table. We first download the table using the index-0 sector root, parse it, and append our new `snapshotEntry`, referencing the sectors on the host where the .sia file was stored. (If the table is already full, we instead overwrite the oldest entry.) We then atomically replace the existing table on the host using an Append-Swap-Trim RPC. This ensures that the snapshot table always maintains a consistent view of available snapshots and never contains references to incomplete snapshots. It is important that we hold the contract lock across these last two steps to guard against concurrent table updates by another renter.
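A sketch of that update, reusing `marshalTable` and `unmarshalTable` from above. `ReplaceSectorAtIndex` is a hypothetical stand-in for the Append-Swap-Trim sequence, and the caller is assumed to hold the contract lock:

```go
// tableSession names just the RPC stand-ins this step needs; the method names
// are assumptions, not the real API.
type tableSession interface {
	DownloadSectorByIndex(index uint64) ([]byte, error)
	ReplaceSectorAtIndex(index uint64, sector []byte) error
}

// addSnapshotEntry appends entry to the host's snapshot table, overwriting the
// oldest entry if the table is full. Must be called with the contract lock held.
func addSnapshotEntry(s tableSession, entry snapshotEntry) error {
	tableSector, err := s.DownloadSectorByIndex(0)
	if err != nil {
		return err
	}
	entries := unmarshalTable(tableSector)
	if len(entries) < sectorSize/entrySize {
		entries = append(entries, entry)
	} else {
		// Table is full: overwrite the oldest entry.
		oldest := 0
		for i, e := range entries {
			if e.CreationDate < entries[oldest].CreationDate {
				oldest = i
			}
		}
		entries[oldest] = entry
	}
	newSector, err := marshalTable(entries)
	if err != nil {
		return err
	}
	// Atomically swap the new table into index 0 (Append-Swap-Trim).
	return s.ReplaceSectorAtIndex(0, newSector)
}
```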
Repairing
Originally, we expected that snapshots would need to be repaired like other files. However, it was then observed that the snapshots themselves contain .sia files which would also require repair. Since it is not possible to repair these .sia files (as snapshots are immutable), it seems acceptable to forgo repairs of snapshots as well. In other words, even if we did repair the snapshots, their contents would still remain unrepaired, and would gradually become unrecoverable.
We should make this limitation clear to users. Specifically, we need to manage expectations around how long snapshots will be recoverable. The degree to which snapshots remain recoverable is governed by the degree of host churn; if few repairs are necessary, snapshots may remain recoverable for a long time. Conversely, if a user's contracts all change within the span of a day, all their previous snapshots will become unrecoverable.
It's worth noting that this deficiency will no longer apply once we are doing continuous file backups. Since snapshots are merely an interim measure between now and then, I believe this deficiency is acceptable.
Other Considerations
It may be possible to optimize the table update via some combination of partial downloads and partial updates. However, this step constitutes a small fraction of the total time required to create a snapshot, so optimizing it is not an immediate priority.
We should also consider sorting the entries by creation date. This would allow renters who are only interested in the most recent snapshot to access that snapshot directly without needing to parse the snapshot table.
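If we do keep the table sorted, the renter-side work before marshaling is a one-liner (a sketch using the standard `sort` package, newest first):

```go
// Keep the newest snapshot first so a renter can read just the first entry.
sort.Slice(entries, func(i, j int) bool {
	return entries[i].CreationDate > entries[j].CreationDate
})
```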
For the initial release, the snapshot upload process will be stateless: it must be completed in a single run of `siad`. If the process is interrupted, the renter must restart it from scratch. This means that any sectors uploaded during the previous attempt will become "garbage," as they are no longer referenced anywhere. A future release may improve this by writing the in-progress state to the local renter disk.
Existing renters do not treat the index-0 sector specially; as a result, regular file data will be present in this sector when we deploy the new snapshot code. We therefore need an initialization step, wherein the renter uploads an empty snapshot table and swaps it into index 0.
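A sketch of that initialization, again with hypothetical stand-ins (`AppendSector` and `SwapSectors`) for the Append and Swap RPCs; `numSectors` is the contract's sector count before the append:

```go
// initSession names the RPC stand-ins for the one-time setup; hypothetical API.
type initSession interface {
	AppendSector(sector []byte) error
	SwapSectors(i, j uint64) error
}

// initSnapshotTable appends an empty table and swaps it into index 0, moving the
// regular file data that currently occupies index 0 to the end of the contract.
func initSnapshotTable(s initSession, numSectors uint64) error {
	emptyTable, err := marshalTable(nil)
	if err != nil {
		return err
	}
	if err := s.AppendSector(emptyTable); err != nil {
		return err
	}
	// The appended sector now sits at index numSectors; swap it with index 0.
	return s.SwapSectors(0, numSectors)
}
```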
In the future, we may want to use numerically-indexed sectors for purposes other than snapshot backups. This would necessitate storing additional metadata alongside the snapshot table. I see three options. First, we could assign a distinct purpose to each index-n sector; that is, index 0 would store the snapshot table, while index 1 could store the metadata for some other purpose, and likewise for indices 2, 3, and so on. Second, we could attempt to use index 0 for all special purposes, defining some extensible structure that allows the snapshot table to live alongside other metadata within the index-0 sector. Third, we could apply another level of indirection, and store only the root of the snapshot table sector in the index-0 sector; this can be seen as a "compressed" version of the second option, leaving much more space in index 0 for other purposes.