Proposal: Seed-based Snapshot Recovery
Overview
v1.4.0 added "snapshot" backups, allowing a set of .sia files to be grouped and archived as a unit. In v1.4.1, we will allow renters to upload these snapshots to hosts and later retrieve them. This will enable a renter to recover a full snapshot of their .sia files using only their seed.
Introduction
The new renter-host protocol gives renters the ability to download sectors by their index within a contract, not just by their Merkle root. In addition, the new protocol allows the renter and host to trustlessly negotiate modifications to contract data, so the renter can securely modify arbitrary contract data.
Combined, these abilities enable us (the renter) to treat host storage much like a generic block device supporting random-access reads and writes. We can leverage this to impose a schema on our contract data, whereby the position of a sector within the contract is significant. For the purposes of snapshots specifically, we will store a "snapshot table" in the sector at index 0 within the contract. We can then read this sector to access existing snapshots and overwrite it to add new snapshots.
Storage Format
The sector at index 0 will store a lookup table for the snapshots residing on the host. This table is a slice of structs, defined as:
```go
type snapshotEntry struct {
	Name         [80]byte
	UID          [16]byte
	CreationDate uint64      // Unix timestamp
	Size         uint64      // size of snapshot file
	DataSectors  [4][32]byte // pointers to sectors containing snapshot .sia file
}
```
Snapshots are uniquely identified by their UID, which is chosen randomly. Each snapshot also provides its creation date, which allows renters to sort snapshots by age. For personalization, snapshots also permit a name, which can be up to 80 bytes.
The most important field, though, is `DataSectors`. This field references up to four other sectors on the host that, when concatenated (and trimmed according to `Size`), form a .sia file. This .sia file, in turn, can be downloaded to retrieve the actual snapshot file. The snapshot is then unpacked and processed as usual.
Each `snapshotEntry` is 240 bytes; this means that the index-0 sector can store about 17,500 entries. Each entry's snapshot .sia file can be up to 16 MiB, which implies a bound of ~80 GB for the snapshot file itself. This file, in turn, can reference about 800 TB of actual data.
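For concreteness, here is a minimal sketch of how the table might be serialized into and out of the index-0 sector, reusing the `snapshotEntry` struct above. The 4 MiB sector size is implied by the ~17,500-entry figure; the little-endian field encoding and the helper names are assumptions for illustration, not part of the spec.

```go
package snapshot

import (
	"encoding/binary"
	"errors"
)

const (
	sectorSize = 1 << 22                // assumed 4 MiB sector size
	entrySize  = 80 + 16 + 8 + 8 + 4*32 // 240 bytes per snapshotEntry
)

// marshalTable packs the entries into a full sector; unused space is left zeroed.
func marshalTable(entries []snapshotEntry) ([]byte, error) {
	if len(entries) > sectorSize/entrySize {
		return nil, errors.New("too many entries for one sector")
	}
	sector := make([]byte, sectorSize)
	buf := sector
	for _, e := range entries {
		copy(buf[0:80], e.Name[:])
		copy(buf[80:96], e.UID[:])
		binary.LittleEndian.PutUint64(buf[96:104], e.CreationDate)
		binary.LittleEndian.PutUint64(buf[104:112], e.Size)
		for i := range e.DataSectors {
			copy(buf[112+i*32:112+(i+1)*32], e.DataSectors[i][:])
		}
		buf = buf[entrySize:]
	}
	return sector, nil
}

// unmarshalTable parses a sector, skipping all-zero (unused) entries.
func unmarshalTable(sector []byte) []snapshotEntry {
	var entries []snapshotEntry
	for off := 0; off+entrySize <= len(sector); off += entrySize {
		var e snapshotEntry
		copy(e.Name[:], sector[off:off+80])
		copy(e.UID[:], sector[off+80:off+96])
		e.CreationDate = binary.LittleEndian.Uint64(sector[off+96 : off+104])
		e.Size = binary.LittleEndian.Uint64(sector[off+104 : off+112])
		for i := range e.DataSectors {
			copy(e.DataSectors[i][:], sector[off+112+i*32:off+112+(i+1)*32])
		}
		if e.UID == ([16]byte{}) {
			continue // unused slot
		}
		entries = append(entries, e)
	}
	return entries
}
```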
Snapshot Retrieval
To access the snapshots, the renter first downloads the index-0 sector and parses it to learn the metadata of each snapshot. It then selects the desired snapshot and downloads the sectors referenced by `DataSectors`. These sectors are then concatenated and trimmed according to `Size`, producing the snapshot .sia file. This file can then be downloaded like any other .sia file to retrieve the actual snapshot.
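A sketch of that flow, reusing `unmarshalTable` from the Storage Format sketch; the `hostSession` interface and its method names are hypothetical stand-ins for the actual renter-host RPCs, not the real API:

```go
// hostSession abstracts the two downloads this step needs; the method names are
// assumptions for illustration only.
type hostSession interface {
	DownloadSectorByIndex(index uint64) ([]byte, error)
	DownloadSectorByRoot(root [32]byte) ([]byte, error)
}

// downloadSnapshotSiaFile fetches the .sia file for the snapshot with the given UID.
func downloadSnapshotSiaFile(s hostSession, uid [16]byte) ([]byte, error) {
	tableSector, err := s.DownloadSectorByIndex(0)
	if err != nil {
		return nil, err
	}
	for _, e := range unmarshalTable(tableSector) {
		if e.UID != uid {
			continue
		}
		// Concatenate the referenced sectors, then trim to Size.
		var siaFile []byte
		for _, root := range e.DataSectors {
			if root == ([32]byte{}) {
				break // unused pointer
			}
			sector, err := s.DownloadSectorByRoot(root)
			if err != nil {
				return nil, err
			}
			siaFile = append(siaFile, sector...)
		}
		return siaFile[:e.Size], nil
	}
	return nil, errors.New("snapshot not found")
}
```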
Snapshot Storage
We first create the snapshot and write it to disk within the Sia data directory. This file is then uploaded like any other file, producing a .sia metadata file. Unlike other .sia files, this file is not tracked within the renter; instead, it is in turn uploaded to each host. The final step is to update the snapshot table. We first download the table using the index-0 sector root, parse it, and append our new `snapshotEntry`, referencing the sectors on the host where the .sia file was stored. (If the table is already full, we instead overwrite the oldest entry.) We then atomically replace the existing table on the host using an Append-Swap-Trim RPC. This ensures that the snapshot table always maintains a consistent view of available snapshots and never contains references to incomplete snapshots. It is important that we hold the contract lock across these last two steps to guard against concurrent table updates by another renter.
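A sketch of that update, reusing `marshalTable` and `unmarshalTable` from above. `ReplaceSectorAtIndex` is a hypothetical stand-in for the Append-Swap-Trim sequence, and the caller is assumed to hold the contract lock:

```go
// tableSession names just the RPC stand-ins this step needs; the method names
// are assumptions, not the real API.
type tableSession interface {
	DownloadSectorByIndex(index uint64) ([]byte, error)
	ReplaceSectorAtIndex(index uint64, sector []byte) error
}

// addSnapshotEntry appends entry to the host's snapshot table, overwriting the
// oldest entry if the table is full. Must be called with the contract lock held.
func addSnapshotEntry(s tableSession, entry snapshotEntry) error {
	tableSector, err := s.DownloadSectorByIndex(0)
	if err != nil {
		return err
	}
	entries := unmarshalTable(tableSector)
	if len(entries) < sectorSize/entrySize {
		entries = append(entries, entry)
	} else {
		// Table is full: overwrite the oldest entry.
		oldest := 0
		for i, e := range entries {
			if e.CreationDate < entries[oldest].CreationDate {
				oldest = i
			}
		}
		entries[oldest] = entry
	}
	newSector, err := marshalTable(entries)
	if err != nil {
		return err
	}
	// Atomically swap the new table into index 0 (Append-Swap-Trim).
	return s.ReplaceSectorAtIndex(0, newSector)
}
```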
Repairing
Originally, we expected that snapshots would need to be repaired like other files. However, it was then observed that the snapshots themselves contain .sia files which would also require repair. Since it is not possible to repair these .sia files (as snapshots are immutable), it seems acceptable to forgo repairs of snapshots as well. In other words, even if we did repair the snapshots, their contents would still remain unrepaired, and would gradually become unrecoverable.
We should make this limitation clear to users. Specifically, we need to manage expectations around how long snapshots will be recoverable. The degree to which snapshots remain recoverable is governed by the degree of host churn; if few repairs are necessary, snapshots may remain recoverable for a long time. Conversely, if a user's contracts all change within the span of a day, all their previous snapshots will become unrecoverable.
It's worth noting that this deficiency will no longer apply once we are doing continuous file backups. Since snapshots are merely an interim measure between now and then, I believe this deficiency is acceptable.
Other Considerations
It may be possible to optimize the table update via some combination of partial downloads and partial updates. However, this step constitutes a small fraction of the total time required to create a snapshot, so optimizing it is not an immediate priority.
We should also consider sorting the entries by creation date. This would allow renters who are only interested in the most recent snapshot to access that snapshot directly without needing to parse the snapshot table.
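If we do keep the table sorted, the renter-side work before marshaling is a one-liner (a sketch using the standard `sort` package, newest first):

```go
// Keep the newest snapshot first so a renter can read just the first entry.
sort.Slice(entries, func(i, j int) bool {
	return entries[i].CreationDate > entries[j].CreationDate
})
```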
For the initial release, the snapshot upload process will be stateless: it must be completed in a single run of `siad`. If the process is interrupted, the renter must restart it from scratch. This means that any sectors uploaded during the previous attempt will become "garbage," as they are no longer referenced anywhere. A future release may improve this by writing the in-progress state to the local renter disk.
Existing renters do not treat the index-0 sector specially; as a result, regular file data will be present in this sector when we deploy the new snapshot code. We therefore need an initialization step, wherein the renter uploads an empty snapshot table and swaps it into index 0.
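A sketch of that initialization, again with hypothetical stand-ins (`AppendSector` and `SwapSectors`) for the Append and Swap RPCs; `numSectors` is the contract's sector count before the append:

```go
// initSession names the RPC stand-ins for the one-time setup; hypothetical API.
type initSession interface {
	AppendSector(sector []byte) error
	SwapSectors(i, j uint64) error
}

// initSnapshotTable appends an empty table and swaps it into index 0, moving the
// regular file data that currently occupies index 0 to the end of the contract.
func initSnapshotTable(s initSession, numSectors uint64) error {
	emptyTable, err := marshalTable(nil)
	if err != nil {
		return err
	}
	if err := s.AppendSector(emptyTable); err != nil {
		return err
	}
	// The appended sector now sits at index numSectors; swap it with index 0.
	return s.SwapSectors(0, numSectors)
}
```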
In the future, we may want to use numerically-indexed sectors for purposes other than snapshot backups. This would necessitate storing additional metadata alongside the snapshot table. I see three options. First, we could assign a distinct purpose to each index-n sector; that is, index 0 would store the snapshot table, while index 1 could store the metadata for some other purpose, and likewise for indices 2, 3, and so on. Second, we could attempt to use index 0 for all special purposes, defining some extensible structure that allows the snapshot table to live alongside other metadata within the index-0 sector. Third, we could apply another level of indirection, and store only the root of the snapshot table sector in the index-0 sector; this can be seen as a "compressed" version of the second option, leaving much more space in index 0 for other purposes.