Implementation Plan: Garbage Collection / Repairable Snapshots
Overview
This document is meant to lay out a full implementation plan which gets us to garbage collection and (since they are a bit intertwined) repairable snapshots. Each of the following sections is intended to be at least one MR.
1. Copy backups to /backups/
The first step is to create a new /backups/<backupname> folder (a.k.a. the backup folder) every time we create a backup. The folder itself contains all of the siafiles that go into the backup, so basically the contents of /home/user.
The backup folder also contains a .info file which serves 2 purposes.
- It replaces the renter.persist.UploadedBackups field, since we can now store the UploadedBackup struct in the backup folder directly.
- It is written after the backup was created successfully, so its presence indicates a complete backup. This allows us to loop over the backup folders on startup and delete the ones without the .info file since they weren't created successfully.
2. Implement a mechanism to sync the /backups folder with the network.
The refcounter code depends on the /backups folder always being up to date with the uploaded backups on the network. That means we need to guarantee that this is the case.
To achieve that, we need a method that downloads all the snapshot table entries and checks whether a folder already exists for each one of them by comparing the UID of the snapshot to the UID from the .info file of the corresponding backup folder. If a folder is missing, we download the full snapshot and create the folder. If we have a folder locally that doesn't exist on the network, we delete the local folder.
Notes:
- need a timestamp in the renter that tells us if/when the sync has happened.
- we can't start this process unless we are synced and have recovered most of our contracts
- consider running this sync every day or so. It should be cheap when we are already synced
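The comparison at the heart of the sync can be sketched as a pure function over two UID lists. This is only an illustration: the syncPlan type and the planSync name are hypothetical, UIDs are represented as plain strings, and the downloading and deleting themselves are left out.

```go
package main

import "fmt"

// syncPlan (hypothetical) lists what the sync has to do to make /backups
// match the snapshot table on the network.
type syncPlan struct {
	Download []string // UIDs on the network without a local folder
	Delete   []string // UIDs with a local folder that are not on the network
}

// planSync compares the UIDs from the snapshot table with the UIDs read from
// the local .info files and returns the resulting work.
func planSync(remoteUIDs, localUIDs []string) syncPlan {
	remote := make(map[string]struct{}, len(remoteUIDs))
	for _, uid := range remoteUIDs {
		remote[uid] = struct{}{}
	}
	local := make(map[string]struct{}, len(localUIDs))
	for _, uid := range localUIDs {
		local[uid] = struct{}{}
	}

	var plan syncPlan
	for _, uid := range remoteUIDs {
		if _, ok := local[uid]; !ok {
			plan.Download = append(plan.Download, uid)
		}
	}
	for _, uid := range localUIDs {
		if _, ok := remote[uid]; !ok {
			plan.Delete = append(plan.Delete, uid)
		}
	}
	return plan
}

func main() {
	plan := planSync([]string{"uid-a", "uid-b"}, []string{"uid-b", "uid-c"})
	fmt.Printf("download %v, delete %v\n", plan.Download, plan.Delete)
}
```

When both sides already match, the plan is empty, which is why a repeated sync should be cheap.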
3. Add code to update refcounters
Every time we add, delete, or back up files we need to update the refcounters accordingly. This doesn't need to be 100% accurate since we are going to recompute the refcounters periodically.
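As a rough illustration of these incremental updates, the sketch below keeps one counter per sector root in memory. The refCounts type and its methods are hypothetical, and roots are plain strings here; the real refcounter is persisted and tracked per contract.

```go
package main

import "fmt"

// refCounts (hypothetical) maps a sector root to the number of siafiles, in
// /home/user or /backups, that reference it.
type refCounts map[string]uint64

// addRefs is called when a file is added or included in a new backup.
func (rc refCounts) addRefs(roots []string) {
	for _, r := range roots {
		rc[r]++
	}
}

// dropRefs is called when a file is deleted. It returns the roots that are no
// longer referenced by anything.
func (rc refCounts) dropRefs(roots []string) (unreferenced []string) {
	for _, r := range roots {
		if rc[r] == 0 {
			continue // counter drifted; the periodic recompute corrects it
		}
		rc[r]--
		if rc[r] == 0 {
			delete(rc, r)
			unreferenced = append(unreferenced, r)
		}
	}
	return unreferenced
}

func main() {
	rc := refCounts{}
	rc.addRefs([]string{"r1", "r2"}) // file uploaded
	rc.addRefs([]string{"r1"})       // part of the file captured in a backup
	fmt.Println(rc.dropRefs([]string{"r1", "r2"}), rc)
}
```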
4. Implement a mechanism to recompute all refcounters
Thanks to step 2, this should be a lot easier since we don't need to bother with downloading backups. All we need to do is loop over the /home/user and /backups folders one contract at a time and update the refcounters to match the state of the filesystem.
Notes:
- can't start this process before the backup sync has completed successfully at least once
- consider running this every day or every couple of days. This is rather expensive so it shouldn't happen too often.
5. Update backups to contain compressed files and upload them as a skyfile.
As of now, a backup is a tar that contains a bunch of .sia files. The backup is uploaded, which results in a new .sia file. That file is uploaded to a sector and the sector root is written to the snapshot table. Unfortunately, this doesn't let us repair backups, which means they degrade over time.
Instead we need to change the archive itself to contain compressed .sia files. This means it only contains the roots of the uploaded pieces and the essential metadata. This makes the file recovery a bit more involved, but it means we can repair the files as long as we don't update them. The other change we make is to upload the archive using Skynet instead of a regular upload. That way the snapshot table will move to being a table of skylinks instead of sector roots.
The recovery process will then change to downloading a backup and converting the compressed files to full files by asking the hosts we have contracts with if they know the roots.
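To make the recovery step a bit more concrete, here is a very rough sketch of asking hosts about the roots of one compressed file. All types and names (compressedFile, host, locateRoots) are hypothetical and the host query is reduced to a local callback; the actual format of a compressed .sia file and the host protocol are not decided by this plan.

```go
package main

import "fmt"

// compressedFile is a hypothetical stand-in for a compressed .sia file: only
// the roots of the uploaded pieces plus essential metadata.
type compressedFile struct {
	SiaPath string
	Roots   []string
}

// host is a hypothetical view of a host we have a contract with.
type host struct {
	ID      string
	HasRoot func(root string) bool // stands in for a network query
}

// locateRoots asks the hosts, in order, which of the file's roots they store.
// It returns the first host found for each root and the roots no host knows.
func locateRoots(f compressedFile, hosts []host) (found map[string]string, missing []string) {
	found = make(map[string]string)
	for _, root := range f.Roots {
		located := false
		for _, h := range hosts {
			if h.HasRoot(root) {
				found[root] = h.ID
				located = true
				break
			}
		}
		if !located {
			missing = append(missing, root)
		}
	}
	return found, missing
}

func main() {
	// known builds a fake host lookup from a fixed list of roots.
	known := func(roots ...string) func(string) bool {
		set := make(map[string]bool)
		for _, r := range roots {
			set[r] = true
		}
		return func(root string) bool { return set[root] }
	}
	hosts := []host{
		{ID: "host1", HasRoot: known("r1")},
		{ID: "host2", HasRoot: known("r2")},
	}
	file := compressedFile{SiaPath: "docs/a", Roots: []string{"r1", "r2", "r3"}}
	found, missing := locateRoots(file, hosts)
	fmt.Println(found, missing)
}
```

A file whose missing list is empty can be converted back into a full file; anything else needs more hosts or is only partially recoverable.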
6. Have the repair loop cover the /backups folder
All that's left now is to enable repairs for .sia files in the /backups folder to make sure we keep repairing backups. This might already be the case, but we should double-check.
7. Implement deletion for backups.
We can't just delete a backup folder, since it would be re-added by the syncing code, so we need to let the sync code know that the backup was deleted. I'd suggest extending the .info file with a deleted flag and deleting all files from the backup folder except for the .info file. That way we stop repairing the files and we can display to the user that the backup was deleted. When the syncing code tries to sync a deleted backup, it checks the folder first, and if it sees a valid .info file, it knows that there is nothing to do because it assumes that the backup was synced successfully anyway.
8. Implement force deletion for files
Sometimes it might be useful to delete a single file completely, which includes removing it from all backups by no longer repairing it. I don't have an efficient solution to that yet, though, as a file might have a different name/location between backups. Only the UID would remain the same, but scanning all files for all backups would be excessive.