Sami Hiltunen authored
All writes must be write-ahead logged before they are applied to the repository. Objects are currently not logged, which can be a source of inconsistencies as well as performance problems. This commit implements logging support for the objects a transaction needs.

Objects can be included in a transaction by storing them in the transaction's quarantine directory. When the transaction is committed, the TransactionManager computes a pack file from the new reference tips set in the transaction. The pack file includes the objects needed by the new tips that are unreachable from the current set of references. It therefore includes both new objects from the quarantine directory and objects that are already present in the repository but unreachable. This ensures that the pack file contains all objects needed to go from the current set of references to the new set of references after the transaction. This matters because the unreachable objects could otherwise be pruned, after which the pack file would no longer apply to the repository.

As objects always flow through the log, only committed objects end up in the repository. This is an important property for backups. The repository reaches a consistent state by applying the write-ahead log. If objects could end up in the repository without being logged, some logged reference changes could fail when a repository is recovered from a snapshot plus the log, as neither the snapshot nor the log would be guaranteed to include the objects newly referenced in a log entry.

For replicated setups later, the fact that only committed objects end up in the repository means that all replicas are guaranteed to have received the same objects at some point. If objects from failed writes could end up in the repository, the leader could have a different set of objects than the replicas, as those objects are never replicated.
As the pack files are computed to also include unreachable objects, a pack file is guaranteed to apply on another replica regardless of whether that replica has garbage collected those objects. The pack files also apply even if the unreachable objects are pruned while the pack files are sitting in the log. However, the current approach is not sufficient with concurrent transactions: nothing holds on to the old reference tips the pack file was computed against, so the objects reachable from them could be pruned before the pack file is applied. This will be fixed in a follow-up by maintaining internal references to the old reference tips until all dependent pack files have been applied.

Computing the pack file is expensive, but it should be behaviorally correct. This iteration allows us to proceed for now. Later, we'll need to switch to a less computationally heavy approach, for example by packing only the quarantined objects and holding internal references to the objects in the repository that the pack file depends on.
f8f624a5