    Implement write-ahead logging for objects · f8f624a5
    Sami Hiltunen authored
    All writes must be write-ahead logged prior to being applied to
    the repository. Objects are currently not being logged, which can
    be a source of inconsistencies and also performance problems. This commit
    implements logging support for objects that are needed by a transaction.
    
    Objects can be included in a transaction by storing them in the transaction's
    quarantine directory. When the transaction is being committed, the
    TransactionManager computes a pack file from the new reference tips set in
    the transaction.
    
    The pack file includes the objects that are reachable from the new tips but
    unreachable from the current set of references, so it includes both new
    objects from the quarantine directory and objects that are already present
    in the repository but are unreachable. This ensures that the pack file
    contains all objects that are needed to go from the current set of
    references to the new set of references after the transaction. This matters
    because the needed unreachable objects could otherwise be pruned, after
    which the pack file would no longer apply to the repository.
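
    The object selection described above can be sketched on an in-memory object
    graph. This is only an illustration of the set difference, not the code in
    this commit: Graph, reachable and packObjects are hypothetical names, and the
    object IDs are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// Graph is a hypothetical in-memory object graph: it maps an object ID to the
// IDs of the objects it references directly.
type Graph map[string][]string

// reachable returns every object reachable from the given tips, tips included.
func reachable(g Graph, tips []string) map[string]bool {
	seen := make(map[string]bool)
	stack := append([]string(nil), tips...)
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[id] {
			continue
		}
		seen[id] = true
		stack = append(stack, g[id]...)
	}
	return seen
}

// packObjects returns, in sorted order, the objects reachable from newTips
// that are not reachable from currentTips.
func packObjects(g Graph, currentTips, newTips []string) []string {
	have := reachable(g, currentTips)
	var pack []string
	for id := range reachable(g, newTips) {
		if !have[id] {
			pack = append(pack, id)
		}
	}
	sort.Strings(pack)
	return pack
}

func main() {
	g := Graph{
		"c1": nil,    // reachable from the current reference
		"u1": {"c1"}, // already in the repository, but unreachable
		"q1": {"u1"}, // new object from the quarantine directory
	}
	// Both the quarantined object and the unreachable object it depends on
	// are selected; the already reachable object is not.
	fmt.Println(packObjects(g, []string{"c1"}, []string{"q1"})) // prints [q1 u1]
}
```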
    
    As objects always flow through the log, this also means that only committed
    objects end up in the repository. This is an important property for backups.
    The repository will get into a consistent state by applying the write-ahead
    log. If objects could end up in the repository without being logged, some
    logged reference changes could fail when a repository is recovered from a
    snapshot and the log, as neither the snapshot nor the log would be guaranteed
    to include the objects newly referenced in a log entry.
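
    The recovery argument can be illustrated with a toy model. This is a sketch
    under stated assumptions, not Gitaly's implementation: Repository, LogEntry
    and Apply are hypothetical, and an entry is modeled as a set of object IDs
    plus reference updates.

```go
package main

import (
	"errors"
	"fmt"
)

// Repository is a hypothetical model of repository state: the set of objects
// it contains and the object each reference points to.
type Repository struct {
	Objects map[string]bool
	Refs    map[string]string
}

// LogEntry is a hypothetical log entry carrying the logged objects and the
// reference updates of one transaction.
type LogEntry struct {
	Objects    []string
	RefUpdates map[string]string
}

var errMissingObject = errors.New("reference target missing from repository")

// Apply verifies that every new reference target is either already in the
// repository or carried by the entry, then applies objects and references.
// Nothing is modified if verification fails.
func (r *Repository) Apply(e LogEntry) error {
	incoming := make(map[string]bool, len(e.Objects))
	for _, id := range e.Objects {
		incoming[id] = true
	}
	for ref, id := range e.RefUpdates {
		if !r.Objects[id] && !incoming[id] {
			return fmt.Errorf("%s -> %s: %w", ref, id, errMissingObject)
		}
	}
	for id := range incoming {
		r.Objects[id] = true
	}
	for ref, id := range e.RefUpdates {
		r.Refs[ref] = id
	}
	return nil
}

func main() {
	// State restored from a snapshot.
	repo := &Repository{
		Objects: map[string]bool{"c1": true},
		Refs:    map[string]string{"refs/heads/main": "c1"},
	}

	// The entry carries its objects, so it applies on top of the snapshot.
	logged := LogEntry{
		Objects:    []string{"c2"},
		RefUpdates: map[string]string{"refs/heads/main": "c2"},
	}
	fmt.Println("with logged objects:", repo.Apply(logged))

	// The object was written outside the log, so neither the snapshot nor
	// the entry contains it and the reference update fails.
	unlogged := LogEntry{
		RefUpdates: map[string]string{"refs/heads/main": "c3"},
	}
	fmt.Println("without logged objects:", repo.Apply(unlogged))
}
```

    In this model the first entry applies cleanly, while the second returns an
    error and leaves the reference pointing at c2.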
    
    For future replicated setups, the fact that only committed objects end up
    in the repository means that all replicas are guaranteed to have received
    the same objects at some point. If objects from failed writes could end up
    in the repository, the leader could have a different set of objects from the
    replicas due to these objects which are not replicated. As the pack files are
    computed to also include unreachable objects, the pack file is guaranteed to
    apply on another replica regardless of whether it has garbage collected the
    objects.
    
    The pack files will apply even if the unreachable objects are pruned while
    they are sitting in the log. However, the current approach is not enough if
    there are concurrent transactions, as nothing holds on to the old tips of the
    references the pack file was computed against. This will be fixed in a
    follow-up by maintaining internal references to the old tips of references
    until all dependent pack files have been applied.
    
    Computing the pack file is expensive, but the result should be behaviorally
    correct. This is a first iteration that allows us to proceed. We'll later
    need to update the approach to a less computationally heavy one, for example
    by just packing the quarantined objects and holding internal references to the
    objects the pack file depends on in the repository.