    Pack only new objects and figure out dependencies in the repository · ef71f395
    Sami Hiltunen authored
    The TransactionManager currently packs all new and unreachable
    objects into the packfile that gets committed as part of the
    transaction. This is expensive as it does a full graph walk. It was
    also only a temporary solution until Git gained the functionality
    required to pack only the new objects and to determine which
    existing objects in the repository a transaction depends on.
    
    Git v2.44 includes the necessary functionality in
    `rev-list --missing=print`. Previously it failed on missing commits;
    now it prints them out. This commit wires the changes into
    TransactionManager by doing the following.
    
    When a transaction is about to commit, the TransactionManager does an
    object walk in the quarantine directory only. It starts the walk from
    the new reference tips in the transaction and the included objects. The
    included objects are objects that are not reachable from the references
    but we want to commit them into the repository nonetheless. This walks
    the new objects in the quarantine and prints all of them out for packing.
    As we perform the walk with only the quarantine configured, any objects
    that are not in the quarantine are printed as missing objects. These objects
    are considered the transaction's dependencies and they are verified to
    exist in the repository prior to committing the transaction. The dependency
    information can also be later used to prevent concurrent pruning operations
    from pruning unreachable objects that are needed by some transactions.
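
The split between packed objects and dependencies can be sketched as follows. This is a minimal illustration, not the TransactionManager code: it assumes the walk is run as `git rev-list --objects --missing=print`, where found objects are printed as `<oid>` or `<oid> <path>` and missing objects are prefixed with `?`. The names `walkResult` and `parseRevList` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// walkResult holds the outcome of an object walk over the quarantine.
// The type is hypothetical and exists only for this sketch.
type walkResult struct {
	// packed lists objects found in the quarantine; these go into the pack.
	packed []string
	// dependencies lists objects the walk reported as missing; these must
	// already exist in the repository.
	dependencies []string
}

// parseRevList splits the output of `git rev-list --objects --missing=print`
// into objects that were found and objects that were reported as missing.
func parseRevList(output string) (walkResult, error) {
	var result walkResult

	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}

		// Missing objects are printed with a leading question mark.
		if strings.HasPrefix(line, "?") {
			result.dependencies = append(result.dependencies, strings.TrimPrefix(line, "?"))
			continue
		}

		// Found objects may carry a path after the object ID; keep the ID only.
		oid, _, _ := strings.Cut(line, " ")
		result.packed = append(result.packed, oid)
	}

	return result, scanner.Err()
}

func main() {
	// Placeholder object IDs, shortened for readability.
	output := "aaaa\nbbbb README.md\n?cccc\n"

	result, err := parseRevList(output)
	if err != nil {
		panic(err)
	}

	fmt.Println("objects to pack:", result.packed)
	fmt.Println("dependencies:", result.dependencies)
}
```

Because the walk only sees the quarantine, the size of both lists is bounded by the change itself rather than by the repository's history.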
    
    This approach is significantly faster than the earlier approach as it
    only walks and packs the new objects in the quarantine, and thus scales with
    the size of the change, not with the size of the history.
    
    Checking object existence requires a Git repository. The checks are done against
    the staging repository which is a snapshot of the repository's latest state in the
    partition plus the possible alternate configured for it during the transaction.
    If the repository is being created and has no existing state, an empty repository
    is used as the base state. The staging repository is used for both verifying
    the dependency existence and the reference changes.
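
One way to picture the dependency verification is shown below. The commit message does not say which Git command performs the existence check, so this sketch assumes the dependencies are fed to `git cat-file --batch-check` in the staging repository, which prints `<oid> missing` for objects it cannot find. The function names `missingDependencies` and `verifyDependencies` are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// missingDependencies scans `git cat-file --batch-check` output and returns
// the object IDs that were reported as missing. Existing objects are printed
// as "<oid> <type> <size>" and are skipped.
func missingDependencies(batchCheckOutput string) ([]string, error) {
	var missing []string

	scanner := bufio.NewScanner(strings.NewReader(batchCheckOutput))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 2 && fields[1] == "missing" {
			missing = append(missing, fields[0])
		}
	}

	return missing, scanner.Err()
}

// verifyDependencies returns an error if any dependency is absent from the
// repository the output was produced against.
func verifyDependencies(batchCheckOutput string) error {
	missing, err := missingDependencies(batchCheckOutput)
	if err != nil {
		return fmt.Errorf("read batch-check output: %w", err)
	}

	if len(missing) > 0 {
		return fmt.Errorf("transaction depends on missing objects: %s", strings.Join(missing, ", "))
	}

	return nil
}

func main() {
	// Placeholder object IDs, shortened for readability.
	output := "aaaa commit 245\nbbbb missing\n"

	if err := verifyDependencies(output); err != nil {
		fmt.Println("rejecting transaction:", err)
		return
	}

	fmt.Println("all dependencies exist")
}
```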
    
    Git special cases the empty tree OID and considers it to be present in the
    object database even if it hasn't explicitly been written there. This leads
    to the object walk returning the empty tree OID, which results in it being
    included in the pack. The tests were adjusted for this. In practice this
    shouldn't have a big impact but we can later special case this and drop
    the empty tree from the packs if necessary.
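
If the empty tree were to be dropped later, the special case could look roughly like this. It is only a sketch of the possible follow-up and is not part of this commit. It assumes a SHA-1 repository, where the empty tree OID is `4b825dc642cb6eb9a060e54bf8d69288fbee4904`; a SHA-256 repository would need its own constant. The name `dropEmptyTree` is hypothetical.

```go
package main

import "fmt"

// emptyTreeSHA1 is the well-known object ID of the empty tree in SHA-1
// repositories. Git treats it as present even if it was never written.
const emptyTreeSHA1 = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

// dropEmptyTree returns the object IDs to pack with the empty tree removed.
// The input slice is left unmodified.
func dropEmptyTree(oids []string) []string {
	filtered := make([]string, 0, len(oids))
	for _, oid := range oids {
		if oid == emptyTreeSHA1 {
			continue
		}
		filtered = append(filtered, oid)
	}
	return filtered
}

func main() {
	// The first and last IDs are shortened placeholders.
	oids := []string{"aaaa", emptyTreeSHA1, "bbbb"}
	fmt.Println(dropEmptyTree(oids))
}
```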
    
    As transactions now require functionality that is patched into the bundled
    Git but not yet released, we add version checks in main and in the
    TransactionManager tests to guard against running with an incorrect version.
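
A version guard of this kind might be sketched as below. The actual check added by this commit is not shown in the message and may differ, for example in how it recognizes a patched bundled Git. The sketch simply assumes that any `git version` output reporting 2.44 or later has the required `rev-list --missing=print` behavior. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts the major and minor components from `git version`
// output such as "git version 2.44.0". Anything after the minor component is
// ignored.
func parseMajorMinor(versionOutput string) (int, int, error) {
	fields := strings.Fields(versionOutput)
	if len(fields) < 3 || fields[0] != "git" || fields[1] != "version" {
		return 0, 0, fmt.Errorf("unexpected version output: %q", versionOutput)
	}

	parts := strings.Split(fields[2], ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected version format: %q", fields[2])
	}

	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("parse major version: %w", err)
	}

	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("parse minor version: %w", err)
	}

	return major, minor, nil
}

// supportsMissingCommits reports whether the given Git version is assumed to
// print missing commits instead of failing on them (2.44 or later).
func supportsMissingCommits(versionOutput string) (bool, error) {
	major, minor, err := parseMajorMinor(versionOutput)
	if err != nil {
		return false, err
	}

	return major > 2 || (major == 2 && minor >= 44), nil
}

func main() {
	ok, err := supportsMissingCommits("git version 2.43.1")
	if err != nil {
		panic(err)
	}

	if !ok {
		fmt.Println("skipping: Git is too old for transactions")
	}
}
```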