Pack only new objects and figure out dependencies in the repository

Sami Hiltunen requested to merge smh-record-packfile-deps into master

The TransactionManager currently packs all new and unreachable objects into the packfile that gets committed as part of the transaction. This is expensive because it requires a full graph walk. It was also only ever a temporary solution until Git gained the functionality needed to pack only the new objects and to determine which existing objects in the repository a transaction depends on.

Git v2.44 includes the necessary functionality in rev-list --missing=print. Previously the command failed on missing commits; now it prints them out. This commit wires the change into TransactionManager as follows.

When a transaction is about to commit, the TransactionManager performs an object walk with only the quarantine directory configured. The walk starts from the transaction's new reference tips and its included objects. Included objects are objects that are not reachable from the references but that we want to commit into the repository nonetheless. The walk visits the new objects in the quarantine and prints all of them out for packing. Because only the quarantine is configured, any object that is not in the quarantine is printed as a missing object. These missing objects are considered the transaction's dependencies, and they are verified to exist in the repository before the transaction is committed. The dependency information can also be used later to prevent concurrent pruning operations from pruning unreachable objects that some transaction still needs.
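
The split between new objects and dependencies can be illustrated with a minimal sketch. It is not the code in this merge request. It assumes the walk is run as git rev-list --objects --missing=print (the --objects flag is an assumption here), where Git prints each missing object ID prefixed with a '?' and each present object as an ID optionally followed by a path. The names walkResult and parseRevListOutput are hypothetical, and the object IDs in main are shortened placeholders.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// walkResult is a hypothetical container for the two sets of object IDs
// recovered from the walk output.
type walkResult struct {
	NewObjects   []string // present in the quarantine; candidates for packing
	Dependencies []string // reported missing; expected to exist in the repository
}

// parseRevListOutput splits rev-list output produced with --missing=print
// into present and missing objects. Missing objects carry a leading '?'.
// Present objects are printed as "<oid>" or "<oid> <path>".
func parseRevListOutput(output string) (walkResult, error) {
	var result walkResult

	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}

		if strings.HasPrefix(line, "?") {
			// Not in the quarantine: record it as a dependency.
			result.Dependencies = append(result.Dependencies, line[1:])
			continue
		}

		// In the quarantine: keep only the object ID and drop any path.
		result.NewObjects = append(result.NewObjects, strings.Fields(line)[0])
	}

	return result, scanner.Err()
}

func main() {
	// Placeholder output: two new objects and one missing object.
	sample := "1111111\n2222222 README.md\n?3333333\n"

	result, err := parseRevListOutput(sample)
	if err != nil {
		panic(err)
	}

	fmt.Println("pack:", result.NewObjects)
	fmt.Println("verify in repository:", result.Dependencies)
}
```

In this sketch, the IDs in NewObjects would be fed to the packing step, and each ID in Dependencies would be checked against the repository before the commit proceeds.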

This approach is significantly faster than the earlier one because it walks and packs only the new objects in the quarantine, and thus scales with the size of the change rather than the size of the history.

Closes #5770 (closed)

Edited by Sami Hiltunen
