Stop rewriting packs when committing a transaction
TransactionManager is rewriting all objects a transaction introduces into a single pack that ultimately gets committed. This is done for a few reasons:
- We previously didn't support committing multiple files. As part of the changes in Log physical operations instead of logical (#5793 - closed), we now support committing an arbitrary number of files. We could easily commit a single pack, or multiple packs now.
- We didn't have the functionality in place to walk only the new objects in the quarantine, which is needed to easily verify all of them and figure out their dependencies. Record packfile dependencies of a transaction (#5770 - closed) is introducing this.
- To simplify, we only commit packed objects, not loose objects.
While the rewriting works fine, it may result in unnecessary rewriting of packs. The larger the objects in the push, the bigger the problem. The rewriting achieves two things:
- It sheds unreachable objects that ultimately shouldn't be needed in the repository.
- It ensures we have the dependencies of all objects as we walk all of the reachable objects that end up in the pack.
Shedding unreachable objects from the quarantine is not really necessary but more of a side effect. While it's nice not to commit unnecessary objects, it's not a property we have to provide as the writers could just stop writing them.
Ensuring all objects have their dependencies met is necessary though. This can be achieved by just walking the objects; there's no need to rewrite them.
Let's address the inefficiency of rewriting objects by:
- Packing all loose objects into a pack to stick to the property that we only have packed objects.
- Walking all objects in the quarantine directory, and recording their dependencies. This is enough to ensure the dependencies are met without the rewriting.
- Committing all of the packfiles introduced by the transaction, i.e. the packs it wrote and the pack containing the loose objects we packed.
The end result should be faster commits due to less redundant object rewriting. We'd also commit all written objects without shedding the unreachable ones, though this is more of a side effect than a property we need.
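The proposed flow can be sketched with a toy model. This is only an illustration, not the TransactionManager implementation: the `Quarantine` class, the function names, the `pack-loose` pack name, and the check against a set of object IDs already in the repository are all hypothetical. Objects are modelled as IDs mapped to the IDs they reference.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Quarantine:
    """Toy stand-in for a quarantine directory (hypothetical model)."""

    # pack name -> {object ID -> IDs of the objects it references}
    packs: dict[str, dict[str, set[str]]] = field(default_factory=dict)
    # loose object ID -> IDs of the objects it references
    loose: dict[str, set[str]] = field(default_factory=dict)


def pack_loose_objects(q: Quarantine, pack_name: str = "pack-loose") -> None:
    """Step 1: move the loose objects into one new pack; existing packs are untouched."""
    if q.loose:
        q.packs[pack_name] = q.loose
        q.loose = {}


def record_dependencies(q: Quarantine) -> set[str]:
    """Step 2: walk every quarantined object and collect the IDs it references
    that are not themselves in the quarantine."""
    present = {oid for pack in q.packs.values() for oid in pack}
    referenced = {
        ref for pack in q.packs.values() for refs in pack.values() for ref in refs
    }
    return referenced - present


def commit(q: Quarantine, repository_objects: set[str]) -> tuple[list[str], set[str]]:
    """Step 3: commit all packs as-is, without rewriting them.

    Checking the recorded dependencies against the repository here is an
    assumption made for this sketch.
    """
    pack_loose_objects(q)
    dependencies = record_dependencies(q)
    missing = dependencies - repository_objects
    if missing:
        raise ValueError(f"missing dependencies: {sorted(missing)}")
    return sorted(q.packs), dependencies


if __name__ == "__main__":
    quarantine = Quarantine(
        packs={
            "pack-a": {
                "commit1": {"tree1", "parent0"},
                "tree1": {"blob1"},
                "blob1": set(),
                "orphan": set(),  # unreachable, but still committed with its pack
            }
        },
        loose={"blob2": set()},
    )
    print(commit(quarantine, repository_objects={"parent0"}))
```

In this model the unreachable `orphan` object is committed along with its pack, which matches the side effect described above: nothing is shed because nothing is rewritten.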