Snapshot isolate transactions

Sami Hiltunen requested to merge smh-snapshot-isolate-txn into master

Transactions are currently not isolated from each other. Concurrent transactions see the changes of another transaction as soon as it commits. They can also see an intermediate state where the committed changes have been only partially applied. This leads to problems, as the repository may change unexpectedly while a transaction is reading it. Because transactions are not isolated from each other's changes, tasks like taking a consistent backup or calculating a checksum of the repository's contents are impossible without blocking the writes of other transactions. Blocking other transactions would lead to poor performance.

This commit solves the problem by snapshot isolating transactions. Each transaction is given its own snapshot of the repository to work on. This snapshot remains isolated from changes committed into the actual repository and from changes made to the snapshots of other transactions. Each transaction can execute as if it were the only transaction running, without considering other concurrent operations.

The repository is snapshotted by recreating its directory structure and hard linking its files into their respective locations. This creates an independent clone of the repository without copying large files such as objects. The snapshot is built in the transaction's staging directory and is removed along with it when the transaction either commits or is rolled back.
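The snapshotting approach can be sketched roughly as follows. This is a minimal illustration, not the code from this merge request: `createSnapshot` and `demoSnapshot` are hypothetical names, and the sketch assumes the snapshot directory is outside the repository and on the same filesystem, since hard links cannot cross filesystems.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// createSnapshot (hypothetical helper) recreates the directory tree of src
// under dst and hard links every regular file into the matching location.
// No file contents are copied.
func createSnapshot(src, dst string) error {
	return filepath.WalkDir(src, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(src, path)
		if err != nil {
			return err
		}
		target := filepath.Join(dst, rel)
		if d.IsDir() {
			// Directories are created fresh so the snapshot has its own structure.
			return os.MkdirAll(target, 0o755)
		}
		if !d.Type().IsRegular() {
			return fmt.Errorf("unsupported file type: %s", path)
		}
		// Regular files are shared with the source through a hard link.
		return os.Link(path, target)
	})
}

// demoSnapshot builds a tiny fake repository, snapshots it, and reports
// whether the object in the snapshot is the same file as the one in the
// repository.
func demoSnapshot() (bool, error) {
	root, err := os.MkdirTemp("", "snapshot-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	repo := filepath.Join(root, "repo")
	object := filepath.Join("objects", "ab", "cdef")
	if err := os.MkdirAll(filepath.Join(repo, filepath.Dir(object)), 0o755); err != nil {
		return false, err
	}
	if err := os.WriteFile(filepath.Join(repo, object), []byte("blob"), 0o444); err != nil {
		return false, err
	}

	snapshot := filepath.Join(root, "staging", "snapshot")
	if err := createSnapshot(repo, snapshot); err != nil {
		return false, err
	}

	original, err := os.Stat(filepath.Join(repo, object))
	if err != nil {
		return false, err
	}
	linked, err := os.Stat(filepath.Join(snapshot, object))
	if err != nil {
		return false, err
	}
	return os.SameFile(original, linked), nil
}

func main() {
	linked, err := demoSnapshot()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("snapshot shares files with repository:", linked)
}
```

Because only directories are created and files are linked, the cost of a snapshot in this sketch grows with the number of files and directories rather than with the size of the objects.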

The snapshot repository is also fully writable. Git always creates a new file when updating references and objects. This ensures the hard linked files in the snapshot remain unchanged. The changes performed in the snapshot are thrown away unless they are explicitly included in the transaction. As the snapshots only live for the duration of their transactions, they are not synced to disk. The actual committed changes will be synced as usual with the log entry.
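The reason hard linked files stay unchanged can be shown with a small sketch. It assumes updates are done by writing a new file and renaming it into place, which mimics the behavior described above; `replaceFile` and `demoHardLinkIsolation` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// replaceFile (hypothetical helper) updates path by writing a new file and
// renaming it into place instead of modifying the existing file's contents.
func replaceFile(path string, content []byte) error {
	tmp := path + ".lock"
	if err := os.WriteFile(tmp, content, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// demoHardLinkIsolation hard links a file, replaces the original through a
// rename, and returns the contents seen through both paths afterwards.
func demoHardLinkIsolation() (string, string, error) {
	dir, err := os.MkdirTemp("", "hardlink-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	repoRef := filepath.Join(dir, "repo-ref")
	snapshotRef := filepath.Join(dir, "snapshot-ref")

	if err := os.WriteFile(repoRef, []byte("old\n"), 0o644); err != nil {
		return "", "", err
	}
	// The snapshot shares the file with the repository through a hard link.
	if err := os.Link(repoRef, snapshotRef); err != nil {
		return "", "", err
	}
	// The rename points the repository path at a new file. The snapshot's
	// link still refers to the old file, so its content does not change.
	if err := replaceFile(repoRef, []byte("new\n")); err != nil {
		return "", "", err
	}

	repoContent, err := os.ReadFile(repoRef)
	if err != nil {
		return "", "", err
	}
	snapshotContent, err := os.ReadFile(snapshotRef)
	if err != nil {
		return "", "", err
	}
	return string(repoContent), string(snapshotContent), nil
}

func main() {
	repoContent, snapshotContent, err := demoHardLinkIsolation()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("repository: %q, snapshot: %q\n", repoContent, snapshotContent)
}
```

The same reasoning applies in the other direction: a write made in the snapshot through a rename leaves the file in the actual repository untouched. An in-place write to a linked file would be visible through both paths, which is why the approach relies on files never being modified in place.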

RewriteRepository is updated to point the repository to the snapshot in addition to setting up the quarantine. This makes it easy to integrate the snapshots by just rewriting the target repository.

A transaction's snapshot is decided when the transaction begins. The snapshot is guaranteed to include all data that was committed prior to the Begin() call. Log application is synchronized with the transactions waiting for their snapshots: transactions waiting for the currently applied LSN are allowed to take their snapshots before more log entries are applied.
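One way to picture this synchronization is the sketch below. It is an assumption-laden simplification rather than the implementation in this merge request: the `manager` type, its fields, and the `commit`, `applyNext` and `begin` methods are hypothetical, and the snapshot is taken while holding a single mutex, which a real implementation would likely avoid.

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a hypothetical, simplified stand-in for a transaction manager.
type manager struct {
	mu           sync.Mutex
	cond         *sync.Cond
	committedLSN uint64         // last LSN appended to the log
	appliedLSN   uint64         // last LSN applied to the repository
	waiters      map[uint64]int // transactions waiting to snapshot at an LSN
}

func newManager() *manager {
	m := &manager{waiters: map[uint64]int{}}
	m.cond = sync.NewCond(&m.mu)
	return m
}

// commit appends a log entry and returns its LSN.
func (m *manager) commit() uint64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.committedLSN++
	return m.committedLSN
}

// applyNext applies the next log entry, if any. It first lets every
// transaction waiting for the currently applied LSN take its snapshot.
func (m *manager) applyNext() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	for m.waiters[m.appliedLSN] > 0 {
		m.cond.Wait()
	}
	if m.appliedLSN == m.committedLSN {
		return false
	}
	m.appliedLSN++
	m.cond.Broadcast()
	return true
}

// begin waits until everything committed before the call has been applied,
// takes the snapshot at exactly that LSN, and returns the LSN.
func (m *manager) begin(takeSnapshot func(lsn uint64)) uint64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	target := m.committedLSN
	m.waiters[target]++
	for m.appliedLSN != target {
		m.cond.Wait()
	}
	takeSnapshot(target)
	m.waiters[target]--
	if m.waiters[target] == 0 {
		delete(m.waiters, target)
	}
	m.cond.Broadcast()
	return target
}

// demoSnapshotLSN commits two entries, begins a transaction concurrently
// with log application, and returns the LSN the transaction snapshotted at.
func demoSnapshotLSN() uint64 {
	m := newManager()
	m.commit()
	m.commit()

	result := make(chan uint64)
	go func() { result <- m.begin(func(uint64) {}) }()

	for m.applyNext() {
	}
	return <-result
}

func main() {
	fmt.Println("snapshot LSN:", demoSnapshotLSN())
}
```

In this sketch the transaction always snapshots at LSN 2, regardless of whether `begin` runs before, during, or after the two entries are applied, because the applier refuses to move past an LSN that still has waiting transactions.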

Transaction verification is blocked while snapshots are being taken. This is necessary because verification is still performed against the repository on disk and may create lock files that should not end up in a snapshot. This can be improved later.