Lease-based repository state
This is an idea in support of info/refs caching.
Context
The goal is to have a key per repository which represents its state. When retrieving a value from the info/refs cache, we look it up by this key. This allows us to have an immutable and relatively "dumb" blob cache. E.g. if the key is yMjAwMFowZjE then we would look up a cached info/refs blob by blob key hashedKey("repo.git", "v1", "info/refs", "yMjAwMFowZjE"). If no blob is found for that hashed key we know we have to create a new one. The blob store would use time-based expiry.
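To make the lookup concrete, here is a minimal Go sketch of what hashedKey could look like. The text only names the function and its arguments; the choice of SHA-256, the hex encoding and the length-prefixing of each part are assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashedKey is a hypothetical implementation: it folds all parts into one
// SHA-256 digest. Each part is length-prefixed so that ("ab", "c") and
// ("a", "bc") cannot collide by simple concatenation.
func hashedKey(parts ...string) string {
	h := sha256.New()
	for _, p := range parts {
		fmt.Fprintf(h, "%d:%s", len(p), p)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// The last argument is the repository state key read from state/key.
	blobKey := hashedKey("repo.git", "v1", "info/refs", "yMjAwMFowZjE")
	fmt.Println("blob key:", blobKey)
}
```

Because the state key is one of the hashed parts, changing it makes every older blob unreachable without deleting anything; the old blobs simply age out of the blob store.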
Design
Each repository gets a state directory state. It doesn't matter where it is as long as it's in the storage root. It could be repo.git/gitaly/state. What matters is that all Gitaly nodes that access this repository via NFS can find the state directory.
The state directory contains a file state/key. This file contains a random nonce. Ongoing writes take a lease by creating a randomly named (empty) file in the state/pending directory.
Readers
A reader looks up the key as follows:
- get directory entries of state/pending
- if there are pending entries older than one hour: update state/key to new random value, delete old pending entries
- open file state/key
- if missing, create a new state/key with random contents
- return contents of state/key
Writers
A writer has a critical section, during which we know an update is happening. At the start of the critical section the writer creates a randomly named file state/pending/572gasfg0; this constitutes a lease.
At the end of the critical section the writer first updates state/key to a new random value. Then it deletes state/pending/572gasfg0. The order of these two events is important: if the writer deleted the lease first and then crashed before updating state/key, nothing would be left to tell readers that the old key is out of date.
Discussion
- If a writer crashes and fails to remove its pending state file, the first reader that runs after the file is more than one hour old will detect and fix this.
- Updates of state/key can use "last write wins" mechanisms because its contents are random.
- It does not matter in which order concurrent writers update state/key, because all that matters is that the value changes, and we use randomness to "guarantee" that.
- As a fallback, the "housekeeping" button in project settings should clear state.