Lease-based repository state

This is an idea in support of info/refs caching.

Context

The goal is to have a key per repository which represents its state. When retrieving a value from the info/refs cache, we look it up by this key. This allows us to have an immutable and relatively "dumb" blob cache. E.g. if the key is yMjAwMFowZjE then we would look up a cached info/refs blob by blob key hashedKey("repo.git", "v1", "info/refs", "yMjAwMFowZjE"). If no blob is found for that hashed key we know we have to create a new one. The blob store would use time-based expiry.
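As a rough illustration of the lookup key, the sketch below derives a blob key from the four components in the example above. The function name hashed_key, the choice of SHA-256, and the length-prefixed encoding are assumptions made for illustration; the note only requires that the blob key is a deterministic function of the repository, version, endpoint, and state key.

```python
import hashlib


def hashed_key(*parts: str) -> str:
    """Derive a cache blob key from its components (hypothetical helper).

    Each part is length-prefixed before hashing so that different
    splits of the same characters, e.g. ("ab", "c") and ("a", "bc"),
    cannot collide.
    """
    h = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


# A new state key yields a new blob key, so the old blob is simply
# never looked up again and eventually expires.
blob_key = hashed_key("repo.git", "v1", "info/refs", "yMjAwMFowZjE")
```

Because the state key is part of the hash input, the cache never needs explicit invalidation: changing the repository state changes the blob key.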

Design

Each repository gets a state directory, state. Its exact location does not matter as long as it is inside the storage root; it could be repo.git/gitaly/state. What matters is that all Gitaly nodes that access this repository via NFS can find the state directory.

The state directory contains a file state/key, which holds a random nonce. Ongoing writes take a lease by creating a randomly named (empty) file in the state/pending directory.

Readers

A reader looks up the key as follows:

  1. get directory entries of state/pending
  2. if there are pending entries older than one hour: update state/key to a new random value, then delete the old pending entries
  3. open file state/key
  4. if missing, create a new state/key with random contents
  5. return contents of state/key
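The reader steps above can be sketched roughly as follows. This is a minimal sketch, not Gitaly code: the names read_key, write_key, and LEASE_TIMEOUT are hypothetical, lease age is assumed to be judged by file mtime, and the key is assumed to be replaced atomically via a temporary file and rename. Clock skew between NFS clients is ignored.

```python
import os
import secrets
import tempfile
import time

LEASE_TIMEOUT = 3600  # one hour, in seconds (from the design above)


def write_key(state_dir):
    """Replace state/key with a fresh random nonce via temp file + rename."""
    fd, tmp = tempfile.mkstemp(dir=state_dir, prefix="key.tmp.")
    with os.fdopen(fd, "w") as f:
        f.write(secrets.token_hex(16))
    os.replace(tmp, os.path.join(state_dir, "key"))


def read_key(state_dir, now=None):
    """Return the current state key, repairing stale leases on the way."""
    now = time.time() if now is None else now
    pending = os.path.join(state_dir, "pending")
    os.makedirs(pending, exist_ok=True)

    # Steps 1 and 2: find leases older than the timeout.
    stale = []
    for entry in os.scandir(pending):
        try:
            if now - entry.stat().st_mtime > LEASE_TIMEOUT:
                stale.append(entry.path)
        except FileNotFoundError:
            continue  # the writer finished in the meantime
    if stale:
        write_key(state_dir)  # rotate the key first ...
        for path in stale:    # ... then drop the stale leases
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass

    # Steps 3 to 5: read the key, creating it if it is missing.
    key_path = os.path.join(state_dir, "key")
    try:
        with open(key_path) as f:
            return f.read()
    except FileNotFoundError:
        write_key(state_dir)
        with open(key_path) as f:
            return f.read()
```

Note that a fresh lease does not change what the reader returns; only leases past the timeout trigger a key rotation.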

Writers

A writer has a critical section, during which we know an update is happening. At the start of the critical section the writer creates a randomly named file state/pending/572gasfg0; this constitutes a lease.

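At the end of the critical section the writer first updates state/key to a new random value. Then it deletes state/pending/572gasfg0. The order of these two events is important: if the writer crashes between the two steps, the worst outcome is a leftover lease that readers will eventually clean up. In the reverse order, a crash after deleting the lease would leave the old key in place with nothing to signal that it is stale.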
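A writer's critical section could be wrapped in a context manager along these lines. This is a sketch under assumptions: the name write_lease is hypothetical, lease names come from a random hex token, and the key is replaced with a temporary file plus rename. A real process crash would skip the cleanup and leave the lease behind for readers to expire.

```python
import contextlib
import os
import secrets
import tempfile


@contextlib.contextmanager
def write_lease(state_dir):
    """Hold a lease in state/pending for the duration of a write."""
    pending = os.path.join(state_dir, "pending")
    os.makedirs(pending, exist_ok=True)
    lease = os.path.join(pending, secrets.token_hex(8))
    # O_EXCL makes creation fail if the random name already exists.
    os.close(os.open(lease, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644))
    try:
        yield lease
    finally:
        # Order matters: rotate the key first, then release the lease.
        fd, tmp = tempfile.mkstemp(dir=state_dir, prefix="key.tmp.")
        with os.fdopen(fd, "w") as f:
            f.write(secrets.token_hex(16))
        os.replace(tmp, os.path.join(state_dir, "key"))
        # A reader may already have expired this lease; that is fine.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(lease)
```

The key is rotated even if the body raises an exception. That is the conservative choice, since a failed write may still have changed the repository.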

Discussion

  • If a writer crashes and fails to remove its pending state file, the first reader to look up the key after the file is more than one hour old will detect and fix this.
  • Updates of state/key can use "last write wins" mechanisms because its contents are random.
  • It does not matter in which order concurrent writers update state/key, because all that matters is that the value changes, and we use randomness to "guarantee" that.
  • As a fallback, the "housekeeping" button in project settings should clear state.
Edited by Jacob Vosmaer