Skip to content
  • Quang-Minh Nguyen's avatar
    Fix a race in transaction manager log entry retention · 0c6a1788
    Quang-Minh Nguyen authored
    TransactionManager has an issue where unneeded log entries may not be
    pruned if a transaction is aborted. This is because of how current log
    entry retention works. We keep some log entries that are still referred
    by other transactions. Each log entry has a reference counter. When a
    transaction finishes, it reduces the target entry's counter and signals
    the manager to wake up to clean up stale entries via an unbuffered
    channel. There are two problems here:
    * All transactions send a signal to [this channel](https://gitlab.com/gitlab-org/gitaly/-/blob/6abde2303696a63f9eec85365b144c3b208d8043/internal/gitaly/storage/storagemgr/transaction_manager.go#L329)
      even if that
      transaction does not necessarily clean up any log entry. The list of
      keeparound entries is structured as a linked list. It removes leading
      empty entries only.
    * If the transaction signals the channel when the manager is not
      listening, the signal is ignored.
    
    The two problems could lead to the accumulation of unneeded log entries
    on the disk. Consider the following scenario:
    * Begin TX1
    * Begin TX2
    * Commit TX1 - The channel is signaled
    * The manager begins cleaning up TX1
    * Rollback TX2 - The channel is not listened by the manager. The signal
      is ignored.
    * The manager finishes cleaning up and starts processing new
      transactions. It is not aware that the log entry TX2 is based on
      should be removed.
    
    Although the log entry will be cleaned up after Gitaly restarts, the log
    entries might accumulate for days. To resolve this race, this commit
    proposes two changes:
    * Convert the signaling queue to a buffered channel
    * Only signal the channel if the transaction modifies the head of the
      in-memory list.
    
    As a result, the above scenario becomes:
    * Begin TX1
    * Begin TX2
    * Commit TX1 - The channel is signaled.
    * The manager begins cleaning up TX1. The channel is cleaned up.
    * Rollback TX2 - The channel is signaled.
    * The manager finishes cleaning up TX1.
    * It receives the last message from the buffered channel and starts to
      clean up TX2.
    
    If another transaction (TX3 for example) signals to that channel when
    it's full, it means the manager still has not picked up the signal.
    That's perfectly fine to let TX3 move on. The next time the manager
    wakes up, it scans the head of the list, so it will clean up all empty
    log entries that were referred by TX1, TX2, and TX3.
    
    If the manager crashes in between, the unneeded log entries are still
    removed when it wakes up.
    
    For #5814
    0c6a1788