  1. Sep 09, 2021
  2. Sep 03, 2021
    • Workaround Rails filesystem ID tests in Praefect · 2e30806c
      Sami Hiltunen authored
      Rails tests configure Praefect in front of the tests that exercise
      the Rugged direct Git access code. As Praefect now derives the
      filesystem IDs from the names of the virtual storages, the filesystem
      ID checks fail and thus the tests fail. This is not a problem in practice,
      as one wouldn't use Rugged in a real-world setup with Praefect. This
      commit works around the tests by returning the filesystem ID from the
      Gitaly node if a virtual storage has only one Gitaly node configured
      (sketched below). This matches the setup the tests use and thus makes
      them pass. The workaround and the filesystem ID code can be removed in
      15.0 once the Rugged patches and NFS support are dropped.
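      A rough sketch of the workaround's selection logic in Go. The helper and its
      parameters are hypothetical, not Praefect's real types: with exactly one
      configured Gitaly node, that node's own filesystem ID is passed through;
      otherwise the ID derived from the virtual storage's name is used.

          // filesystemID illustrates the workaround: a single-node virtual storage
          // (the setup the Rails tests use) reports the backing Gitaly node's own
          // filesystem ID, anything else reports the name-derived ID.
          func filesystemID(nodeFilesystemIDs []string, nameDerivedID string) string {
              if len(nodeFilesystemIDs) == 1 {
                  return nodeFilesystemIDs[0]
              }
              return nameDerivedID
          }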
    • Derive virtual storage's filesystem id from its name · add378c8
      Sami Hiltunen authored
      Gitaly storages contain a UUID filesystem ID that is generated by
      Gitaly for each of its storages. The ID is used to determine
      which storages can be accessed by Rails directly when Rugged patches
      are enabled and to see whether two different storages point to the same
      directory when doing repository moves.
      
      When repository moves are performed, the worker first checks whether the
      repository's destination and source storage are the same. If they are, the
      move is not performed. The check is performed by comparing the filesystem
      IDs of the storages. As Praefect currently routes the server info RPC
      to a random Gitaly node, the filesystem ID can differ between calls, as each
      of the Gitaly nodes has its own ID. This causes the repository moving worker
      to occasionally delete repositories from the virtual storage, as it receives
      two different IDs on sequential calls.
      
      The filesystem ID can identify cases where two storages refer to the same
      directory on a Gitaly node, as the ID is stored in a file in the storage.
      This is not really possible with Praefect: its storages are only identified
      by the virtual storage's name. If the name changes, we can't correlate
      the ID between the different names, as Praefect would consider them different
      storages. Praefect also supports multiple virtual storages, so it's not possible
      to generate a single ID and use it for all of the virtual storages. Given this,
      the approach taken here is to derive a stable filesystem ID from the virtual
      storage's name, as sketched below. This guarantees that calls to a given
      virtual storage always return the same filesystem ID.
      
      Configuring two storages that point to the same filesystem should be considered
      an invalid configuration anyway. Historically, there have been cases where that
      was done for plain Gitaly nodes. It is not done for Praefect and wouldn't work,
      as Praefect wouldn't find the repositories under an alternative virtual storage
      name. With that in mind, we don't have to consider the case where two virtual
      storages of different names point to the same backing Gitaly storages.
      
      The use cases for the filesystem ID seem to be limited, and we may be able to
      remove it in the future once the Rugged patches are removed.
      
      Changelog: fixed
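      A minimal sketch of the derivation in Go, assuming the stable ID is obtained by
      hashing the virtual storage's name and formatting the digest as a UUID-shaped
      string; the hash function and formatting Praefect actually uses may differ.

          package main

          import (
              "crypto/sha256"
              "fmt"
          )

          // deriveFilesystemID deterministically maps a virtual storage name to a
          // UUID-shaped filesystem ID. The same name always yields the same ID, so
          // repeated server info calls against a virtual storage agree no matter
          // which Gitaly node would otherwise have answered.
          func deriveFilesystemID(virtualStorage string) string {
              sum := sha256.Sum256([]byte(virtualStorage))
              return fmt.Sprintf("%x-%x-%x-%x-%x",
                  sum[0:4], sum[4:6], sum[6:8], sum[8:10], sum[10:16])
          }

          func main() {
              fmt.Println(deriveFilesystemID("default"))
              fmt.Println(deriveFilesystemID("default")) // identical output
          }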
  3. Sep 02, 2021
  4. Sep 01, 2021
    • coordinator: Only schedule replication for differing error states · ed5ab9bb
      Patrick Steinhardt authored
      When finalizing a transaction, we always schedule replication jobs when
      the primary has returned an error. Given that there are many RPCs
      which are expected to return errors in a controlled way, e.g. if a
      commit is missing, this causes us to create replication jobs in many
      contexts where they are not necessary at all.
      
      Thinking about the issue, what we really care about is not whether an RPC
      failed. It's whether primary and secondary nodes behaved the same.
      If both the primary and the secondaries succeeded, we're good. But if both
      failed with the same error, then we're good too, as long as all
      transactions have been committed: quorum was reached on all votes and the
      nodes failed in the same way, so we can assume that the nodes did indeed
      perform the same changes.
      
      This commit thus relaxes the condition: instead of scheduling replication
      jobs whenever the primary failed, we only schedule replication jobs for
      nodes whose error differs from the primary's, as sketched below. This has
      the advantage that we only need to selectively schedule jobs for the
      disagreeing nodes instead of targeting all secondaries, and it avoids
      scheduling jobs in many cases where we do hit errors.
      
      Changelog: performance
      (cherry picked from commit 73839029)
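      A simplified sketch of the relaxed condition in Go. It compares errors by
      message and ignores the transaction and quorum bookkeeping, so it is only an
      illustration of the idea rather than Praefect's actual logic.

          package main

          import (
              "errors"
              "fmt"
          )

          // needsReplication reports whether a secondary should get a replication
          // job: only when its outcome differs from the primary's. Matching
          // successes or matching failures mean the nodes agree.
          func needsReplication(primaryErr, secondaryErr error) bool {
              if primaryErr == nil && secondaryErr == nil {
                  return false // both succeeded
              }
              if primaryErr != nil && secondaryErr != nil &&
                  primaryErr.Error() == secondaryErr.Error() {
                  return false // both failed in the same controlled way
              }
              return true // outcomes differ, replicate to this secondary
          }

          func main() {
              missing := errors.New("commit not found")
              fmt.Println(needsReplication(missing, missing))             // false
              fmt.Println(needsReplication(nil, errors.New("disk full"))) // true
          }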
  5. Aug 31, 2021
  6. Aug 17, 2021
  7. Aug 03, 2021
  8. Aug 02, 2021
  9. Jul 28, 2021
  10. Jul 22, 2021
  11. Jul 21, 2021
  12. Jul 20, 2021
  13. Jul 15, 2021
  14. Jul 14, 2021
  15. Jul 13, 2021
  16. Jul 12, 2021