    refs: introduce reftable backend · 57db2a09
    Patrick Steinhardt authored and Junio C Hamano committed
    Due to scalability issues, Shawn Pearce originally proposed a new
    "reftable" format more than six years ago [1]. Initially, this new
    format was implemented in JGit with promising results. Around two years
    ago, we then added the "reftable" library to the Git codebase via
    a4bbd13b (Merge branch 'hn/reftable', 2021-12-15). With this we have
    landed all the low-level code to read and write reftables. Notably
    missing though was the integration of this low-level code into the Git
    code base in the form of a new ref backend that ties all of this
    together.
    
    This gap is now finally closed by introducing a new "reftable" backend
    into the Git codebase. This new backend promises to bring some notable
    improvements to Git repositories:
    
      - It becomes possible to do truly atomic writes where either all refs
        are committed to disk or none are. This was not possible with the
        "files" backend because ref updates were split across multiple loose
        files.
    
      - The disk space required to store many refs is reduced, both compared
        to loose refs and packed-refs. This is enabled both by the reftable
        format being a binary format, which is more compact, and by prefix
        compression.
    
      - We can ignore filesystem-specific behaviour as ref names are not
        encoded via paths anymore. This means there is no need to handle
        case sensitivity on Windows systems or Unicode precomposition on
        macOS.
    
      - There is no need to rewrite the complete refdb anymore every time a
        ref is deleted, as was the case for packed-refs. This means that
        ref deletions are now constant time instead of scaling linearly
        with the number of refs.
    
      - We can ignore file/directory conflicts so that it becomes possible
        to store both "refs/heads/foo" and "refs/heads/foo/bar".
    
      - Due to this property we can retain reflogs for deleted refs. We have
        previously been deleting reflogs together with their refs to avoid
        file/directory conflicts, which is not necessary anymore.
    
      - We can properly enumerate all refs. With the "files" backend it is
        not easy to distinguish between refs and non-refs because they may
        live side by side in the gitdir.
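
    The space savings from prefix compression can be pictured with a small
    sketch. This is only an illustration of the general technique, assuming
    sorted ref names where each entry stores the length of the prefix it
    shares with its predecessor plus the remaining suffix. It does not
    reproduce the actual reftable block layout, and the function names are
    hypothetical.

```python
def prefix_compress(sorted_names):
    """Encode each name as (shared_prefix_len, suffix) relative to the previous name."""
    encoded = []
    previous = ""
    for name in sorted_names:
        # Count how many leading characters this name shares with the previous one.
        shared = 0
        limit = min(len(previous), len(name))
        while shared < limit and previous[shared] == name[shared]:
            shared += 1
        encoded.append((shared, name[shared:]))
        previous = name
    return encoded


def prefix_decompress(encoded):
    """Rebuild the full names from (shared_prefix_len, suffix) pairs."""
    names = []
    previous = ""
    for shared, suffix in encoded:
        name = previous[:shared] + suffix
        names.append(name)
        previous = name
    return names


refs = sorted([
    "refs/heads/main",
    "refs/heads/maint",
    "refs/tags/v1.0",
    "refs/tags/v1.1",
])
encoded = prefix_compress(refs)
for shared, suffix in encoded:
    print(shared, suffix)
```

    Because ref names tend to share long prefixes such as "refs/heads/",
    most entries only need to store a short suffix.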
    
    Not all of these improvements are realized with the current "reftable"
    backend implementation. At this point, the new backend is supposed to be
    a drop-in replacement for the "files" backend that is used by basically
    all Git repositories nowadays. It strives for 1:1 compatibility, which
    means that, for the most part, a user can expect the same behaviour
    regardless of whether they use the "reftable" backend or the "files"
    backend.
    
    Most notably, this means we artificially limit the capabilities of the
    "reftable" backend to match the limits of the "files" backend. It is not
    possible to create refs that would end up with file/directory conflicts,
    we do not retain reflogs for deleted refs, and we perform
    stricter-than-necessary checks. This is done intentionally for two main
    reasons:
    
      - It makes it significantly easier to land the "reftable" backend as
        tests behave the same. It would be tough to argue for each and every
        single test that doesn't pass with the "reftable" backend.
    
      - It ensures compatibility between repositories that use the "files"
        backend and repositories that use the "reftable" backend. Like this,
        hosters can migrate their repositories to use the "reftable" backend
        without causing issues for clients that use the "files" backend in
        their clones.
    
    These artificial limitations are expected to go away in the long term.
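
    To make the file/directory restriction concrete, the following sketch
    shows the kind of check that keeps ref names compatible with the "files"
    backend. It is a simplified illustration, not the check Git actually
    performs, and `has_df_conflict` is a hypothetical helper.

```python
def has_df_conflict(new_ref, existing_refs):
    """Return True if new_ref would need a path that is both a file and a directory.

    With loose refs, "refs/heads/foo" is a file, so "refs/heads/foo/bar"
    would require "refs/heads/foo" to be a directory at the same time.
    """
    for ref in existing_refs:
        if ref == new_ref:
            # Updating an existing ref is not a conflict.
            continue
        if new_ref.startswith(ref + "/") or ref.startswith(new_ref + "/"):
            return True
    return False


existing = {"refs/heads/foo", "refs/tags/v1.0"}
print(has_df_conflict("refs/heads/foo/bar", existing))
print(has_df_conflict("refs/heads/foobar", existing))
```

    The first call reports a conflict, the second does not, since
    "refs/heads/foobar" merely shares a string prefix with "refs/heads/foo"
    without nesting below it.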
    
    Performance-wise things very much depend on the actual workload. The
    following benchmarks compare the "files" and "reftable" backends in the
    current version:
    
      - Creating N refs in separate transactions shows that the "files"
        backend is faster: the "reftable" backend takes roughly 30% to 50%
        longer. This is not surprising given that creating a ref only
        requires us to create a single loose ref. The "reftable" backend
        will also perform auto-compaction on updates. In real-world
        workloads we would likely also want to pack loose refs, which would
        likely change the picture.
    
            Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
              Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
              Range (min … max):     1.8 ms …   4.3 ms    133 runs
    
            Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
              Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
              Range (min … max):     2.4 ms …   2.9 ms    132 runs
    
            Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
              Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
              Range (min … max):    1.969 s …  1.980 s    3 runs
    
            Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
              Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
              Range (min … max):    2.597 s …  2.622 s    3 runs
    
            Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
              Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
              Range (min … max):   198.189 s … 198.670 s    3 runs
    
            Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
              Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
              Range (min … max):   290.223 s … 298.761 s    3 runs
    
      - Creating N refs in a single transaction shows that the "files"
        backend is significantly slower once we start to write many refs.
        The "reftable" backend only needs to update two files, whereas the
        "files" backend needs to write one file per ref.
    
            Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1)
              Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.4 ms, System: 1.4 ms]
              Range (min … max):     1.8 ms …   2.6 ms    151 runs
    
            Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1)
              Time (mean ± σ):       2.5 ms ±   0.1 ms    [User: 0.7 ms, System: 1.7 ms]
              Range (min … max):     2.4 ms …   3.4 ms    148 runs
    
            Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000)
              Time (mean ± σ):     152.5 ms ±   5.2 ms    [User: 19.1 ms, System: 133.1 ms]
              Range (min … max):   148.5 ms … 167.8 ms    15 runs
    
            Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000)
              Time (mean ± σ):      58.0 ms ±   2.5 ms    [User: 28.4 ms, System: 29.4 ms]
              Range (min … max):    56.3 ms …  72.9 ms    40 runs
    
            Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000)
              Time (mean ± σ):     152.752 s ±  0.710 s    [User: 20.315 s, System: 131.310 s]
              Range (min … max):   152.165 s … 153.542 s    3 runs
    
            Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000)
              Time (mean ± σ):     51.912 s ±  0.127 s    [User: 26.483 s, System: 25.424 s]
              Range (min … max):   51.769 s … 52.012 s    3 runs
    
      - Deleting a ref in a fully-packed repository shows that the "files"
        backend scales with the number of refs. The "reftable" backend has
        constant-time deletions.
    
            Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1)
              Time (mean ± σ):       1.7 ms ±   0.1 ms    [User: 0.4 ms, System: 1.2 ms]
              Range (min … max):     1.6 ms …   2.1 ms    316 runs
    
            Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1)
              Time (mean ± σ):       1.8 ms ±   0.1 ms    [User: 0.4 ms, System: 1.3 ms]
              Range (min … max):     1.7 ms …   2.1 ms    294 runs
    
            Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000)
              Time (mean ± σ):       2.0 ms ±   0.1 ms    [User: 0.5 ms, System: 1.4 ms]
              Range (min … max):     1.9 ms …   2.5 ms    287 runs
    
            Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000)
              Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.5 ms, System: 1.3 ms]
              Range (min … max):     1.8 ms …   2.1 ms    217 runs
    
            Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000)
              Time (mean ± σ):     229.8 ms ±   7.9 ms    [User: 182.6 ms, System: 46.8 ms]
              Range (min … max):   224.6 ms … 245.2 ms    6 runs
    
            Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000)
              Time (mean ± σ):       2.0 ms ±   0.0 ms    [User: 0.6 ms, System: 1.3 ms]
              Range (min … max):     2.0 ms …   2.1 ms    3 runs
    
      - Listing all refs shows no significant advantage for either of the
        backends. The "files" backend is a bit faster, but not by a
        significant margin. When repositories are not packed the "reftable"
        backend outperforms the "files" backend because the "reftable"
        backend performs auto-compaction.
    
            Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true)
              Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
              Range (min … max):     1.5 ms …   2.0 ms    1729 runs
    
            Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true)
              Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
              Range (min … max):     1.5 ms …   1.8 ms    1816 runs
    
            Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true)
              Time (mean ± σ):       4.3 ms ±   0.1 ms    [User: 0.9 ms, System: 3.3 ms]
              Range (min … max):     4.1 ms …   4.6 ms    645 runs
    
            Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true)
              Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.0 ms, System: 3.3 ms]
              Range (min … max):     4.2 ms …   5.9 ms    643 runs
    
            Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true)
              Time (mean ± σ):      2.537 s ±  0.034 s    [User: 0.488 s, System: 2.048 s]
              Range (min … max):    2.511 s …  2.627 s    10 runs
    
            Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true)
              Time (mean ± σ):      2.712 s ±  0.017 s    [User: 0.653 s, System: 2.059 s]
              Range (min … max):    2.692 s …  2.752 s    10 runs
    
            Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false)
              Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
              Range (min … max):     1.5 ms …   1.9 ms    1834 runs
    
            Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false)
              Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
              Range (min … max):     1.4 ms …   2.0 ms    1840 runs
    
            Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false)
              Time (mean ± σ):      13.8 ms ±   0.2 ms    [User: 2.8 ms, System: 10.8 ms]
              Range (min … max):    13.3 ms …  14.5 ms    208 runs
    
            Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false)
              Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.2 ms, System: 3.3 ms]
              Range (min … max):     4.3 ms …   6.2 ms    624 runs
    
            Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false)
              Time (mean ± σ):     12.127 s ±  0.129 s    [User: 2.675 s, System: 9.451 s]
              Range (min … max):   11.965 s … 12.370 s    10 runs
    
            Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false)
              Time (mean ± σ):      2.799 s ±  0.022 s    [User: 0.735 s, System: 2.063 s]
              Range (min … max):    2.769 s …  2.836 s    10 runs
    
      - Printing a single ref shows no real difference between the "files"
        and "reftable" backends.
    
            Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1)
              Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.4 ms, System: 1.0 ms]
              Range (min … max):     1.4 ms …   1.8 ms    1779 runs
    
            Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1)
              Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
              Range (min … max):     1.4 ms …   2.5 ms    1753 runs
    
            Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000)
              Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.3 ms, System: 1.1 ms]
              Range (min … max):     1.4 ms …   1.9 ms    1840 runs
    
            Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000)
              Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
              Range (min … max):     1.5 ms …   2.0 ms    1831 runs
    
            Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000)
              Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
              Range (min … max):     1.5 ms …   2.1 ms    1848 runs
    
            Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000)
              Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
              Range (min … max):     1.5 ms …   2.1 ms    1762 runs
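
    The constant-time deletions measured above follow from how updates are
    recorded: a deletion is written as a new record that marks the ref as
    deleted instead of rewriting existing data, and older tables are only
    merged during compaction. A minimal sketch of that idea, assuming a toy
    in-memory stack of tables (the `TableStack` class and its methods are
    hypothetical and not part of Git or the reftable library):

```python
TOMBSTONE = None  # marker meaning "this ref has been deleted"


class TableStack:
    def __init__(self):
        self.tables = []  # oldest table first, newest last

    def commit(self, updates):
        """Record all updates of one transaction as a single new table."""
        self.tables.append(dict(updates))

    def delete(self, name):
        """Delete a ref by appending a tombstone; existing tables are untouched."""
        self.commit({name: TOMBSTONE})

    def lookup(self, name):
        """Return the newest value for name, or None if absent or deleted."""
        for table in reversed(self.tables):
            if name in table:
                return table[name]
        return None

    def compact(self):
        """Merge all tables into one and drop tombstones."""
        merged = {}
        for table in self.tables:
            merged.update(table)
        self.tables = [{k: v for k, v in merged.items() if v is not TOMBSTONE}]


stack = TableStack()
stack.commit({"refs/heads/main": "1111", "refs/heads/topic": "2222"})
stack.delete("refs/heads/topic")
print(stack.lookup("refs/heads/topic"), len(stack.tables))
stack.compact()
print(stack.tables)
```

    In this sketch the cost of `delete` does not depend on how many refs
    the older tables hold, which is the property the benchmark shows.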
    
    So overall, performance depends on the use case. Except for many
    sequential writes, the "reftable" backend is roughly on par with or
    significantly faster than the "files" backend. Given that the "files"
    backend has received 18 years of optimizations by now, this can be seen
    as a win. Furthermore, we can expect that the "reftable" backend will
    become faster over time when attention turns more towards
    optimizations.
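
    As a rough cross-check of this summary, the ratios between the reported
    mean times at the largest ref counts can be computed directly. The
    numbers are copied from the benchmark output above; the dictionary
    layout and the helper name are only illustrative.

```python
# Mean wall-clock times in seconds, taken from the benchmark output above.
means = {
    "create sequentially (100000 refs)": {"files": 198.442, "reftable": 294.509},
    "create in one transaction (1000000 refs)": {"files": 152.752, "reftable": 51.912},
    "delete one ref (1000000 refs)": {"files": 0.2298, "reftable": 0.0020},
    "list unpacked refs (1000000 refs)": {"files": 12.127, "reftable": 2.799},
}


def files_over_reftable(timings):
    """Ratio of mean times; values above 1 mean "reftable" was faster."""
    return timings["files"] / timings["reftable"]


ratios = {name: files_over_reftable(t) for name, t in means.items()}
for name, ratio in ratios.items():
    print(f"{name}: files/reftable = {ratio:.2f}")
```

    Only the sequential-create case yields a ratio below 1, which matches
    the statement that many sequential writes are the one workload where
    the "files" backend is ahead.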
    
    The complete test suite passes, except for those tests explicitly marked
    to require the REFFILES prerequisite. Some tests in t0610 are marked as
    failing because they depend on still-in-flight bug fixes. Tests can be
    run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT
    environment variable to "reftable".
    
    There is a single known conceptual incompatibility with the dumb HTTP
    transport. As "info/refs" SHOULD NOT contain the HEAD reference, and
    because the "HEAD" file is not valid anymore, it is impossible for the
    remote client to figure out the default branch without changing the
    protocol. This shortcoming needs to be handled in a subsequent patch
    series.
    
    As the reftable library was already introduced a while ago, this
    commit message will not go into the details of how exactly the on-disk
    format works. Please refer to our preexisting technical documentation at
    Documentation/technical/reftable for this.
    
    [1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/
    
    
    
    Original-idea-by: Shawn Pearce <spearce@spearce.org>
    Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com>
    Signed-off-by: Patrick Steinhardt <ps@pks.im>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>