    refs: introduce an iterator interface · 3bc581b9
    Michael Haggerty authored and Junio C Hamano committed
    
    
    Currently, the API for iterating over references is a family of
    for_each_ref()-type functions that invoke a callback function for each
    selected reference. All of these eventually call do_for_each_ref(),
    which knows how to do one thing: iterate in parallel through two
    ref_caches, one for loose and one for packed refs, giving loose
    references precedence over packed refs. This is rather complicated code,
    and is quite specialized to the files backend. It also requires callers
    to encapsulate their work into a callback function, which often means
    that they have to define and use a "cb_data" struct to manage their
    context.
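
To make the callback style concrete, here is a minimal sketch in C. It is not Git's actual code: `toy_for_each_ref()`, `each_ref_fn`, and `count_cb_data` are hypothetical names used only for illustration. It shows how the iterating function owns the loop and how the caller has to bundle its state into a struct to get it into the callback.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type, loosely modeled on the for_each_ref() style. */
typedef int (*each_ref_fn)(const char *refname, void *cb_data);

/* Toy stand-in for a for_each_ref()-type function: it owns the loop. */
static int toy_for_each_ref(const char *const *refs, size_t nr,
                            each_ref_fn fn, void *cb_data)
{
    size_t i;

    for (i = 0; i < nr; i++) {
        int ret = fn(refs[i], cb_data);
        if (ret)
            return ret; /* a nonzero return from the callback stops the loop */
    }
    return 0;
}

/* Context the caller must define just to pass state into the callback. */
struct count_cb_data {
    const char *prefix;
    size_t count;
};

static int count_matching(const char *refname, void *cb_data)
{
    struct count_cb_data *data = cb_data;

    if (!strncmp(refname, data->prefix, strlen(data->prefix)))
        data->count++;
    return 0;
}

/* Count the refs whose names start with prefix. */
static size_t count_refs_with_prefix(const char *const *refs, size_t nr,
                                     const char *prefix)
{
    struct count_cb_data data = { prefix, 0 };

    toy_for_each_ref(refs, nr, count_matching, &data);
    return data.count;
}
```

Even this trivial counting task needs a separate function and a dedicated struct, because the caller never holds the loop itself.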
    
    The current design is already bursting at the seams, and will become
    even more awkward in the upcoming world of multiple reference storage
    backends:
    
    * Per-worktree vs. shared references are currently handled via a kludge
      in git_path() rather than iterating over each part of the reference
      namespace separately and merging the results. This kludge will cease
      to work when we have multiple reference storage backends.
    
    * The current scheme is inflexible. What if we sometimes want to bypass
      the ref_cache, or use it only for packed or only for loose refs? What
      if we want to store symbolic refs in one type of storage backend and
      non-symbolic ones in another?
    
    In the future, each reference backend will need to define its own way of
    iterating over references. The crux of the problem with the current
    design is that it is impossible to compose for_each_ref()-style
    iterations, because the flow of control is owned by the for_each_ref()
    function. There is nothing that a caller can do but iterate through all
    references in a single burst, so there is no way for it to interleave
    references from multiple backends and present the result to the rest of
    the world as a single compound backend.
    
    This commit introduces a new iteration primitive for references: a
    ref_iterator. A ref_iterator is a polymorphic object that a reference
    storage backend can be asked to instantiate. There are three functions
    that can be applied to a ref_iterator:
    
    * ref_iterator_advance(): move to the next reference in the iteration
    * ref_iterator_abort(): end the iteration before it is exhausted
    * ref_iterator_peel(): peel the reference currently being looked at
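
The three operations above can be sketched as a vtable-based object in C. This is a simplified illustration under stated assumptions, not the code in this patch: every `toy_` and `array_` name is hypothetical, references are plain strings, "peeling" just returns a stored string, and the iterator is assumed to free itself when it is exhausted or aborted.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch. */
enum { TOY_ITER_OK = 0, TOY_ITER_DONE = -1 };

/* Toy reference: a name plus what it peels to (NULL if not peelable). */
struct toy_ref {
    const char *refname;
    const char *peeled;
};

struct toy_ref_iterator;

/* One table of function pointers per iterator implementation. */
struct toy_vtable {
    int (*do_advance)(struct toy_ref_iterator *iter);
    int (*do_peel)(struct toy_ref_iterator *iter, const char **peeled);
    int (*do_abort)(struct toy_ref_iterator *iter);
};

/* "Base class" that every concrete iterator embeds as its first member. */
struct toy_ref_iterator {
    const struct toy_vtable *vtable;
    const char *refname; /* current ref; valid after TOY_ITER_OK */
};

/* The three generic operations simply dispatch through the vtable. */
static int toy_iterator_advance(struct toy_ref_iterator *iter)
{
    return iter->vtable->do_advance(iter);
}

static int toy_iterator_peel(struct toy_ref_iterator *iter, const char **peeled)
{
    return iter->vtable->do_peel(iter, peeled);
}

static int toy_iterator_abort(struct toy_ref_iterator *iter)
{
    return iter->vtable->do_abort(iter);
}

/* Concrete iterator over a fixed array of toy refs. */
struct array_iterator {
    struct toy_ref_iterator base;
    const struct toy_ref *refs;
    size_t nr, next;
};

static int array_abort(struct toy_ref_iterator *iter)
{
    free(iter); /* base is the first member, so this frees the whole object */
    return TOY_ITER_DONE;
}

static int array_advance(struct toy_ref_iterator *iter)
{
    struct array_iterator *it = (struct array_iterator *)iter;

    if (it->next >= it->nr)
        return array_abort(iter); /* exhausted: the iterator frees itself */
    it->base.refname = it->refs[it->next++].refname;
    return TOY_ITER_OK;
}

/* Only valid after advance has returned TOY_ITER_OK. */
static int array_peel(struct toy_ref_iterator *iter, const char **peeled)
{
    struct array_iterator *it = (struct array_iterator *)iter;
    const struct toy_ref *cur = &it->refs[it->next - 1];

    if (!cur->peeled)
        return -1;
    *peeled = cur->peeled;
    return 0;
}

static const struct toy_vtable array_vtable = {
    array_advance, array_peel, array_abort
};

static struct toy_ref_iterator *array_iterator_begin(const struct toy_ref *refs,
                                                     size_t nr)
{
    struct array_iterator *it = malloc(sizeof(*it));

    if (!it)
        exit(1);
    it->base.vtable = &array_vtable;
    it->base.refname = NULL;
    it->refs = refs;
    it->nr = nr;
    it->next = 0;
    return &it->base;
}

/* Return what the first peelable ref peels to, or NULL if there is none. */
static const char *first_peeled(const struct toy_ref *refs, size_t nr)
{
    struct toy_ref_iterator *iter = array_iterator_begin(refs, nr);
    const char *peeled;

    while (toy_iterator_advance(iter) == TOY_ITER_OK) {
        if (!toy_iterator_peel(iter, &peeled)) {
            toy_iterator_abort(iter); /* stop early; this frees iter */
            return peeled;
        }
    }
    return NULL; /* exhausted: advance has already freed iter */
}
```

Note that `first_peeled()` is an ordinary loop: the caller decides when to advance, when to peel, and when to stop, with no callback or context struct involved.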
    
    Iterating using a ref_iterator leaves the flow of control in the hands
    of the caller, which means that ref_iterators from multiple
    sources (e.g., loose and packed refs) can be composed and presented to
    the world as a single compound ref_iterator.
    
    It also means that the backend code for implementing reference iteration
    will sometimes be more complicated. For example, the
    cache_ref_iterator (which iterates over a ref_cache) can't use the C
    stack to recurse; instead, it must manage its own stack internally as
    explicit data structures. There is also a lot of boilerplate connected
    with object-oriented programming in C.
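
The following sketch illustrates why an iterator over a nested structure has to keep its own stack. It is a toy model, not the cache_ref_iterator from this patch: `toy_entry`, `toy_level`, and `toy_tree_iterator` are hypothetical names, the tree is a static array of entries, and the depth is capped at an arbitrary `TOY_MAX_DEPTH`. Each saved level plays the role that a C stack frame would play in a recursive walk.

```c
#include <stddef.h>

#define TOY_MAX_DEPTH 16 /* arbitrary limit chosen for this sketch */

/* A toy tree node: a directory has children, a leaf is a reference. */
struct toy_entry {
    const char *name;
    const struct toy_entry *children; /* NULL for a leaf */
    size_t nr_children;
};

/* One saved "stack frame": which directory we are in, and how far along. */
struct toy_level {
    const struct toy_entry *dir;
    size_t index;
};

struct toy_tree_iterator {
    struct toy_level levels[TOY_MAX_DEPTH];
    size_t depth;        /* number of levels currently in use */
    const char *refname; /* name of the current leaf */
};

static void toy_tree_iterator_init(struct toy_tree_iterator *iter,
                                   const struct toy_entry *root)
{
    iter->levels[0].dir = root;
    iter->levels[0].index = 0;
    iter->depth = 1;
    iter->refname = NULL;
}

/* Returns 1 and sets iter->refname for the next leaf, or 0 when done. */
static int toy_tree_iterator_advance(struct toy_tree_iterator *iter)
{
    while (iter->depth) {
        struct toy_level *level = &iter->levels[iter->depth - 1];
        const struct toy_entry *entry;

        if (level->index >= level->dir->nr_children) {
            iter->depth--; /* the equivalent of returning from a recursive call */
            continue;
        }
        entry = &level->dir->children[level->index++];
        if (entry->children) {
            if (iter->depth == TOY_MAX_DEPTH)
                return 0; /* too deep for this sketch; give up */
            /* the equivalent of making a recursive call into the subdirectory */
            iter->levels[iter->depth].dir = entry;
            iter->levels[iter->depth].index = 0;
            iter->depth++;
        } else {
            iter->refname = entry->name;
            return 1; /* hand control back; our position survives in levels[] */
        }
    }
    return 0;
}

/* Caller-driven loop over the whole tree. */
static size_t count_leaves(const struct toy_entry *root)
{
    struct toy_tree_iterator iter;
    size_t n = 0;

    toy_tree_iterator_init(&iter, root);
    while (toy_tree_iterator_advance(&iter))
        n++;
    return n;
}
```

A recursive walk could not return to the caller after each leaf and later resume where it left off, which is why the position has to live in `levels[]` rather than on the C stack.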
    
    Eventually, end-user callers can be written in a more natural way,
    managing their own flow of control rather than having to work via
    callbacks. Since there will only be a few reference backends but
    there are many consumers of this API, this is a good tradeoff.
    
    More importantly, we gain composability, and especially the possibility
    of writing interchangeable parts that can work with any ref_iterator.
    
    For example, merge_ref_iterator implements a generic way of merging the
    contents of any two ref_iterators. It is used to merge loose + packed
    refs as part of the implementation of the files_ref_iterator. But it
    will also be possible to use it to merge other pairs of reference
    sources (e.g., per-worktree vs. shared refs).
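
As a rough illustration of this kind of composition, the sketch below merges two sorted iterators, letting the first one win when both yield the same name (as loose refs take precedence over packed refs). It is not the merge_ref_iterator from this patch: `toy_iter`, `array_iter`, and `merge_iter` are hypothetical, heavily simplified types, and each ref carries a string `value` only so that the winner is visible.

```c
#include <stddef.h>
#include <string.h>

struct toy_ref {
    const char *refname;
    const char *value;
};

/* Minimal iterator "base class" for this sketch. */
struct toy_iter {
    int (*advance)(struct toy_iter *iter); /* 1 = have a ref, 0 = exhausted */
    const struct toy_ref *ref;             /* current ref after advance returns 1 */
};

/* Leaf iterator over an array that is already sorted by refname. */
struct array_iter {
    struct toy_iter base;
    const struct toy_ref *refs;
    size_t nr, next;
};

static int array_advance(struct toy_iter *iter)
{
    struct array_iter *it = (struct array_iter *)iter;

    if (it->next >= it->nr)
        return 0;
    iter->ref = &it->refs[it->next++];
    return 1;
}

static void array_iter_init(struct array_iter *it,
                            const struct toy_ref *refs, size_t nr)
{
    it->base.advance = array_advance;
    it->base.ref = NULL;
    it->refs = refs;
    it->nr = nr;
    it->next = 0;
}

/* Merges any two toy_iters; on equal names the "hi" side wins. */
struct merge_iter {
    struct toy_iter base;
    struct toy_iter *hi, *lo;
    int hi_ok, lo_ok;     /* does each side currently hold a ref? */
    int step_hi, step_lo; /* which sides to advance before yielding again */
};

static int merge_advance(struct toy_iter *iter)
{
    struct merge_iter *m = (struct merge_iter *)iter;
    int cmp;

    if (m->step_hi)
        m->hi_ok = m->hi->advance(m->hi);
    if (m->step_lo)
        m->lo_ok = m->lo->advance(m->lo);
    m->step_hi = m->step_lo = 0;
    if (!m->hi_ok && !m->lo_ok)
        return 0;

    if (!m->lo_ok)
        cmp = -1;
    else if (!m->hi_ok)
        cmp = 1;
    else
        cmp = strcmp(m->hi->ref->refname, m->lo->ref->refname);

    m->step_hi = cmp <= 0; /* hi is consumed whenever it is the one yielded */
    m->step_lo = cmp >= 0; /* lo is consumed when yielded or when shadowed */
    iter->ref = cmp <= 0 ? m->hi->ref : m->lo->ref;
    return 1;
}

static void merge_iter_init(struct merge_iter *m,
                            struct toy_iter *hi, struct toy_iter *lo)
{
    m->base.advance = merge_advance;
    m->base.ref = NULL;
    m->hi = hi;
    m->lo = lo;
    m->hi_ok = m->lo_ok = 0;
    m->step_hi = m->step_lo = 1;
}

/* Merge two sorted arrays into out[]; returns the number of refs written. */
static size_t merge_refs(const struct toy_ref *loose, size_t nr_loose,
                         const struct toy_ref *packed, size_t nr_packed,
                         struct toy_ref *out, size_t max)
{
    struct array_iter l, p;
    struct merge_iter m;
    size_t n = 0;

    array_iter_init(&l, loose, nr_loose);
    array_iter_init(&p, packed, nr_packed);
    merge_iter_init(&m, &l.base, &p.base);
    while (n < max && m.base.advance(&m.base))
        out[n++] = *m.base.ref;
    return n;
}
```

Because `merge_iter` only talks to its inputs through `advance`, either input could itself be another `merge_iter`, which is the composability the callback-based design cannot offer.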
    
    Another example is prefix_ref_iterator, which can be used to trim a
    prefix off the front of reference names before presenting them to the
    caller (e.g., "refs/heads/master" -> "master").
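
A wrapper of this kind might look roughly like the following sketch. The types and names (`toy_iter`, `array_iter`, `prefix_iter`, `trimmed_names()`) are hypothetical and much simpler than the real thing. The sketch also assumes that names lacking the prefix are skipped, which this message does not spell out.

```c
#include <stddef.h>
#include <string.h>

/* Minimal iterator "base class" for this sketch. */
struct toy_iter {
    int (*advance)(struct toy_iter *iter); /* 1 = have a ref, 0 = exhausted */
    const char *refname;                   /* current name after advance returns 1 */
};

/* Leaf iterator over an array of names. */
struct array_iter {
    struct toy_iter base;
    const char *const *refs;
    size_t nr, next;
};

static int array_advance(struct toy_iter *iter)
{
    struct array_iter *it = (struct array_iter *)iter;

    if (it->next >= it->nr)
        return 0;
    iter->refname = it->refs[it->next++];
    return 1;
}

static void array_iter_init(struct array_iter *it,
                            const char *const *refs, size_t nr)
{
    it->base.advance = array_advance;
    it->base.refname = NULL;
    it->refs = refs;
    it->nr = nr;
    it->next = 0;
}

/* Wraps any toy_iter and presents its names with a prefix trimmed off. */
struct prefix_iter {
    struct toy_iter base;
    struct toy_iter *iter0; /* the wrapped iterator */
    const char *prefix;
};

static int prefix_advance(struct toy_iter *iter)
{
    struct prefix_iter *p = (struct prefix_iter *)iter;
    size_t len = strlen(p->prefix);

    while (p->iter0->advance(p->iter0)) {
        if (strncmp(p->iter0->refname, p->prefix, len))
            continue; /* assumption: names without the prefix are skipped */
        iter->refname = p->iter0->refname + len;
        return 1;
    }
    return 0;
}

static void prefix_iter_init(struct prefix_iter *p, struct toy_iter *iter0,
                             const char *prefix)
{
    p->base.advance = prefix_advance;
    p->base.refname = NULL;
    p->iter0 = iter0;
    p->prefix = prefix;
}

/* Collect the trimmed names into out[]; returns how many were written. */
static size_t trimmed_names(const char *const *refs, size_t nr,
                            const char *prefix, const char **out, size_t max)
{
    struct array_iter a;
    struct prefix_iter p;
    size_t n = 0;

    array_iter_init(&a, refs, nr);
    prefix_iter_init(&p, &a.base, prefix);
    while (n < max && p.base.advance(&p.base))
        out[n++] = p.base.refname;
    return n;
}
```

The wrapper knows nothing about where the names come from, so it could sit on top of any other iterator that follows the same interface.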
    
    In this patch, we introduce the iterator abstraction and many utilities,
    and implement a reference iterator for the files ref storage backend.
    (I've written several other obvious utilities, for example a generic way
    to filter references being iterated over. These will probably be useful
    in the future. But they are not needed for this patch series, so I am
    not including them at this time.)
    
    In a moment we will rewrite do_for_each_ref() to work via reference
    iterators (allowing some special-purpose code to be discarded), and do
    something similar for reflogs. In future patch series, we will expose
    the ref_iterator abstraction in the public refs API so that callers can
    use it directly.
    
    Implementation note: I tried abstracting this a layer further to allow
    generic iterators (over arbitrary types of objects) and generic
    utilities like a generic merge_iterator. But the implementation in C was
    very cumbersome, involving (in my opinion) too much boilerplate and too
    much unsafe casting, some of which would have had to be done on the
    caller side. However, I did put a few iterator-related constants in a
    top-level header file, iterator.h, as they will be useful in a moment to
    implement iteration over directory trees and possibly other types of
    iterators in the future.
    
    Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
    Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>