1. 17 Jan, 2019 1 commit
    • Derrick Stolee's avatar
      list-objects: consume sparse tree walk · 4f6d26b1
      Derrick Stolee authored
      When creating a pack-file using 'git pack-objects --revs' we provide
      a list of interesting and uninteresting commits. For example, a push
      operation would make the local topic branch be interesting and the
      known remote refs as uninteresting. We want to discover the set of
      new objects to send to the server as a thin pack.
      
      We walk these commits until we discover a frontier of commits such
      that every commit walk starting at interesting commits ends in a root
      commit or unintersting commit. We then need to discover which
      non-commit objects are reachable from  uninteresting commits. This
      commit walk is not changing during this series.
      
      The mark_edges_uninteresting() method in list-objects.c iterates on
      the commit list and does the following:
      
      * If the commit is UNINTERSTING, then mark its root tree and every
        object it can reach as UNINTERESTING.
      
      * If the commit is interesting, then mark the root tree of every
        UNINTERSTING parent (and all objects that tree can reach) as
        UNINTERSTING.
      
      At the very end, we repeat the process on every commit directly
      given to the revision walk from stdin. This helps ensure we properly
      cover shallow commits that otherwise were not included in the
      frontier.
      
      The logic to recursively follow trees is in the
      mark_tree_uninteresting() method in revision.c. The algorithm avoids
      duplicate work by not recursing into trees that are already marked
      UNINTERSTING.
      
      Add a new 'sparse' option to the mark_edges_uninteresting() method
      that performs this logic in a slightly different way. As we iterate
      over the commits, we add all of the root trees to an oidset. Then,
      call mark_trees_uninteresting_sparse() on that oidset. Note that we
      include interesting trees in this process. The current implementation
      of mark_trees_unintersting_sparse() will walk the same trees as
      the old logic, but this will be replaced in a later change.
      
      Add a '--sparse' flag in 'git pack-objects' to call this new logic.
      Add a new test script t/t5322-pack-objects-sparse.sh that tests this
      option. The tests currently demonstrate that the resulting object
      list is the same as the old algorithm. This includes a case where
      both algorithms pack an object that is not needed by a remote due to
      limits on the explored set of trees. When the sparse algorithm is
      changed in a later commit, we will add a test that demonstrates a
      change of behavior in some cases.
      Signed-off-by: 's avatarDerrick Stolee <dstolee@microsoft.com>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
      4f6d26b1
  2. 15 Aug, 2018 1 commit
  3. 22 Nov, 2017 1 commit
    • Jeff Hostetler's avatar
      list-objects: filter objects in traverse_commit_list · 25ec7bca
      Jeff Hostetler authored
      Create traverse_commit_list_filtered() and add filtering
      interface to allow certain objects to be omitted from the
      traversal.
      
      Update traverse_commit_list() to be a wrapper for the above
      with a null filter to minimize the number of callers that
      needed to be changed.
      
      Object filtering will be used in a future commit by rev-list
      and pack-objects for partial clone and fetch to omit unwanted
      objects from the result.
      
      traverse_bitmap_commit_list() does not work with filtering.
      If a packfile bitmap is present, it will not be used.  It
      should be possible to extend such support in the future (at
      least to simple filters that do not require object pathnames),
      but that is beyond the scope of this patch series.
      Signed-off-by: 's avatarJeff Hostetler <jeffhost@microsoft.com>
      Reviewed-by: 's avatarJonathan Tan <jonathantanmy@google.com>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
      25ec7bca
  4. 16 Mar, 2016 2 commits
    • Jeff King's avatar
      list-objects: pass full pathname to callbacks · 2824e184
      Jeff King authored
      When we find a blob at "a/b/c", we currently pass this to
      our show_object_fn callbacks as two components: "a/b/" and
      "c". Callbacks which want the full value then call
      path_name(), which concatenates the two. But this is an
      inefficient interface; the path is a strbuf, and we could
      simply append "c" to it temporarily, then roll back the
      length, without creating a new copy.
      
      So we could improve this by teaching the callsites of
      path_name() this trick (and there are only 3). But we can
      also notice that no callback actually cares about the
      broken-down representation, and simply pass each callback
      the full path "a/b/c" as a string. The callback code becomes
      even simpler, then, as we do not have to worry about freeing
      an allocated buffer, nor rolling back our modification to
      the strbuf.
      
      This is theoretically less efficient, as some callbacks
      would not bother to format the final path component. But in
      practice this is not measurable. Since we use the same
      strbuf over and over, our work to grow it is amortized, and
      we really only pay to memcpy a few bytes.
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
      2824e184
    • Jeff King's avatar
      list-objects: drop name_path entirely · dc06dc88
      Jeff King authored
      In the previous commit, we left name_path as a thin wrapper
      around a strbuf. This patch drops it entirely. As a result,
      every show_object_fn callback needs to be adjusted. However,
      none of their code needs to be changed at all, because the
      only use was to pass it to path_name(), which now handles
      the bare strbuf.
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
      dc06dc88
  5. 12 Feb, 2016 2 commits
    • Jeff King's avatar
      list-objects: pass full pathname to callbacks · de1e67d0
      Jeff King authored
      When we find a blob at "a/b/c", we currently pass this to
      our show_object_fn callbacks as two components: "a/b/" and
      "c". Callbacks which want the full value then call
      path_name(), which concatenates the two. But this is an
      inefficient interface; the path is a strbuf, and we could
      simply append "c" to it temporarily, then roll back the
      length, without creating a new copy.
      
      So we could improve this by teaching the callsites of
      path_name() this trick (and there are only 3). But we can
      also notice that no callback actually cares about the
      broken-down representation, and simply pass each callback
      the full path "a/b/c" as a string. The callback code becomes
      even simpler, then, as we do not have to worry about freeing
      an allocated buffer, nor rolling back our modification to
      the strbuf.
      
      This is theoretically less efficient, as some callbacks
      would not bother to format the final path component. But in
      practice this is not measurable. Since we use the same
      strbuf over and over, our work to grow it is amortized, and
      we really only pay to memcpy a few bytes.
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
      de1e67d0
    • Jeff King's avatar
      list-objects: drop name_path entirely · bd64516a
      Jeff King authored
      In the previous commit, we left name_path as a thin wrapper
      around a strbuf. This patch drops it entirely. As a result,
      every show_object_fn callback needs to be adjusted. However,
      none of their code needs to be changed at all, because the
      only use was to pass it to path_name(), which now handles
      the bare strbuf.
      Signed-off-by: 's avatarJeff King <peff@peff.net>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
      bd64516a
  6. 28 Aug, 2013 1 commit
  7. 01 Sep, 2011 1 commit
    • Junio C Hamano's avatar
      list-objects: pass callback data to show_objects() · 49473672
      Junio C Hamano authored
      The traverse_commit_list() API takes two callback functions, one to show
      commit objects, and the other to show other kinds of objects. Even though
      the former has a callback data parameter, so that the callback does not
      have to rely on global state, the latter does not.
      
      Give the show_objects() callback the same callback data parameter.
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
      49473672
  8. 13 Apr, 2009 2 commits
    • Linus Torvalds's avatar
      process_{tree,blob}: show objects without buffering · 8d2dfc49
      Linus Torvalds authored
      Here's a less trivial thing, and slightly more dubious one.
      
      I was looking at that "struct object_array objects", and wondering why we
      do that. I have honestly totally forgotten. Why not just call the "show()"
      function as we encounter the objects? Rather than add the objects to the
      object_array, and then at the very end going through the array and doing a
      'show' on all, just do things more incrementally.
      
      Now, there are possible downsides to this:
      
       - the "buffer using object_array" _can_ in theory result in at least
         better I-cache usage (two tight loops rather than one more spread out
         one). I don't think this is a real issue, but in theory..
      
       - this _does_ change the order of the objects printed. Instead of doing a
         "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop
         over the commits (which puts all the root trees _first_ in the object
         list, this patch just adds them to the list of pending objects, and
         then we'll traverse them in that order (and thus show each root tree
         object together with the objects we discover under it)
      
         I _think_ the new ordering actually makes more sense, but the object
         ordering is actually a subtle thing when it comes to packing
         efficiency, so any change in order is going to have implications for
         packing. Good or bad, I dunno.
      
       - There may be some reason why we did it that odd way with the object
         array, that I have simply forgotten.
      
      Anyway, now that we don't buffer up the objects before showing them
      that may actually result in lower memory usage during that whole
      traverse_commit_list() phase.
      
      This is seriously not very deeply tested. It makes sense to me, it seems
      to pass all the tests, it looks ok, but...
      
      Does anybody remember why we did that "object_array" thing? It used to be
      an "object_list" a long long time ago, but got changed into the array due
      to better memory usage patterns (those linked lists of obejcts are
      horrible from a memory allocation standpoint). But I wonder why we didn't
      do this back then. Maybe there's a reason for it.
      
      Or maybe there _used_ to be a reason, and no longer is.
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
      8d2dfc49
    • Linus Torvalds's avatar
      show_object(): push path_name() call further down · cf2ab916
      Linus Torvalds authored
      In particular, pushing the "path_name()" call _into_ the show() function
      would seem to allow
      
       - more clarity into who "owns" the name (ie now when we free the name in
         the show_object callback, it's because we generated it ourselves by
         calling path_name())
      
       - not calling path_name() at all, either because we don't care about the
         name in the first place, or because we are actually happy walking the
         linked list of "struct name_path *" and the last component.
      
      Now, I didn't do that latter optimization, because it would require some
      more coding, but especially looking at "builtin-pack-objects.c", we really
      don't even want the whole pathname, we really would be better off with the
      list of path components.
      
      Why? We use that name for two things:
       - add_preferred_base_object(), which actually _wants_ to traverse the
         path, and now does it by looking for '/' characters!
       - for 'name_hash()', which only cares about the last 16 characters of a
         name, so again, generating the full name seems to be just unnecessary
         work.
      
      Anyway, so I didn't look any closer at those things, but it did convince
      me that the "show_object()" calling convention was crazy, and we're
      actually better off doing _less_ in list-objects.c, and giving people
      access to the internal data structures so that they can decide whether
      they want to generate a path-name or not.
      
      This patch does that, and then for people who did use the name (even if
      they might do something more clever in the future), it just does the
      straightforward "name = path_name(path, component); .. free(name);" thing.
      Signed-off-by: 's avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
      cf2ab916
  9. 08 Apr, 2009 1 commit
    • Christian Couder's avatar
      list-objects: add "void *data" parameter to show functions · 11c211fa
      Christian Couder authored
      The goal of this patch is to get rid of the "static struct rev_info
      revs" static variable in "builtin-rev-list.c".
      
      To do that, we need to pass the revs to the "show_commit" function
      in "builtin-rev-list.c" and this in turn means that the
      "traverse_commit_list" function in "list-objects.c" must be passed
      functions pointers to functions with 2 parameters instead of one.
      
      So we have to change all the callers and all the functions passed
      to "traverse_commit_list".
      
      Anyway this makes the code more clean and more generic, so it
      should be a good thing in the long run.
      Signed-off-by: Christian Couder's avatarChristian Couder <chriscool@tuxfamily.org>
      Signed-off-by: 's avatarJunio C Hamano <gitster@pobox.com>
      11c211fa
  10. 07 Sep, 2006 2 commits