1. 17 Jan, 2019 1 commit
    • Derrick Stolee's avatar
      revision: add mark_tree_uninteresting_sparse · f1f5de44
      Derrick Stolee authored
      In preparation for a new algorithm that walks fewer trees when
      creating a pack from a set of revisions, create a method that
      takes an oidset of tree oids and marks reachable objects as
      UNINTERESTING.
      
      The current implementation uses the existing
      mark_tree_uninteresting to recursively walk the trees and blobs.
      This will walk the same number of trees as the old mechanism. To
      ensure that mark_tree_uninteresting walks the tree, we need to
      remove the UNINTERESTING flag before calling the method. This
      implementation will be replaced entirely in a later commit.
      
      There is one new assumption in this approach: we are also given
      the oids of the interesting trees. This implementation does not
      use those trees at the moment, but we will use them in a later
      rewrite of this method.
      Signed-off-by: Derrick Stolee's avatarDerrick Stolee <dstolee@microsoft.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      f1f5de44
  2. 06 Dec, 2018 1 commit
  3. 02 Nov, 2018 2 commits
    • Derrick Stolee's avatar
      revision.c: generation-based topo-order algorithm · b4542418
      Derrick Stolee authored
      The current --topo-order algorithm requires walking all
      reachable commits up front, topo-sorting them, all before
      outputting the first value. This patch introduces a new
      algorithm which uses stored generation numbers to
      incrementally walk in topo-order, outputting commits as
      we go. This can dramatically reduce the computation time
      to write a fixed number of commits, such as when limiting
      with "-n <N>" or filling the first page of a pager.
      
      When running a command like 'git rev-list --topo-order HEAD',
      Git performed the following steps:
      
      1. Run limit_list(), which parses all reachable commits,
         adds them to a linked list, and distributes UNINTERESTING
         flags. If all unprocessed commits are UNINTERESTING, then
         it may terminate without walking all reachable commits.
         This does not occur if we do not specify UNINTERESTING
         commits.
      
      2. Run sort_in_topological_order(), which is an implementation
         of Kahn's algorithm. It first iterates through the entire
         set of important commits and computes the in-degree of each
         (plus one, as we use 'zero' as a special value here). Then,
         we walk the commits in priority order, adding them to the
         priority queue if and only if their in-degree is one. As
         we remove commits from this priority queue, we decrement the
         in-degree of their parents.
      
      3. While we are peeling commits for output, get_revision_1()
         uses pop_commit on the full list of commits computed by
         sort_in_topological_order().
      
      In the new algorithm, these three steps correspond to three
      different commit walks. We run these walks simultaneously,
      and advance each only as far as necessary to satisfy the
      requirements of the 'higher order' walk. We know when we can
      pause each walk by using generation numbers from the commit-
      graph feature.
      
      Recall that the generation number of a commit satisfies:
      
      * If the commit has at least one parent, then the generation
        number is one more than the maximum generation number among
        its parents.
      
      * If the commit has no parent, then the generation number is one.
      
      There are two special generation numbers:
      
      * GENERATION_NUMBER_INFINITY: this value is 0xffffffff and
        indicates that the commit is not stored in the commit-graph and
        the generation number was not previously calculated.
      
      * GENERATION_NUMBER_ZERO: this value (0) is a special indicator
        to say that the commit-graph was generated by a version of Git
        that does not compute generation numbers (such as v2.18.0).
      
      Since we use generation_numbers_enabled() before using the new
      algorithm, we do not need to worry about GENERATION_NUMBER_ZERO.
      However, the existence of GENERATION_NUMBER_INFINITY implies the
      following weaker statement than the usual we expect from
      generation numbers:
      
          If A and B are commits with generation numbers gen(A) and
          gen(B) and gen(A) < gen(B), then A cannot reach B.
      
      Thus, we will walk in each of our stages until the "maximum
      unexpanded generation number" is strictly lower than the
      generation number of a commit we are about to use.
      
      The walks are as follows:
      
      1. EXPLORE: using the explore_queue priority queue (ordered by
         maximizing the generation number), parse each reachable
         commit until all commits in the queue have generation
         number strictly lower than needed. During this walk, update
         the UNINTERESTING flags as necessary.
      
      2. INDEGREE: using the indegree_queue priority queue (ordered
         by maximizing the generation number), add one to the in-
         degree of each parent for each commit that is walked. Since
         we walk in order of decreasing generation number, we know
         that discovering an in-degree value of 0 means the value for
         that commit was not initialized, so should be initialized to
         two. (Recall that in-degree value "1" is what we use to say a
         commit is ready for output.) As we iterate the parents of a
         commit during this walk, ensure the EXPLORE walk has walked
         beyond their generation numbers.
      
      3. TOPO: using the topo_queue priority queue (ordered based on
         the sort_order given, which could be commit-date, author-
         date, or typical topo-order which treats the queue as a LIFO
         stack), remove a commit from the queue and decrement the
         in-degree of each parent. If a parent has an in-degree of
         one, then we add it to the topo_queue. Before we decrement
         the in-degree, however, ensure the INDEGREE walk has walked
         beyond that generation number.
      
      The implementations of these walks are in the following methods:
      
      * explore_walk_step and explore_to_depth
      * indegree_walk_step and compute_indegrees_to_depth
      * next_topo_commit and expand_topo_walk
      
      These methods have some patterns that may seem strange at first,
      but they are probably carry-overs from their equivalents in
      limit_list and sort_in_topological_order.
      
      One thing that is missing from this implementation is a proper
      way to stop walking when the entire queue is UNINTERESTING, so
      this implementation is not enabled by comparisions, such as in
      'git rev-list --topo-order A..B'. This can be updated in the
      future.
      
      In my local testing, I used the following Git commands on the
      Linux repository in three modes: HEAD~1 with no commit-graph,
      HEAD~1 with a commit-graph, and HEAD with a commit-graph. This
      allows comparing the benefits we get from parsing commits from
      the commit-graph and then again the benefits we get by
      restricting the set of commits we walk.
      
      Test: git rev-list --topo-order -100 HEAD
      HEAD~1, no commit-graph: 6.80 s
      HEAD~1, w/ commit-graph: 0.77 s
        HEAD, w/ commit-graph: 0.02 s
      
      Test: git rev-list --topo-order -100 HEAD -- tools
      HEAD~1, no commit-graph: 9.63 s
      HEAD~1, w/ commit-graph: 6.06 s
        HEAD, w/ commit-graph: 0.06 s
      
      This speedup is due to a few things. First, the new generation-
      number-enabled algorithm walks commits on order of the number of
      results output (subject to some branching structure expectations).
      Since we limit to 100 results, we are running a query similar to
      filling a single page of results. Second, when specifying a path,
      we must parse the root tree object for each commit we walk. The
      previous benefits from the commit-graph are entirely from reading
      the commit-graph instead of parsing commits. Since we need to
      parse trees for the same number of commits as before, we slow
      down significantly from the non-path-based query.
      
      For the test above, I specifically selected a path that is changed
      frequently, including by merge commits. A less-frequently-changed
      path (such as 'README') has similar end-to-end time since we need
      to walk the same number of commits (before determining we do not
      have 100 hits). However, get the benefit that the output is
      presented to the user as it is discovered, much the same as a
      normal 'git log' command (no '--topo-order'). This is an improved
      user experience, even if the command has the same runtime.
      Helped-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: Derrick Stolee's avatarDerrick Stolee <dstolee@microsoft.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      b4542418
    • Derrick Stolee's avatar
      revision.c: begin refactoring --topo-order logic · f0d9cc41
      Derrick Stolee authored
      When running 'git rev-list --topo-order' and its kin, the topo_order
      setting in struct rev_info implies the limited setting. This means
      that the following things happen during prepare_revision_walk():
      
      * revs->limited implies we run limit_list() to walk the entire
        reachable set. There are some short-cuts here, such as if we
        perform a range query like 'git rev-list COMPARE..HEAD' and we
        can stop limit_list() when all queued commits are uninteresting.
      
      * revs->topo_order implies we run sort_in_topological_order(). See
        the implementation of that method in commit.c. It implies that
        the full set of commits to order is in the given commit_list.
      
      These two methods imply that a 'git rev-list --topo-order HEAD'
      command must walk the entire reachable set of commits _twice_ before
      returning a single result.
      
      If we have a commit-graph file with generation numbers computed, then
      there is a better way. This patch introduces some necessary logic
      redirection when we are in this situation.
      
      In v2.18.0, the commit-graph file contains zero-valued bytes in the
      positions where the generation number is stored in v2.19.0 and later.
      Thus, we use generation_numbers_enabled() to check if the commit-graph
      is available and has non-zero generation numbers.
      
      When setting revs->limited only because revs->topo_order is true,
      only do so if generation numbers are not available. There is no
      reason to use the new logic as it will behave similarly when all
      generation numbers are INFINITY or ZERO.
      
      In prepare_revision_walk(), if we have revs->topo_order but not
      revs->limited, then we trigger the new logic. It breaks the logic
      into three pieces, to fit with the existing framework:
      
      1. init_topo_walk() fills a new struct topo_walk_info in the rev_info
         struct. We use the presence of this struct as a signal to use the
         new methods during our walk. In this patch, this method simply
         calls limit_list() and sort_in_topological_order(). In the future,
         this method will set up a new data structure to perform that logic
         in-line.
      
      2. next_topo_commit() provides get_revision_1() with the next topo-
         ordered commit in the list. Currently, this simply pops the commit
         from revs->commits.
      
      3. expand_topo_walk() provides get_revision_1() with a way to signal
         walking beyond the latest commit. Currently, this calls
         add_parents_to_list() exactly like the old logic.
      
      While this commit presents method redirection for performing the
      exact same logic as before, it allows the next commit to focus only
      on the new logic.
      Signed-off-by: Derrick Stolee's avatarDerrick Stolee <dstolee@microsoft.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      f0d9cc41
  4. 23 Oct, 2018 1 commit
    • Matthew DeVore's avatar
      exclude-promisor-objects: declare when option is allowed · 669b1d2a
      Matthew DeVore authored
      The --exclude-promisor-objects option causes some funny behavior in at
      least two commands: log and blame. It causes a BUG crash:
      
      	$ git log --exclude-promisor-objects
      	BUG: revision.c:2143: exclude_promisor_objects can only be used
      	when fetch_if_missing is 0
      	Aborted
      	[134]
      
      Fix this such that the option is treated like any other unknown option.
      The commands that must support it are limited, so declare in those
      commands that the flag is supported. In particular:
      
      	pack-objects
      	prune
      	rev-list
      
      The commands were found by searching for logic which parses
      --exclude-promisor-objects outside of revision.c. Extra logic outside of
      revision.c is needed because fetch_if_missing must be turned on before
      revision.c sees the option or it will BUG-crash. The above list is
      supported by the fact that no other command is introspectively invoked
      by another command passing --exclude-promisor-object.
      Signed-off-by: default avatarMatthew DeVore <matvore@google.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      669b1d2a
  5. 06 Oct, 2018 2 commits
    • Matthew DeVore's avatar
      revision: mark non-user-given objects instead · 99c9aa95
      Matthew DeVore authored
      Currently, list-objects.c incorrectly treats all root trees of commits
      as USER_GIVEN. Also, it would be easier to mark objects that are
      non-user-given instead of user-given, since the places in the code
      where we access an object through a reference are more obvious than
      the places where we access an object that was given by the user.
      
      Resolve these two problems by introducing a flag NOT_USER_GIVEN that
      marks blobs and trees that are non-user-given, replacing USER_GIVEN.
      (Only blobs and trees are marked because this mark is only used when
      filtering objects, and filtering of other types of objects is not
      supported yet.)
      
      This fixes a bug in that git rev-list behaved differently from git
      pack-objects. pack-objects would *not* filter objects given explicitly
      on the command line and rev-list would filter. This was because the two
      commands used a different function to add objects to the rev_info
      struct. This seems to have been an oversight, and pack-objects has the
      correct behavior, so I added a test to make sure that rev-list now
      behaves properly.
      Signed-off-by: default avatarMatthew DeVore <matvore@google.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      99c9aa95
    • Matthew DeVore's avatar
      rev-list: handle missing tree objects properly · 7c0fe330
      Matthew DeVore authored
      Previously, we assumed only blob objects could be missing. This patch
      makes rev-list handle missing trees like missing blobs. The --missing=*
      and --exclude-promisor-objects flags now work for trees as they already
      do for blobs. This is demonstrated in t6112.
      Signed-off-by: default avatarMatthew DeVore <matvore@google.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      7c0fe330
  6. 21 Sep, 2018 2 commits
  7. 22 Aug, 2018 1 commit
    • Jeff King's avatar
      rev-list: make empty --stdin not an error · a12cbe23
      Jeff King authored
      When we originally did the series that contains 7ba82629
      (revision: add rev_input_given flag, 2017-08-02) the intent
      was that "git rev-list --stdin </dev/null" would similarly
      become a successful noop. However, an attempt at the time to
      do that did not work[1]. The problem is that rev_input_given
      serves two roles:
      
       - it tells rev-list.c that it should not error out
      
       - it tells revision.c that it should not have the "default"
         ref kick (e.g., "HEAD" in "git log")
      
      We want to trigger the former, but not the latter. This is
      technically possible with a single flag, if we set the flag
      only after revision.c's revs->def check. But this introduces
      a rather subtle ordering dependency.
      
      Instead, let's keep two flags: one to denote when we got
      actual input (which triggers both roles) and one for when we
      read stdin (which triggers only the first).
      
      This does mean a caller interested in the first role has to
      check both flags, but there's only one such caller. And any
      future callers might want to make the distinction anyway
      (e.g., if they care less about erroring out, and more about
      whether revision.c soaked up our stdin).
      
      In fact, we already keep such a flag internally in
      revision.c for this purpose, so this is really just exposing
      that to the caller (and the old function-local flag can go
      away in favor of our new one).
      
      [1] https://public-inbox.org/git/20170802223416.gwiezhbuxbdmbjzx@sigill.intra.peff.net/Helped-by: default avatarJunio C Hamano <gitster@pobox.com>
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      a12cbe23
  8. 15 Aug, 2018 1 commit
  9. 14 Aug, 2018 2 commits
  10. 03 Aug, 2018 1 commit
  11. 23 Jul, 2018 2 commits
  12. 09 Jul, 2018 1 commit
    • Jonathan Tan's avatar
      upload-pack: send refs' objects despite "filter" · a0c9016a
      Jonathan Tan authored
      A filter line in a request to upload-pack filters out objects regardless
      of whether they are directly referenced by a "want" line or not. This
      means that cloning with "--filter=blob:none" (or another filter that
      excludes blobs) from a repository with at least one ref pointing to a
      blob (for example, the Git repository itself) results in output like the
      following:
      
          error: missing object referenced by 'refs/tags/junio-gpg-pub'
      
      and if that particular blob is not referenced by a fetched tree, the
      resulting clone fails fsck because there is no object from the remote to
      vouch that the missing object is a promisor object.
      
      Update both the protocol and the upload-pack implementation to include
      all explicitly specified "want" objects in the packfile regardless of
      the filter specification.
      Signed-off-by: default avatarJonathan Tan <jonathantanmy@google.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      a0c9016a
  13. 21 May, 2018 1 commit
  14. 22 Feb, 2018 1 commit
    • Jeff King's avatar
      revision: drop --show-all option · f74bbc8d
      Jeff King authored
      This was an undocumented debugging aid that does not seem to
      have come in handy in the past decade, judging from its lack
      of mentions on the mailing list.
      
      Let's drop it in the name of simplicity. This is morally a
      revert of 3131b713 (Add "--show-all" revision walker flag
      for debugging, 2008-02-09), but note that I did leave in the
      mapping of UNINTERESTING to "^" in get_revision_mark(). I
      don't think this would be possible to trigger with the
      current code, but it's the only sensible marker.
      
      We'll skip the usual deprecation period because this was
      explicitly a debugging aid that was never documented.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      f74bbc8d
  15. 28 Dec, 2017 1 commit
  16. 12 Dec, 2017 1 commit
  17. 08 Dec, 2017 1 commit
    • Jonathan Tan's avatar
      rev-list: support termination at promisor objects · df11e196
      Jonathan Tan authored
      Teach rev-list to support termination of an object traversal at any
      object from a promisor remote (whether one that the local repo also has,
      or one that the local repo knows about because it has another promisor
      object that references it).
      
      This will be used subsequently in gc and in the connectivity check used
      by fetch.
      
      For efficiency, if an object is referenced by a promisor object, and is
      in the local repo only as a non-promisor object, object traversal will
      not stop there. This is to avoid building the list of promisor object
      references.
      
      (In list-objects.c, the case where obj is NULL in process_blob() and
      process_tree() do not need to be changed because those happen only when
      there is a conflict between the expected type and the existing object.
      If the object doesn't exist, an object will be synthesized, which is
      fine.)
      Signed-off-by: default avatarJonathan Tan <jonathantanmy@google.com>
      Signed-off-by: default avatarJeff Hostetler <jeffhost@microsoft.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      df11e196
  18. 16 Nov, 2017 1 commit
    • Stefan Beller's avatar
      revision.h: introduce blob/tree walking in order of the commits · ce5b6f9b
      Stefan Beller authored
      The functionality to list tree objects in the order they were seen
      while traversing the commits will be used in one of the next commits,
      where we teach `git describe` to describe not only commits, but blobs, too.
      
      The change in list-objects.c is rather minimal as we'll be re-using
      the infrastructure put in place of the revision walking machinery. For
      example one could expect that add_pending_tree is not called, but rather
      commit->tree is directly passed to the tree traversal function. This
      however requires a lot more code than just emptying the queue containing
      trees after each commit.
      Signed-off-by: Stefan Beller's avatarStefan Beller <sbeller@google.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      ce5b6f9b
  19. 24 Sep, 2017 1 commit
    • Martin Ågren's avatar
      leak_pending: use `object_array_clear()`, not `free()` · b2ccdf7f
      Martin Ågren authored
      Setting `leak_pending = 1` tells `prepare_revision_walk()` not to
      release the `pending` array, and makes that the caller's responsibility.
      See 4a43d374 (revision: add leak_pending flag, 2011-10-01) and
      353f5657 (bisect: use leak_pending flag, 2011-10-01).
      
      Commit 1da1e07c (clean up name allocation in prepare_revision_walk,
      2014-10-15) fixed a memory leak in `prepare_revision_walk()` by
      switching from `free()` to `object_array_clear()`. However, where we use
      the `leak_pending`-mechanism, we're still only calling `free()`.
      
      Use `object_array_clear()` instead. Copy some helpful comments from
      353f5657 to the other callers that we update to clarify the memory
      responsibilities, and to highlight that the commits are not affected
      when we clear the array -- it is indeed correct to both tidy up the
      commit flags and clear the object array.
      
      Document `leak_pending` in revision.h to help future users get this
      right.
      Signed-off-by: default avatarMartin Ågren <martin.agren@gmail.com>
      Reviewed-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      b2ccdf7f
  20. 24 Aug, 2017 1 commit
    • Duy Nguyen's avatar
      revision.h: new flag in struct rev_info wrt. worktree-related refs · ff9445be
      Duy Nguyen authored
      The revision walker can walk through per-worktree refs like HEAD or
      SHA-1 references in the index. These currently are from the current
      worktree only. This new flag is added to change rev-list behavior in
      this regard:
      
      When single_worktree is set, only current worktree is considered. When
      it is not set (which is the default), all worktrees are considered.
      
      The default is chosen so because the two big components that rev-list
      works with are object database (entirely shared between worktrees) and
      refs (mostly shared). It makes sense that default behavior goes per-repo
      too instead of per-worktree.
      
      The flag will eventually be exposed as a rev-list argument with
      documents. For now it stays internal until the new behavior is fully
      implemented.
      Signed-off-by: Duy Nguyen's avatarNguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      ff9445be
  21. 02 Aug, 2017 1 commit
    • Jeff King's avatar
      revision: add rev_input_given flag · 7ba82629
      Jeff King authored
      Normally a caller that invokes setup_revisions() has to
      check rev.pending to see if anything was actually queued for
      the traversal. But they can't tell the difference between
      two cases:
      
        1. The user gave us no tip from which to start a
           traversal.
      
        2. The user tried to give us tips via --glob, --all, etc,
           but their patterns ended up being empty.
      
      Let's set a flag in the rev_info struct that callers can use
      to tell the difference.  We can set this from the
      init_all_refs_cb() function.  That's a little funny because
      it's not exactly about initializing the "cb" struct itself.
      But that function is the common setup place for doing
      pattern traversals that is used by --glob, --all, etc.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      7ba82629
  22. 12 Jun, 2017 1 commit
  23. 08 May, 2017 1 commit
  24. 27 Apr, 2017 1 commit
    • Johannes Schindelin's avatar
      timestamp_t: a new data type for timestamps · dddbad72
      Johannes Schindelin authored
      Git's source code assumes that unsigned long is at least as precise as
      time_t. Which is incorrect, and causes a lot of problems, in particular
      where unsigned long is only 32-bit (notably on Windows, even in 64-bit
      versions).
      
      So let's just use a more appropriate data type instead. In preparation
      for this, we introduce the new `timestamp_t` data type.
      
      By necessity, this is a very, very large patch, as it has to replace all
      timestamps' data type in one go.
      
      As we will use a data type that is not necessarily identical to `time_t`,
      we need to be very careful to use `time_t` whenever we interact with the
      system functions, and `timestamp_t` everywhere else.
      Signed-off-by: Johannes Schindelin's avatarJohannes Schindelin <johannes.schindelin@gmx.de>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      dddbad72
  25. 18 Mar, 2017 1 commit
  26. 30 Mar, 2016 2 commits
    • Junio C Hamano's avatar
      pretty: enable --expand-tabs by default for selected pretty formats · 0893eec8
      Junio C Hamano authored
      "git log --pretty={medium,full,fuller}" and "git log" by default
      prepend 4 spaces to the log message, so it makes sense to enable
      the new "expand-tabs" facility by default for these formats.
      Add --no-expand-tabs option to override the new default.
      
      The change alone breaks a test in t4201 that runs "git shortlog"
      on the output from "git log", and expects that the output from
      "git log" does not do such a tab expansion.  Adjust the test to
      explicitly disable expand-tabs with --no-expand-tabs.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      0893eec8
    • Linus Torvalds's avatar
      pretty: expand tabs in indented logs to make things line up properly · 7cc13c71
      Linus Torvalds authored
      A commit log message sometimes tries to line things up using tabs,
      assuming fixed-width font with the standard 8-place tab settings.
      Viewing such a commit however does not work well in "git log", as
      we indent the lines by prefixing 4 spaces in front of them.
      
      This should all line up:
      
        Column 1	Column 2
        --------	--------
        A		B
        ABCD		EFGH
        SPACES        Instead of Tabs
      
      Even with multi-byte UTF8 characters:
      
        Column 1	Column 2
        --------	--------
        Ä		B
        åäö		100
        A Møøse	once bit my sister..
      
      Tab-expand the lines in "git log --expand-tabs" output before
      prefixing 4 spaces.
      
      This is based on the patch by Linus Torvalds, but at this step, we
      require an explicit command line option to enable the behaviour.
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      7cc13c71
  27. 16 Mar, 2016 3 commits
    • Jeff King's avatar
      list-objects: pass full pathname to callbacks · 2824e184
      Jeff King authored
      When we find a blob at "a/b/c", we currently pass this to
      our show_object_fn callbacks as two components: "a/b/" and
      "c". Callbacks which want the full value then call
      path_name(), which concatenates the two. But this is an
      inefficient interface; the path is a strbuf, and we could
      simply append "c" to it temporarily, then roll back the
      length, without creating a new copy.
      
      So we could improve this by teaching the callsites of
      path_name() this trick (and there are only 3). But we can
      also notice that no callback actually cares about the
      broken-down representation, and simply pass each callback
      the full path "a/b/c" as a string. The callback code becomes
      even simpler, then, as we do not have to worry about freeing
      an allocated buffer, nor rolling back our modification to
      the strbuf.
      
      This is theoretically less efficient, as some callbacks
      would not bother to format the final path component. But in
      practice this is not measurable. Since we use the same
      strbuf over and over, our work to grow it is amortized, and
      we really only pay to memcpy a few bytes.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      2824e184
    • Jeff King's avatar
      list-objects: drop name_path entirely · dc06dc88
      Jeff King authored
      In the previous commit, we left name_path as a thin wrapper
      around a strbuf. This patch drops it entirely. As a result,
      every show_object_fn callback needs to be adjusted. However,
      none of their code needs to be changed at all, because the
      only use was to pass it to path_name(), which now handles
      the bare strbuf.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      dc06dc88
    • Jeff King's avatar
      list-objects: convert name_path to a strbuf · f3badaed
      Jeff King authored
      The "struct name_path" data is examined in only two places:
      we generate it in process_tree(), and we convert it to a
      single string in path_name(). Everyone else just passes it
      through to those functions.
      
      We can further note that process_tree() already keeps a
      single strbuf with the leading tree path, for use with
      tree_entry_interesting().
      
      Instead of building a separate name_path linked list, let's
      just use the one we already build in "base". This reduces
      the amount of code (especially tricky code in path_name()
      which did not check for integer overflows caused by deep
      or large pathnames).
      
      It is also more efficient in some instances.  Any time we
      were using tree_entry_interesting, we were building up the
      strbuf anyway, so this is an immediate and obvious win
      there. In cases where we were not, we trade off storing
      "pathname/" in a strbuf on the heap for each level of the
      path, instead of two pointers and an int on the stack (with
      one pointer into the tree object). On a 64-bit system, the
      latter is 20 bytes; so if path components are less than that
      on average, this has lower peak memory usage.  In practice
      it probably doesn't matter either way; we are already
      holding in memory all of the tree objects leading up to each
      pathname, and for normal-depth pathnames, we are only
      talking about hundreds of bytes.
      
      This patch leaves "struct name_path" as a thin wrapper
      around the strbuf, to avoid disrupting callbacks. We should
      fix them, but leaving it out makes this diff easier to view.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      f3badaed
  28. 12 Feb, 2016 3 commits
    • Jeff King's avatar
      list-objects: pass full pathname to callbacks · de1e67d0
      Jeff King authored
      When we find a blob at "a/b/c", we currently pass this to
      our show_object_fn callbacks as two components: "a/b/" and
      "c". Callbacks which want the full value then call
      path_name(), which concatenates the two. But this is an
      inefficient interface; the path is a strbuf, and we could
      simply append "c" to it temporarily, then roll back the
      length, without creating a new copy.
      
      So we could improve this by teaching the callsites of
      path_name() this trick (and there are only 3). But we can
      also notice that no callback actually cares about the
      broken-down representation, and simply pass each callback
      the full path "a/b/c" as a string. The callback code becomes
      even simpler, then, as we do not have to worry about freeing
      an allocated buffer, nor rolling back our modification to
      the strbuf.
      
      This is theoretically less efficient, as some callbacks
      would not bother to format the final path component. But in
      practice this is not measurable. Since we use the same
      strbuf over and over, our work to grow it is amortized, and
      we really only pay to memcpy a few bytes.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      de1e67d0
    • Jeff King's avatar
      list-objects: drop name_path entirely · bd64516a
      Jeff King authored
      In the previous commit, we left name_path as a thin wrapper
      around a strbuf. This patch drops it entirely. As a result,
      every show_object_fn callback needs to be adjusted. However,
      none of their code needs to be changed at all, because the
      only use was to pass it to path_name(), which now handles
      the bare strbuf.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      bd64516a
    • Jeff King's avatar
      list-objects: convert name_path to a strbuf · 13528ab3
      Jeff King authored
      The "struct name_path" data is examined in only two places:
      we generate it in process_tree(), and we convert it to a
      single string in path_name(). Everyone else just passes it
      through to those functions.
      
      We can further note that process_tree() already keeps a
      single strbuf with the leading tree path, for use with
      tree_entry_interesting().
      
      Instead of building a separate name_path linked list, let's
      just use the one we already build in "base". This reduces
      the amount of code (especially tricky code in path_name()
      which did not check for integer overflows caused by deep
      or large pathnames).
      
      It is also more efficient in some instances.  Any time we
      were using tree_entry_interesting, we were building up the
      strbuf anyway, so this is an immediate and obvious win
      there. In cases where we were not, we trade off storing
      "pathname/" in a strbuf on the heap for each level of the
      path, instead of two pointers and an int on the stack (with
      one pointer into the tree object). On a 64-bit system, the
      latter is 20 bytes; so if path components are less than that
      on average, this has lower peak memory usage.  In practice
      it probably doesn't matter either way; we are already
      holding in memory all of the tree objects leading up to each
      pathname, and for normal-depth pathnames, we are only
      talking about hundreds of bytes.
      
      This patch leaves "struct name_path" as a thin wrapper
      around the strbuf, to avoid disrupting callbacks. We should
      fix them, but leaving it out makes this diff easier to view.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      13528ab3
  29. 15 Dec, 2015 1 commit
  30. 29 Jun, 2015 1 commit
    • Jeff King's avatar
      convert "enum date_mode" into a struct · a5481a6c
      Jeff King authored
      In preparation for adding date modes that may carry extra
      information beyond the mode itself, this patch converts the
      date_mode enum into a struct.
      
      Most of the conversion is fairly straightforward; we pass
      the struct as a pointer and dereference the type field where
      necessary. Locations that declare a date_mode can use a "{}"
      constructor.  However, the tricky case is where we use the
      enum labels as constants, like:
      
        show_date(t, tz, DATE_NORMAL);
      
      Ideally we could say:
      
        show_date(t, tz, &{ DATE_NORMAL });
      
      but of course C does not allow that. Likewise, we cannot
      cast the constant to a struct, because we need to pass an
      actual address. Our options are basically:
      
        1. Manually add a "struct date_mode d = { DATE_NORMAL }"
           definition to each caller, and pass "&d". This makes
           the callers uglier, because they sometimes do not even
           have their own scope (e.g., they are inside a switch
           statement).
      
        2. Provide a pre-made global "date_normal" struct that can
           be passed by address. We'd also need "date_rfc2822",
           "date_iso8601", and so forth. But at least the ugliness
           is defined in one place.
      
        3. Provide a wrapper that generates the correct struct on
           the fly. The big downside is that we end up pointing to
           a single global, which makes our wrapper non-reentrant.
           But show_date is already not reentrant, so it does not
           matter.
      
      This patch implements 3, along with a minor macro to keep
      the size of the callers sane.
      Signed-off-by: default avatarJeff King <peff@peff.net>
      Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
      a5481a6c