1. 14 Nov, 2018 2 commits
  2. 02 Nov, 2018 1 commit
    • commit-reach: implement get_reachable_subset · fcb2c076
      Derrick Stolee authored
      The existing reachability algorithms in commit-reach.c focus on
      finding merge-bases or determining if all commits in a set X can
      reach at least one commit in a set Y. However, for two commit sets
      X and Y, we may also care about which commits in Y are reachable
      from at least one commit in X.
      
      Implement get_reachable_subset() which answers this question. Given
      two arrays of commits, 'from' and 'to', return a commit_list with
      every commit from the 'to' array that is reachable from at least
      one commit in the 'from' array.
      
      The algorithm is a simple walk starting at the 'from' commits, using
      the PARENT2 flag to indicate "this commit has already been added to
      the walk queue". By marking the 'to' commits with the PARENT1 flag,
      we can determine when we see a commit from the 'to' array. We remove
      the PARENT1 flag as we add that commit to the result list to avoid
      duplicates.
      
      The order of the resulting list is a reverse of the order that the
      commits are discovered in the walk.
      
      There are a couple of shortcuts to avoid walking more than we need:
      
      1. We determine the minimum generation number of commits in the
         'to' array. We do not walk commits with generation number
         below this minimum.
      
      2. We count how many distinct commits are in the 'to' array, and
         decrement this count when we discover a 'to' commit during the
         walk. If this number reaches zero, then we can terminate the
         walk.
      
      Tests will be added using the 'test-tool reach' helper in a
      subsequent commit.
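
The walk described above can be sketched in isolation. This is a minimal illustration, not the code from commit-reach.c: `struct toy_commit`, `toy_reachable_subset()` and the fixed-size FIFO queue are hypothetical stand-ins, and the flag values are chosen only for this example. The sketch also reports commits in discovery order into a caller-supplied array rather than building a reversed commit_list.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMMITS 64    /* assumed upper bound on commits in the toy graph */
#define PARENT1 (1u << 0) /* commit is in 'to' and has not been reported yet */
#define PARENT2 (1u << 1) /* commit has already been added to the walk queue */

/* Hypothetical stand-in for a commit: flags, generation, up to two parents. */
struct toy_commit {
	unsigned flags;
	unsigned generation;
	size_t nr_parents;
	struct toy_commit *parents[2];
};

/*
 * Store every 'to' commit reachable from some 'from' commit in 'out'
 * (capacity >= nr_to) and return how many were stored.
 */
size_t toy_reachable_subset(struct toy_commit **from, size_t nr_from,
			    struct toy_commit **to, size_t nr_to,
			    struct toy_commit **out)
{
	struct toy_commit *queue[MAX_COMMITS];
	size_t head = 0, tail = 0, found = 0, remaining = 0, i;
	unsigned min_generation = ~0u;

	/* Mark the targets, count distinct ones, find the minimum generation. */
	for (i = 0; i < nr_to; i++) {
		if (to[i]->flags & PARENT1)
			continue; /* duplicate entry in 'to' */
		to[i]->flags |= PARENT1;
		remaining++;
		if (to[i]->generation < min_generation)
			min_generation = to[i]->generation;
	}

	/* Seed the queue with the 'from' commits. */
	for (i = 0; i < nr_from; i++) {
		if (from[i]->flags & PARENT2)
			continue;
		from[i]->flags |= PARENT2;
		assert(tail < MAX_COMMITS);
		queue[tail++] = from[i];
	}

	/* Shortcut 2: stop as soon as every distinct target has been seen. */
	while (head < tail && remaining) {
		struct toy_commit *c = queue[head++];

		if (c->flags & PARENT1) {
			c->flags &= ~PARENT1; /* avoid reporting it twice */
			out[found++] = c;
			remaining--;
		}
		for (i = 0; i < c->nr_parents; i++) {
			struct toy_commit *p = c->parents[i];

			/* Shortcut 1: skip commits below the minimum generation. */
			if ((p->flags & PARENT2) || p->generation < min_generation)
				continue;
			p->flags |= PARENT2;
			assert(tail < MAX_COMMITS);
			queue[tail++] = p;
		}
	}

	/* Clear the flags so that later walks start from a clean state. */
	for (i = 0; i < nr_to; i++)
		to[i]->flags &= ~PARENT1;
	for (i = 0; i < tail; i++)
		queue[i]->flags &= ~PARENT2;
	return found;
}
```

Because each commit is queued at most once (guarded by PARENT2), the queue never needs more slots than there are commits in the graph, which is what the `MAX_COMMITS` assertion relies on.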
      Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. 29 Oct, 2018 1 commit
    • commit-reach.h: add missing declarations (hdr-check) · 1406725b
      Ramsay Jones authored
      Add the necessary #includes and forward declarations to allow the header
      file to pass the 'hdr-check' target.
      
      Note that, since this header includes the commit-slab implementation
      header file (indirectly via commit-slab.h), some of the commit-slab
      inline functions (e.g. contains_cache_at_peek()) will not compile without
      the complete type of 'struct commit'. Hence, we replace the forward
      declaration of 'struct commit' with an #include of the 'commit.h'
      header file.
      
      It is possible, using the 'commit-slab-{decl,impl}.h' files, to avoid
      this inclusion of the 'commit.h' header. Commit a9f1f1f9 ("commit-slab.h:
      code split", 2018-05-19) separated the commit-slab interface from its
      implementation, to allow for the definition of a public commit-slab data
      structure. This enabled us to avoid including the commit-slab implementation
      in a header file, which could result in the replication of the commit-slab
      functions in each compilation unit in which it was included.
      
      Indeed, if you compile with optimizations disabled, then run this script:
      
        $ cat -n dup-static.sh
             1 #!/bin/sh
             2
             3 nm $1 | grep ' t ' | cut -d' ' -f3 | sort | uniq -c |
             4 	sort -rn | grep -v '      1'
        $
      
        $ ./dup-static.sh git | grep contains
             24 init_contains_cache_with_stride
             24 init_contains_cache
             24 contains_cache_peek
             24 contains_cache_at_peek
             24 contains_cache_at
             24 clear_contains_cache
        $
      
      you will find 24 copies of the commit-slab routines for the contains_cache.
      Of course, when you enable optimizations again, these duplicate static
      functions (mostly) disappear. Compiling with gcc at -O2 leaves two static
      functions, thus:
      
        $ nm commit-reach.o | grep contains_cache
        0000000000000870 t contains_cache_at_peek.isra.1.constprop.6
        $ nm ref-filter.o | grep contains_cache
        00000000000002b0 t clear_contains_cache.isra.14
        $
      
      However, using a shared 'contains_cache' would result in all six of the
      above functions as external public functions in the git binary. At present,
      only three of these functions are actually called, so the trade-off
      seems to favour letting the compiler inline the commit-slab functions.
      Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  4. 18 Oct, 2018 1 commit
  5. 20 Jul, 2018 6 commits
    • commit-reach: make can_all_from_reach... linear · 4fbcca4e
      Derrick Stolee authored
      The can_all_from_reach_with_flags() algorithm is currently quadratic in
      the worst case, because it calls the reachable() method for every 'from'
      without tracking which commits have already been walked or which can
      already reach a commit in 'to'.
      
      Rewrite the algorithm to walk each commit a constant number of times.
      
      We also add some optimizations that should work for the main consumer of
      this method: fetch negotiation (haves/wants).
      
      The first step includes using a depth-first-search (DFS) from each
      'from' commit, sorted by ascending generation number. We do not walk
      beyond the minimum generation number or the minimum commit date. This
      DFS is likely to be faster than the existing reachable() method because
      we expect previous ref values to be along the first-parent history.
      
      If we find a target commit, then we mark everything in the DFS stack as
      a RESULT. This expands the set of targets for the other 'from' commits.
      We also mark the visited commits using 'assign_flag' to prevent
      re-walking the same commits.
      
      We still need to clear our flags at the end, which is why we will have a
      total of three visits to each commit.
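
The DFS described above can be sketched on a toy graph. This is an illustration under stated assumptions, not the code in commit-reach.c: `struct toy_commit`, `toy_can_all_from_reach()`, the `TARGET` and `VISITED` flags (standing in for the caller's 'with_flag' and 'assign_flag') and the fixed-size stack are all hypothetical. The commit-date cutoff and the final flag-clearing pass are omitted to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_COMMITS 64    /* assumed upper bound on the DFS stack depth */
#define TARGET  (1u << 0) /* set by the caller on every 'to' commit */
#define VISITED (1u << 1) /* plays the role of 'assign_flag' */
#define RESULT  (1u << 2) /* commit is known to reach a target */

/* Hypothetical stand-in for a commit: flags, generation, up to two parents. */
struct toy_commit {
	unsigned flags;
	unsigned generation;
	size_t nr_parents;
	struct toy_commit *parents[2];
};

/* qsort() comparator: ascending generation number. */
static int by_generation(const void *a, const void *b)
{
	const struct toy_commit *x = *(struct toy_commit *const *)a;
	const struct toy_commit *y = *(struct toy_commit *const *)b;

	return (x->generation > y->generation) - (x->generation < y->generation);
}

/* Return 1 if every 'from' commit can reach a TARGET commit, else 0. */
int toy_can_all_from_reach(struct toy_commit **from, size_t nr_from,
			   unsigned min_generation)
{
	struct toy_commit *stack[MAX_COMMITS];
	size_t i;

	qsort(from, nr_from, sizeof(*from), by_generation);

	for (i = 0; i < nr_from; i++) {
		size_t depth = 0;

		from[i]->flags |= VISITED;
		stack[depth++] = from[i];

		while (depth) {
			struct toy_commit *top = stack[depth - 1];
			size_t p;

			if (top->flags & (TARGET | RESULT)) {
				/* Success: pop and pass RESULT down the stack. */
				depth--;
				if (depth)
					stack[depth - 1]->flags |= RESULT;
				continue;
			}
			for (p = 0; p < top->nr_parents; p++) {
				struct toy_commit *parent = top->parents[p];

				if (parent->flags & (TARGET | RESULT))
					top->flags |= RESULT;
				if (parent->flags & VISITED)
					continue; /* never walk a commit twice */
				parent->flags |= VISITED;
				if (parent->generation < min_generation)
					continue; /* below the cutoff: do not descend */
				assert(depth < MAX_COMMITS);
				stack[depth++] = parent;
				break;
			}
			if (p == top->nr_parents)
				depth--; /* all parents handled */
		}
		if (!(from[i]->flags & (TARGET | RESULT)))
			return 0;
	}
	return 1;
}
```

Commits marked RESULT by one walk act as extra targets for the later walks, which is why the total work depends on the number of commits visited rather than on the number of 'from' commits.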
      
      Performance was measured on the Linux repository using
      'test-tool reach can_all_from_reach'. The input included rows seeded by
      tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
      v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
      v3 releases and want all major v4 releases." The "large" case included
      X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
      tags to the set, which does not greatly increase the number of objects
      that are considered, but does increase the number of 'from' commits,
      demonstrating the quadratic nature of the previous code.
      
      Small Case:
      
      Before: 1.52 s
       After: 0.26 s
      
      Large Case:
      
      Before: 3.50 s
       After: 0.27 s
      
      Note how the time increases between the two cases in the two versions.
      The new code increases relative to the number of commits that need to be
      walked, but not directly relative to the number of 'from' commits.
      Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • test-reach: test can_all_from_reach_with_flags · 1792bc12
      Derrick Stolee authored
      The can_all_from_reach_with_flags method is used by ok_to_give_up in
      upload-pack.c to see if we have done enough negotiation during a fetch.
      This method is intentionally created to preserve state between calls to
      assist with stateful negotiation, such as over SSH.
      
      To make this method testable, add a new can_all_from_reach method that
      does the initial setup and final tear-down. We will later use this
      method in production code. Call the method from 'test-tool reach' for
      now.
      
      Since this is a many-to-many reachability query, add a new type of input
      to the 'test-tool reach' input format. Lines "Y:<committish>" create a
      list of commits to be the reachability targets from the commits in the
      'X' list. In the context of fetch negotiation, the 'X' commits are the
      'want' commits and the 'Y' commits are the 'have' commits.
      Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • commit-reach: move can_all_from_reach_with_flags · ba3ca1ed
      Derrick Stolee authored
      There are several commit walks in the codebase. Group them together into
      a new commit-reach.c file and corresponding header. After we group these
      walks into one place, we can reduce duplicate logic by calling
      equivalent methods.
      
      The can_all_from_reach_with_flags method is used in a stateful way by
      upload-pack.c. The parameters are very flexible, so we will be able to
      use its commit walking logic for many other callers.
      Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • commit-reach: move commit_contains from ref-filter · 920f93ca
      Derrick Stolee authored
      There are several commit walks in the codebase. Group them together into
      a new commit-reach.c file and corresponding header. After we group these
      walks into one place, we can reduce duplicate logic by calling
      equivalent methods.
      
      All methods are direct moves, except we also make the commit_contains()
      method public so its consumers in ref-filter.c can still call it. We can
      also test this method in a test-tool in a later commit.
      Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • commit-reach: move ref_newer from remote.c · 1d614d41
      Derrick Stolee authored
      There are several commit walks in the codebase. Group them together into
      a new commit-reach.c file and corresponding header. After we group these
      walks into one place, we can reduce duplicate logic by calling
      equivalent methods.
      
      The ref_newer() method is used by 'git push -f' to check if a force-push
      is necessary. By making the method public, we make it possible to test
      the method directly without setting up an environment where a 'git
      push' call makes sense.
      Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • commit-reach: move walk methods from commit.c · 5227c385
      Derrick Stolee authored
      There are several commit walks in the codebase. Group them together into
      a new commit-reach.c file and corresponding header. After we group these
      walks into one place, we can reduce duplicate logic by calling
      equivalent methods.
      
      The method declarations in commit.h are not touched by this commit and
      will be moved in a following commit. Many consumers need to point to
      commit-reach.h and that would bloat this commit.
      Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>