Skip to content
  • Jeff King's avatar
    fetch-pack: avoid quadratic behavior in rev_list_push · 099327b5
    Jeff King authored and Junio C Hamano's avatar Junio C Hamano committed
    
    
    When we call find_common to start finding common ancestors
    with the remote side of a fetch, the first thing we do is
    insert the tip of each ref into our rev_list linked list. We
    keep the list sorted the whole time with
    commit_list_insert_by_date, which means our insertion ends
    up doing O(n^2) timestamp comparisons.
    
    We could teach rev_list_push to use an unsorted list, and
    then sort it once after we have added each ref. However, in
    get_rev, we process the list by popping commits off the
    front and adding parents back in timestamp-sorted order. So
    that procedure would still operate on the large list.
    
    Instead, we can replace the linked list with a heap-based
    priority queue, which can do O(log n) insertion, making the
    whole insertion procedure O(n log n).
    
    As a result of switching to the prio_queue struct, we fix
    two minor bugs:
    
      1. When we "pop" a commit in get_rev, and when we clear
         the rev_list in find_common, we do not take care to
         free the "struct commit_list", and just leak its
         memory. With the prio_queue implementation, the memory
         management is handled for us.
    
      2. In get_rev, we look at the head commit of the list,
         possibly push its parents onto the list, and then "pop"
         the front of the list off, assuming it is the same
         element that we just peeked at. This is typically going
         to be the case, but would not be in the face of clock
         skew: the parents are inserted by date, and could
         potentially be inserted at the head of the list if they
         have a timestamp newer than their descendent. In this
         case, we would accidentally pop the parent, and never
         process it at all.
    
         The new implementation pulls the commit off of the
         queue as we examine it, and so does not suffer from
         this problem.
    
    With this patch, a fetch of a single commit into a
    repository with 50,000 refs went from:
    
      real    0m7.984s
      user    0m7.852s
      sys     0m0.120s
    
    to:
    
      real    0m2.017s
      user    0m1.884s
      sys     0m0.124s
    
    Before this patch, a larger case with 370K refs still had
    not completed after tens of minutes; with this patch, it
    completes in about 12 seconds.
    
    Signed-off-by: default avatarJeff King <peff@peff.net>
    Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
    099327b5