This project is mirrored from https://github.com/git/git.
  1. 27 May, 2014 1 commit
  2. 16 Jan, 2014 1 commit
    • do not discard revindex when re-preparing packfiles · 1a6d8b91
      Jeff King authored
      When an object lookup fails, we re-read the objects/pack
      directory to pick up any new packfiles that may have been
      created since our last read. We also discard any pack
      revindex structs we've allocated.
      
      The discarding is a problem for the pack-bitmap code, which keeps
      a pointer to the revindex for the bitmapped pack. After the
      discard, the pointer is invalid, and we may read free()d
      memory.
      
      Other revindex users do not keep a bare pointer to the
      revindex; instead, they always access it through
      revindex_for_pack(), which lazily builds the revindex. So
      one solution is to teach the pack-bitmap code a similar
      trick. It would be slightly less efficient, but probably not
      all that noticeable.
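
A minimal sketch of the lazy-accessor pattern described above, for illustration only. The struct layouts, the one-slot cache, and the names lookup_revindex(), discard_revindex() and builds are hypothetical stand-ins, not git's actual code. The point is that a caller who always goes through the accessor gets a valid (possibly rebuilt) revindex after a discard, while a caller who saved the pointer would be left with a dangling one.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; git's real structs look different. */
struct pack { unsigned num_objects; };
struct revindex { const struct pack *p; unsigned *entries; };

static struct revindex *cache; /* one-slot cache keeps the sketch short */
static int builds;             /* counts how often we had to rebuild */

/* Free the cached revindex; any pointer saved by a caller is now dangling. */
static void discard_revindex(void)
{
	if (cache) {
		free(cache->entries);
		free(cache);
		cache = NULL;
	}
}

/* Lazily (re)build the revindex, so the returned pointer is always valid. */
static struct revindex *lookup_revindex(const struct pack *p)
{
	if (cache && cache->p != p)
		discard_revindex();
	if (!cache) {
		size_t n = p->num_objects ? p->num_objects : 1;

		cache = malloc(sizeof(*cache));
		if (!cache)
			abort();
		cache->entries = calloc(n, sizeof(*cache->entries));
		if (!cache->entries)
			abort();
		cache->p = p;
		builds++;
	}
	return cache;
}
```

Calling lookup_revindex() twice for the same pack builds once; calling it again after discard_revindex() builds a second time, which is the small efficiency cost mentioned above.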
      
      However, it turns out this discarding is not actually
      necessary. When we call reprepare_packed_git, we do not
      throw away our old pack list. We keep the existing entries,
      and only add in new ones. So there is no safety problem; we
      will still have the pack struct that matches each revindex.
      The packfile itself may go away, of course, but we are
      already prepared to handle that, and it may happen outside
      of reprepare_packed_git anyway.
      
      Throwing away the revindex may save some RAM if the pack
      never gets reused (about 12 bytes per object). But it also
      wastes some CPU time (to regenerate the index) if the pack
      does get reused. It's hard to say which is more valuable,
      but in either case, it happens very rarely (only when we
      race with a simultaneous repack). Just leaving the revindex
      in place is simple and safe both for current and future
      code.

      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. 24 Oct, 2013 1 commit
  4. 12 Jul, 2013 2 commits
    • pack-revindex: radix-sort the revindex · 8b8dfd51
      Jeff King authored
      The pack revindex stores the offsets of the objects in the
      pack in sorted order, allowing us to easily find the on-disk
      size of each object. To compute it, we populate an array
      with the offsets from the sha1-sorted idx file, and then use
      qsort to order it by offsets.
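
To illustrate why sorted offsets make the on-disk size easy to find, here is a small sketch. It assumes the sorted array carries one extra sentinel entry marking where the object data ends; the function name disk_size_at() and the plain uint64_t array are hypothetical simplifications, not git's data structures.

```c
#include <stdint.h>

/*
 * sorted[] holds nr object offsets in ascending order, followed by one
 * sentinel entry (sorted[nr]) marking the end of the object data.
 * The on-disk size of object i is the distance to the next offset.
 */
static uint64_t disk_size_at(const uint64_t *sorted, uint32_t nr, uint32_t i)
{
	if (i >= nr)
		return 0; /* out of range; the sentinel itself has no size */
	return sorted[i + 1] - sorted[i];
}
```

With offsets 12, 150 and 400 and a sentinel at 1000, the three objects occupy 138, 250 and 600 bytes.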
      
      That does O(n log n) offset comparisons, and profiling shows
      that we spend most of our time in cmp_offset. However, since
      we are sorting on a simple off_t, we can use numeric sorts
      that perform better. A radix sort can run in O(k*n), where k
      is the number of "digits" in our number. For a 64-bit off_t,
      using 16-bit "digits" gives us k=4.
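
The idea can be sketched as a least-significant-digit radix sort over 16-bit digits. This is an illustrative version under stated assumptions, not the code from this commit: it sorts bare uint64_t values rather than revindex entries keyed on off_t, the function name radix_sort_offsets() is made up, and the caller is assumed to pass the largest value in the array as max so that the loop can stop once the remaining digits are all zero.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIGIT_BITS 16
#define BUCKETS (1u << DIGIT_BITS)

/*
 * Sort a[0..nr-1] in ascending order. max must be >= every element;
 * digits above the highest set bit of max are skipped entirely.
 */
static void radix_sort_offsets(uint64_t *a, size_t nr, uint64_t max)
{
	uint64_t *tmp, *from, *to;
	size_t *pos;
	unsigned shift;

	if (nr < 2)
		return;

	tmp = malloc(nr * sizeof(*tmp));
	pos = malloc(BUCKETS * sizeof(*pos));
	if (!tmp || !pos)
		abort();
	from = a;
	to = tmp;

	for (shift = 0; shift < 64 && (max >> shift); shift += DIGIT_BITS) {
		size_t i, sum = 0;
		uint64_t *swap;

		/* Pass 1: count how many values fall into each bucket. */
		memset(pos, 0, BUCKETS * sizeof(*pos));
		for (i = 0; i < nr; i++)
			pos[(from[i] >> shift) & (BUCKETS - 1)]++;

		/* Convert counts into each bucket's starting position. */
		for (i = 0; i < BUCKETS; i++) {
			size_t count = pos[i];
			pos[i] = sum;
			sum += count;
		}

		/* Pass 2: stable scatter into the other buffer. */
		for (i = 0; i < nr; i++)
			to[pos[(from[i] >> shift) & (BUCKETS - 1)]++] = from[i];

		swap = from;
		from = to;
		to = swap;
	}

	/* After an odd number of digit passes the result lives in tmp. */
	if (from != a)
		memcpy(a, from, nr * sizeof(*a));
	free(tmp);
	free(pos);
}
```

Each digit costs one counting pass and one scatter pass over the data, and no comparison function is called at all.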
      
      On the linux.git repo, with about 3M objects to sort, this
      yields roughly a 4x speedup. Here are the best-of-five numbers for
      running
      
        echo HEAD | git cat-file --batch-check="%(objectsize:disk)"
      
      on a fully packed repository, which is dominated by time
      spent building the pack revindex:
      
                before     after
        real    0m0.834s   0m0.204s
        user    0m0.788s   0m0.164s
        sys     0m0.040s   0m0.036s
      
      This matches our algorithmic expectations. log2(3M) is ~21.5,
      so a traditional sort is ~21.5n. Our radix sort runs in k*n,
      where k is the number of radix digits. In the worst case,
      this is k=4 for a 64-bit off_t, but we can quit early when
      the largest value to be sorted is smaller. For any
      repository under 4G, k=2. Our algorithm makes two passes
      over the list per radix digit, so we end up with 4n. That
      should yield ~5.3x speedup. We see 4x here; the difference
      is probably due to the extra bucket book-keeping the radix
      sort has to do.
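
The estimate above can be written out as follows, assuming the logarithm is base 2, n is about 3 million, and constant factors are ignored on both sides:

```latex
\[
\log_2\!\left(3 \times 10^{6}\right) \approx 21.5,
\qquad
\frac{n \log_2 n}{2kn}\bigg|_{k=2} \approx \frac{21.5\,n}{4\,n} = 5.375
\]
```

This is the source of the ~5.3x figure quoted above, against the roughly 4x actually measured.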
      
      On a smaller repo, the difference is less impressive, as
      log(n) is smaller. For git.git, with 173K objects (but still
      k=2), we see a 2.7x improvement:
      
                before     after
        real    0m0.046s   0m0.017s
        user    0m0.036s   0m0.012s
        sys     0m0.008s   0m0.000s
      
      On even tinier repos (e.g., a few hundred objects), the
      speedup goes away entirely, as the small advantage of the
      radix sort gets erased by the book-keeping costs (and at
      those sizes, the cost to generate the rev-index gets
      lost in the noise anyway).

      Signed-off-by: Jeff King <peff@peff.net>
      Reviewed-by: Brandon Casey <drafnel@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • pack-revindex: use unsigned to store number of objects · 012b32bb
      Jeff King authored
      A packfile may have up to 2^32-1 objects in it, so the
      "right" data type to use is uint32_t. We currently use a
      signed int, which means that we may behave incorrectly for
      packfiles with more than 2^31-1 objects on 32-bit systems.
      
      Nobody has noticed because having 2^31 objects is pretty
      insane. The linux.git repo has on the order of 2^22 objects,
      which is hundreds of times smaller than necessary to trigger
      the bug.
      
      Let's bump this up to an "unsigned". On 32-bit systems, this
      gives us the correct data-type, and on 64-bit systems, it is
      probably more efficient to use the native "unsigned" than a
      true uint32_t.
      
      While we're at it, we can fix the binary search not to
      overflow in such a case if our unsigned is 32 bits.
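
For illustration, one common way to keep a binary search from overflowing is to compute the midpoint as lo + (hi - lo) / 2 rather than (lo + hi) / 2. The sketch below shows that technique on a plain array of offsets; it is an assumption about the general shape of such a fix, and the names midpoint() and find_offset() are hypothetical.

```c
#include <stdint.h>

/* Midpoint of [lo, hi) that cannot wrap, even when lo + hi exceeds 2^32-1. */
static uint32_t midpoint(uint32_t lo, uint32_t hi)
{
	return lo + (hi - lo) / 2;
}

/* Return the index of offset in sorted[0..nr-1], or -1 if it is absent. */
static int64_t find_offset(const uint64_t *sorted, uint32_t nr, uint64_t offset)
{
	uint32_t lo = 0, hi = nr; /* search the half-open range [lo, hi) */

	while (lo < hi) {
		uint32_t mi = midpoint(lo, hi);

		if (sorted[mi] == offset)
			return mi;
		else if (sorted[mi] < offset)
			lo = mi + 1;
		else
			hi = mi;
	}
	return -1;
}
```

With 32-bit arithmetic, the naive (lo + hi) / 2 wraps around once lo + hi no longer fits in 32 bits and yields an index far below lo; the subtraction form stays inside the range.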

      Signed-off-by: Jeff King <peff@peff.net>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  5. 23 Jul, 2009 1 commit
  6. 02 Nov, 2008 1 commit
  7. 23 Aug, 2008 1 commit
  8. 24 Jun, 2008 1 commit
  9. 01 Mar, 2008 1 commit