    midx: add packs to packed_git linked list · af96fe33
    Derrick Stolee authored and Junio C Hamano committed
    
    
    The multi-pack-index allows searching for objects across multiple
    packs using one object list. The original design gains many of
    its performance benefits by keeping the packs in the
    multi-pack-index out of the packed_git list.
    
    Unfortunately, this has one major drawback. If the multi-pack-index
    covers thousands of packs, and a command loads many of those packs,
    then we can hit the limit for open file descriptors. The
    close_one_pack() function is used to limit this resource, but it
    only looks at the packed_git list, and uses an LRU cache to prevent
    thrashing.
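
    To make the drawback concrete, here is a minimal sketch of a
    close-one-pack routine that only walks a linked list. It is a toy
    model, not Git's close_one_pack(): the lru_pack struct, its fields
    and both helper names are assumptions made for illustration, and
    descriptors are simulated as plain integers.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a pack entry; NOT Git's real struct packed_git. */
struct lru_pack {
	struct lru_pack *next;   /* next entry in the tracked list */
	int fd;                  /* simulated descriptor, -1 when closed */
	unsigned long last_used; /* logical clock of the most recent access */
};

/*
 * Close the least recently used open pack reachable from `head`.
 * Returns 1 if a pack was closed, 0 if no open pack was found.
 * Packs that are not linked into the list are never considered.
 */
static int lru_close_one_pack(struct lru_pack *head)
{
	struct lru_pack *p, *lru = NULL;

	for (p = head; p; p = p->next) {
		if (p->fd < 0)
			continue; /* already closed */
		if (!lru || p->last_used < lru->last_used)
			lru = p;
	}
	if (!lru)
		return 0;
	lru->fd = -1; /* a real implementation would call close(2) here */
	return 1;
}

/* Count simulated open descriptors over an array of packs, listed or not. */
static int lru_count_open(const struct lru_pack *packs, size_t n)
{
	int open = 0;
	size_t i;

	assert(packs || n == 0);
	for (i = 0; i < n; i++)
		if (packs[i].fd >= 0)
			open++;
	return open;
}
```

    In this model, a pack that is open but not linked into the list
    keeps its descriptor no matter how often lru_close_one_pack() is
    called, which mirrors the situation described above.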
    
    Instead of complicating this close_one_pack() logic to include
    direct references to the multi-pack-index, simply add the packs
    opened by the multi-pack-index to the packed_git list. This
    immediately solves the file-descriptor limit problem, but requires
    some extra steps to avoid performance issues or other problems:
    
    1. Create a multi_pack_index bit in the packed_git struct that is
       one if and only if the pack was loaded from a multi-pack-index.
    
    2. Skip packs with the multi_pack_index bit when doing object
       lookups and abbreviations. These algorithms already check the
       multi-pack-index before the packed_git struct. This has a very
       small performance hit, as we need to walk more packed_git
       structs. This is acceptable, since these operations run binary
       search on the other packs, so this walk-and-ignore logic is
       very fast by comparison.
    
    3. When closing a multi-pack-index file, do not close its packs,
       as those packs will be closed using close_all_packs(). In some
       cases, such as 'git repack', we run 'close_midx()' without also
       closing the packs, so we need to un-set the multi_pack_index bit
       in those packs. This step is necessary; the failure without it
       was caught by running t6501-freshen-objects.sh with
       GIT_TEST_MULTI_PACK_INDEX=1.
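
    The three steps above can be sketched roughly as follows. This is
    a simplified illustration under stated assumptions, not the patch
    itself: demo_pack, demo_midx and the three functions are
    hypothetical names, and the has_object flag stands in for a real
    binary search of a pack index.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; these are NOT Git's real definitions. */
struct demo_pack {
	struct demo_pack *next;
	const char *name;
	unsigned multi_pack_index : 1; /* step 1: set iff loaded via the midx */
	int has_object;                /* toy replacement for an index search */
};

struct demo_midx {
	struct demo_pack **packs; /* packs covered by this multi-pack-index */
	size_t nr;
};

/* Step 1: link a midx-covered pack into the list and mark it. */
static struct demo_pack *demo_add_midx_pack(struct demo_pack *head,
					    struct demo_pack *p)
{
	assert(p != NULL);
	p->multi_pack_index = 1;
	p->next = head;
	return p; /* new head of the list */
}

/* Step 2: the per-pack lookup skips packs the midx already covers. */
static const struct demo_pack *demo_find_in_packs(const struct demo_pack *head)
{
	const struct demo_pack *p;

	for (p = head; p; p = p->next) {
		if (p->multi_pack_index)
			continue; /* walk-and-ignore: no search in this pack */
		if (p->has_object)
			return p;
	}
	return NULL;
}

/* Step 3: closing the midx leaves packs listed but clears the bit. */
static void demo_close_midx(struct demo_midx *m)
{
	size_t i;

	for (i = 0; i < m->nr; i++)
		if (m->packs[i])
			m->packs[i]->multi_pack_index = 0;
	m->nr = 0;
}
```

    Once demo_close_midx() has run, a previously covered pack is
    searched like any other pack, which is why the bit has to be
    cleared when the packs outlive the multi-pack-index.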
    
    To manually test this change, I inserted trace2 logging into
    close_pack_fd() and set pack_max_fds to 10, then ran 'git rev-list
    --all --objects' on a copy of the Git repo with 300+ pack-files and
    a multi-pack-index. The logs verified that packs are closed as
    we read past the file descriptor limit.
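
    The expected shape of that experiment can be approximated with a
    small simulation. It does not reproduce Git's code or the trace2
    output; it only models reading each pack once under a cap of 10
    open descriptors. The pack count of exactly 300 and all sim_*
    names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_PACKS 300  /* assumed; the test above used "300+" pack-files */
#define SIM_MAX_FDS 10 /* mirrors the pack_max_fds value used in the test */

struct sim_result {
	int closes;    /* how many times a pack had to be closed */
	int peak_open; /* largest number of simultaneously open packs */
};

/*
 * Read every pack once, in order. Whenever the cap is reached,
 * close the least recently used open pack before opening the next.
 */
static struct sim_result sim_read_all_packs(void)
{
	unsigned long last_used[SIM_PACKS] = { 0 }; /* 0 means closed */
	unsigned long clock = 0;
	struct sim_result r = { 0, 0 };
	int open = 0;
	size_t i, j;

	for (i = 0; i < SIM_PACKS; i++) {
		if (open == SIM_MAX_FDS) {
			size_t lru = SIM_PACKS; /* sentinel: none found yet */

			for (j = 0; j < SIM_PACKS; j++)
				if (last_used[j] &&
				    (lru == SIM_PACKS ||
				     last_used[j] < last_used[lru]))
					lru = j;
			assert(lru != SIM_PACKS);
			last_used[lru] = 0; /* simulated close */
			open--;
			r.closes++;
		}
		last_used[i] = ++clock; /* simulated open and read */
		open++;
		if (open > r.peak_open)
			r.peak_open = open;
	}
	return r;
}
```

    Under these assumptions the simulation never holds more than 10
    packs open and performs one close for every pack read past the
    first 10.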
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>