    midx: sort and deduplicate objects from packfiles · fe1ed56f
    Derrick Stolee authored and Junio C Hamano committed
    
    
    Before writing a list of objects and their offsets to a multi-pack-index,
    we need to collect the list of objects contained in the packfiles. There
    may be multiple copies of some objects, so this list must be deduplicated.
    
    It is possible to artificially get into a state where there are many
    duplicate copies of objects. That can create high memory pressure if we
    create a list of all objects before deduplication. To reduce this
    memory pressure without a significant performance drop, automatically
    group objects by the first byte of their object id. Use the IDX fanout
    tables to find each group, copy its entries to a local array, then
    sort.
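    A minimal C sketch of the fanout-based batching, assuming a simplified
    in-memory view of each pack's .idx data. In the IDX format, fanout[b]
    holds the number of objects whose first byte is <= b, so the objects
    starting with byte b sit at indices [fanout[b - 1], fanout[b]).
    `struct pack_idx`, `struct midx_entry`, `max_batch_size()` and
    `collect_batch()` are hypothetical names for illustration, not git's
    actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical, simplified view of one packfile's .idx data. */
struct pack_idx {
	const uint32_t *fanout;      /* 256 cumulative counts (IDX fanout table) */
	const unsigned char *oids;   /* sorted 20-byte object ids, packed back to back */
	const uint64_t *offsets;     /* offset of each object within the packfile */
	time_t mtime;                /* modified time of the packfile */
};

/* Hypothetical entry collected for the multi-pack-index. */
struct midx_entry {
	unsigned char oid[20];
	uint32_t pack_id;
	uint64_t offset;
	time_t pack_mtime;
};

static int cmp_oid(const void *a, const void *b)
{
	const struct midx_entry *x = a, *y = b;
	return memcmp(x->oid, y->oid, sizeof(x->oid));
}

/* Number of objects in `pack` whose id starts with byte `first`. */
static uint32_t group_size(const struct pack_idx *pack, unsigned first)
{
	uint32_t start = first ? pack->fanout[first - 1] : 0;
	return pack->fanout[first] - start;
}

/* Largest first-byte group summed over all packs: the local array size. */
static size_t max_batch_size(const struct pack_idx *packs, uint32_t nr_packs)
{
	size_t max = 0;
	for (unsigned b = 0; b < 256; b++) {
		size_t n = 0;
		for (uint32_t p = 0; p < nr_packs; p++)
			n += group_size(&packs[p], b);
		if (n > max)
			max = n;
	}
	return max;
}

/*
 * Copy every object whose id starts with byte `first` from all packs
 * into `out` (sized by max_batch_size()), then sort the batch by id.
 * Duplicates are still present; returns the number of entries copied.
 */
static size_t collect_batch(const struct pack_idx *packs, uint32_t nr_packs,
			    unsigned first, struct midx_entry *out)
{
	size_t n = 0;
	for (uint32_t p = 0; p < nr_packs; p++) {
		uint32_t start = first ? packs[p].fanout[first - 1] : 0;
		uint32_t end = packs[p].fanout[first];
		for (uint32_t i = start; i < end; i++) {
			memcpy(out[n].oid, packs[p].oids + 20 * (size_t)i, 20);
			out[n].pack_id = p;
			out[n].offset = packs[p].offsets[i];
			out[n].pack_mtime = packs[p].mtime;
			n++;
		}
	}
	qsort(out, n, sizeof(*out), cmp_oid);
	return n;
}
```

    The local array only needs max_batch_size() entries, so peak memory
    tracks the largest first-byte group rather than the total number of
    (possibly duplicated) objects across all packs.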
    
    Copy only the deduplicated entries. When an object appears in more than
    one packfile, keep the copy from the packfile with the most recent
    modified time.
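    One way to do the selection within a single sorted batch, sketched in C:
    order by object id and, among copies of the same id, newest packfile
    first, so the first entry of each run is the one to keep. The names
    `struct midx_entry`, `cmp_oid_then_newest()` and
    `dedup_by_newest_pack()` are hypothetical, and falling back to the pack
    id when two mtimes are equal is an assumption made here only to keep the
    result deterministic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical entry: one copy of an object found in one packfile. */
struct midx_entry {
	unsigned char oid[20];
	uint32_t pack_id;
	uint64_t offset;
	time_t pack_mtime;
};

/* Sort by object id; among copies of one id, newest packfile first. */
static int cmp_oid_then_newest(const void *a, const void *b)
{
	const struct midx_entry *x = a, *y = b;
	int c = memcmp(x->oid, y->oid, sizeof(x->oid));
	if (c)
		return c;
	if (x->pack_mtime != y->pack_mtime)
		return x->pack_mtime > y->pack_mtime ? -1 : 1;
	/* Assumed tie-break: lower pack id wins when mtimes are equal. */
	if (x->pack_id != y->pack_id)
		return x->pack_id < y->pack_id ? -1 : 1;
	return 0;
}

/*
 * Sort `entries` and compact them in place so each object id appears
 * once, keeping the copy from the most recently modified packfile.
 * Returns the number of entries kept.
 */
static size_t dedup_by_newest_pack(struct midx_entry *entries, size_t nr)
{
	size_t kept = 0;
	qsort(entries, nr, sizeof(*entries), cmp_oid_then_newest);
	for (size_t i = 0; i < nr; i++) {
		if (kept && !memcmp(entries[kept - 1].oid, entries[i].oid,
				    sizeof(entries[i].oid)))
			continue; /* older copy of an id we already kept */
		entries[kept++] = entries[i];
	}
	return kept;
}
```

    Because the newest copy sorts first within each run of equal ids, a
    single linear pass after the sort is enough to drop the duplicates.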
    
    Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>