    pack-objects: improve path grouping heuristics. · ce0bd642
    Linus Torvalds authored
    This trivial patch not only simplifies the name hashing, it actually
    improves packing for both git and the kernel.
    
    The git archive pack shrinks from 6824090->6622627 bytes (a 3%
    improvement), and the kernel pack shrinks from 108756213 to 108219021 (a
    mere 0.5% improvement, but still, it's an improvement from making the
    hashing much simpler!)
    
    We just create a 32-bit hash, where we "age" previous characters by two
    bits, so the last characters in a filename count most. So when we then
    compare the hashes in the sort routine, filenames that end the same way
    sort the same way.
    
    It takes the subdirectory into account (unless the filename is > 16
    characters), but files with the same name within the same subdirectory
    will obviously sort closer than files in different subdirectories.
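
    As a rough illustration of the scheme described above, here is a
    minimal sketch of such a sorting number. The function name and the
    exact constants (shift right by two bits, add the new character
    shifted left by 24) are assumptions chosen to match the description
    of a 32-bit value whose older characters are "aged" two bits at a
    time; this is not a quote of the patch itself.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of the "sorting number": each new character is
 * placed in the top byte of a 32-bit value, and everything seen before
 * it is aged (shifted down) by two bits. The constants are assumptions
 * based on the description, not copied from pack-objects.c.
 */
uint32_t name_hash(const char *name)
{
	uint32_t hash = 0;
	unsigned char c;

	while ((c = (unsigned char)*name++) != 0) {
		/*
		 * After about 16 characters a character's contribution
		 * has been shifted out of the 32-bit value, so the end
		 * of the path dominates the result.
		 */
		hash = (hash >> 2) + ((uint32_t)c << 24);
	}
	return hash;
}
```

    With this sketch, name_hash("builtin-rev-list.c") and
    name_hash("rev-list.c") differ only in their low bits, so they sort
    next to each other, while a name with a different ending such as
    "Makefile" lands far away.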
    
    And, incidentally (which is why I tried the hash change in the first
    place, of course) builtin-rev-list.c will sort fairly close to rev-list.c.
    
    And no, it's not a "good hash" in the sense of being secure or unique, but
    that's not what we're looking for. The whole "hash" thing is misnamed
    here. It's not so much a hash as a "sorting number".
    
    [jc: rolled in a simplification of the sorting number computation
     for thin pack base objects]
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    Signed-off-by: Junio C Hamano <junkio@cox.net>
pack-objects.c 32.7 KB