Skip to content
  • Duy Nguyen's avatar
    pack-objects: fix performance issues on packing large deltas · 9ac3f0e5
    Duy Nguyen authored and Junio C Hamano's avatar Junio C Hamano committed
    Let's start with some background about oe_delta_size() and
    oe_set_delta_size(). If you already know, skip the next paragraph.
    
    These two are added in 0aca34e8
    
     (pack-objects: shrink delta_size
    field in struct object_entry - 2018-04-14) to help reduce 'struct
    object_entry' size. The delta size field in this struct is reduced to
    only contain max 1MB. So if any new delta is produced and larger than
    1MB, it's dropped because we can't really save such a large size
    anywhere. Fallback is provided in case existing packfiles already have
    large deltas, then we can retrieve it from the pack.
    
    While this should help small machines repacking large repos without
    large deltas (i.e. less memory pressure), dropping large deltas during
    the delta selection process could end up with worse pack files. And if
    existing packfiles already have >1MB delta and pack-objects is
    instructed to not reuse deltas, all of them will be dropped on the
    floor, and the resulting pack would be definitely bigger.
    
    There is also a regression in terms of CPU/IO if we have large on-disk
    deltas because fallback code needs to parse the pack every time the
    delta size is needed and just access to the mmap'd pack data is enough
    for extra page faults when memory is under pressure.
    
    Both of these issues were reported on the mailing list. Here's some
    numbers for comparison.
    
        Version  Pack (MB)  MaxRSS(kB)  Time (s)
        -------  ---------  ----------  --------
         2.17.0     5498     43513628    2494.85
         2.18.0    10531     40449596    4168.94
    
    This patch provides a better fallback that is
    
    - cheaper in terms of cpu and io because we won't have to read
      existing pack files as much
    
    - better in terms of pack size because the pack heuristics is back to
      2.17.0 time, we do not drop large deltas at all
    
    If we encounter any delta (on-disk or created during try_delta phase)
    that is larger than the 1MB limit, we stop using delta_size_ field for
    this because it can't contain such size anyway. A new array of delta
    size is dynamically allocated and can hold all the deltas that 2.17.0
    can. This array only contains delta sizes that delta_size_ can't
    contain.
    
    With this, we do not have to drop deltas in try_delta() anymore. Of
    course the downside is we use slightly more memory, even compared to
    2.17.0. But since this is considered an uncommon case, a bit more
    memory consumption should not be a problem.
    
    Delta size limit is also raised from 1MB to 16MB to better cover
    common case and avoid that extra memory consumption (99.999% deltas in
    this reported repo are under 12MB; Jeff noted binary artifacts topped
    out at about 3MB in some other private repos). Other fields are
    shuffled around to keep this struct packed tight. We don't use more
    memory in common case even with this limit update.
    
    A note about thread synchronization. Since this code can be run in
    parallel during delta searching phase, we need a mutex. The realloc
    part in packlist_alloc() is not protected because it only happens
    during the object counting phase, which is always single-threaded.
    
    Access to e->delta_size_ (and by extension
    pack->delta_size[e - pack->objects]) is unprotected as before, the
    thread scheduler in pack-objects must make sure "e" is never updated
    by two different threads.
    
    The area under the new lock is as small as possible, avoiding locking
    at all in common case, since lock contention with high thread count
    could be expensive (most blobs are small enough that delta compute
    time is short and we end up taking the lock very often). The previous
    attempt to always hold a lock in oe_delta_size() and
    oe_set_delta_size() increases execution time by 33% when repacking
    linux.git with with 40 threads.
    
    Reported-by: default avatarElijah Newren <newren@gmail.com>
    Helped-by: default avatarElijah Newren <newren@gmail.com>
    Helped-by: default avatarJeff King <peff@peff.net>
    Signed-off-by: default avatarNguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Signed-off-by: default avatarJunio C Hamano <gitster@pobox.com>
    9ac3f0e5