    do not stream large files to pack when filters are in use · 4f22b101
    Jeff King authored
    Because git's object format requires us to specify the
    number of bytes in the object in its header, we must know
    the size before streaming a blob into the object database.
    This is not a problem when adding a regular file, as we can
    get the size from stat(). However, when filters are in use
    (such as autocrlf, or the ident, filter, or eol
    gitattributes), we have no idea what the ultimate size will be.
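    The header requirement is easy to see concretely: a loose blob is
    hashed as the SHA-1 of "blob <size>\0" followed by the content, so
    the byte count must be known before the first content byte is
    written. A minimal sketch in plain POSIX shell (no git needed),
    reproducing the id that `git hash-object --stdin` would print for
    the same input:

```shell
# A git blob object is "blob <size>\0<content>"; its object id is the
# SHA-1 of that whole byte sequence. The size field comes first, which
# is exactly why streaming needs the final size up front.
content='hello world'
size=$(printf '%s\n' "$content" | wc -c)    # 12 bytes, newline included
oid=$(printf 'blob %d\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)
echo "$oid"
```

    A filter can grow or shrink the content arbitrarily, which is why
    the size (and hence this header) cannot be computed up front.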
    The current code just punts on the whole issue and ignores
    filter configuration entirely for files larger than
    core.bigfilethreshold. This can generate confusing results
    if you use filters for large binary files, as the filter
    will suddenly stop working as the file goes over a certain
    size.

    Rather than try to handle unknown input sizes with
    streaming, this patch just turns off the streaming
    optimization when filters are in use.
    This has a slight performance regression in a very specific
    case: if you have autocrlf on, but no gitattributes, a large
    binary file will avoid the streaming code path because we
    don't know beforehand whether it will need conversion or
    not. But if you are handling large binary files, you should
    be marking them as such via attributes (or at least not
    using autocrlf, and instead marking your text files as
    such). And the flip side is that if you have a large
    _non_-binary file, there is a correctness improvement;
    before we did not apply the conversion at all.
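    Following that advice, a hypothetical .gitattributes for such a
    repository might look like this (the patterns are illustrative,
    not taken from the patch):

```
# Hypothetical sketch: declare large binaries explicitly so no
# line-ending or filter conversion is ever attempted on them...
*.bin   -text
*.iso   -text
# ...and mark genuine text files as text, rather than relying on a
# global core.autocrlf setting.
*.c     text
*.txt   text
```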
    The first half of the new t1051 script covers these failures
    on input. The second half tests the matching output code
    paths. These already work correctly, and do not need any changes.
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>