    utf8: handle systems that don't write BOM for UTF-16 · 79444c92
    brian m. carlson authored
    When serializing UTF-16 (and UTF-32), there are three possible ways to
    write the stream. One can write the data with a BOM in either big-endian
    or little-endian format, or one can write the data without a BOM in
    big-endian format.
    
    Most systems' iconv implementations choose to write it with a BOM in
    some endianness, since this is the most foolproof option and is resistant
    to misinterpretation on Windows, where UTF-16 and the little-endian
    serialization are very common. For compatibility with Windows and to
    avoid accidental misuse there, Git always wants to write UTF-16 with a
    BOM, and will refuse to read UTF-16 without it.
    
    However, musl's iconv implementation writes UTF-16 without a BOM,
    relying on the user to interpret it as big-endian. This causes t0028 and
    the related functionality to fail, since Git won't read the file without
    a BOM.
    
    Add a Makefile and #define knob, ICONV_OMITS_BOM, that can be set if the
    iconv implementation has this behavior. When set, Git will write a BOM
    manually for UTF-16 and UTF-32 and then force the data to be written in
    UTF-16BE or UTF-32BE. We choose big-endian behavior here because the
    tests use the raw "UTF-16" encoding, which will be big-endian when the
    implementation requires this knob to be set.
    
    Update the tests to detect this case and write test data with an added
    BOM if necessary. Always write the BOM in the tests in big-endian
    format, since all iconv implementations that omit a BOM must use
    big-endian serialization according to the Unicode standard.
    
    Preserve the existing behavior for systems which do not have this knob
    enabled, since they may use optimized implementations, including
    defaulting to the native endianness, which may improve performance.

    Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>