  • Support working-tree-encoding "UTF-16LE-BOM" · aab2a1ae
    Torsten Bögershausen authored and Junio C Hamano committed
    Users who want UTF-16 files in the working tree set the .gitattributes
    like this:
    test.txt working-tree-encoding=UTF-16
    
    The Unicode standard itself defines 3 allowed ways to encode UTF-16.
    All 3 of the following versions convert back to 'g' 'i' 't' in UTF-8:
    
    a) UTF-16, without BOM, big endian:
    $ printf "\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c
    0000000    g   i   t
    
    b) UTF-16, with BOM, little endian:
    $ printf "\377\376g\000i\000t\000" | iconv -f UTF-16 -t UTF-8 | od -c
    0000000    g   i   t
    
    c) UTF-16, with BOM, big endian:
    $ printf "\376\377\000g\000i\000t" | iconv -f UTF-16 -t UTF-8 | od -c
    0000000    g   i   t
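
    The three variants above differ only in their first two bytes. As a
    minimal sketch (utf16_bom is a hypothetical helper, not part of Git or
    iconv, and it assumes head -c and od are available), the BOM can be
    inspected directly:

```shell
# Hypothetical helper (not part of Git): classify a UTF-16 byte stream
# read from stdin by looking at its first two bytes.
utf16_bom() {
    # First two bytes as lowercase hex with whitespace removed, e.g. "fffe".
    bom=$(head -c 2 | od -An -tx1 | tr -d ' \n')
    case "$bom" in
        fffe) echo "little endian, with BOM" ;;
        feff) echo "big endian, with BOM" ;;
        *)    echo "no BOM (big endian assumed)" ;;
    esac
}

# The same three inputs as in (a), (b) and (c).
printf '\000g\000i\000t'         | utf16_bom
printf '\377\376g\000i\000t\000' | utf16_bom
printf '\376\377\000g\000i\000t' | utf16_bom
```

    The helper only reports the byte order mark; it does not check that
    the rest of the stream is valid UTF-16.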
    
    Git uses libiconv to convert from UTF-8 in the index into UTF-16 in the
    working tree.
    After a checkout, the resulting file has a BOM and is encoded in "UTF-16",
    in the version (c) above.
    This is what iconv generates; more details follow below.
    
    iconv (and libiconv) can generate UTF-16, UTF-16LE or UTF-16BE:
    
    d) UTF-16
    $ printf 'git' | ico...