    Documentation/i18n.txt: clarify character encoding support · 3a59e595
    Karsten Blees authored and Junio C Hamano committed
    
    
    As a "distributed" VCS, git should define the encodings of its core
    textual data structures more precisely, in particular those that are
    part of the network protocol.
    
    Git is encoding agnostic only for blob objects. E.g. the 'no NUL bytes'
    requirement of tree and commit objects excludes UTF-16/32, and the
    special meaning of '/' in the index file, and of space and linefeed in
    commit objects, rules out EBCDIC and other encodings that are not
    ASCII-compatible.
    
    Git expects bytes < 0x80 to be pure ASCII, so CJK encodings that partly
    overlap with the ASCII range are problematic as well. E.g. fmt_ident()
    removes trailing 0x5C bytes from user names on the assumption that they
    are ASCII '\'. However, there are over 200 GBK double-byte codes that end
    in 0x5C.
    
    UTF-8 as the default encoding on Linux and the respective path
    translations in the Mac and Windows versions have established UTF-8 NFC
    as the de facto standard for path names.
    
    Update the documentation in i18n.txt to reflect the current status quo.
    
    Signed-off-by: Karsten Blees <blees@dcon.de>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>