NFD Unicode representation is preferred for input/output of filenames on disk

Detailed steps to reproduce the problem:

  1. Create a file named ぐ.txt.
  2. Run ls ぐ.txt.
  3. Copy the filename returned by ls into a text editor that can show the difference between NFC and NFD representations (I use Sublime Text).
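
If no such editor is at hand, step 3 can also be checked with a few lines of Python. This is only a rough sketch using the standard unicodedata module; the helper names describe_form and code_points are made up for illustration and are not part of iTerm2 or any tool mentioned here.

```python
import unicodedata


def describe_form(name: str) -> str:
    """Report which normalization form a string is already in."""
    is_nfc = unicodedata.normalize("NFC", name) == name
    is_nfd = unicodedata.normalize("NFD", name) == name
    if is_nfc and is_nfd:
        return "both"  # e.g. plain ASCII, which is unaffected by normalization
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "mixed"


def code_points(name: str) -> str:
    """List the code points so composed and decomposed forms can be told apart."""
    return " ".join(f"U+{ord(ch):04X}" for ch in name)


if __name__ == "__main__":
    # Escapes are used so an editor cannot silently renormalize the literals.
    composed = "\u3050.txt"          # ぐ as a single code point (NFC)
    decomposed = "\u304f\u3099.txt"  # く followed by a combining voiced sound mark (NFD)
    for name in (composed, decomposed):
        print(describe_form(name), code_points(name))
```

Pasting the string copied from the terminal in place of the sample names shows whether it arrived composed or decomposed.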

What happened:

The copied filename is encoded in NFC representation.

What should have happened:

In fact, all Unicode filenames in macOS are encoded in NFD. No NFC-named file actually exists on disk: if the input is NFC, macOS converts the filename to NFD before writing it to disk. So ls in iTerm2 should return the actual code points, not just the same visible character. Running ls in Terminal.app returns the NFD-encoded characters.

There is also another feature I would like here, if possible. In my view, terminal apps should always convert local Unicode filename input to NFD before passing it to the shell, since a file with an NFC filename doesn't actually exist on an HFS+ disk. Because almost all of the rest of the world, including input methods, uses NFC, I simply cannot type a filename in NFD. Since all Unicode filenames in macOS are encoded in NFD, could iTerm2 handle all filenames this way and do the proper conversion when needed?
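
To make the requested conversion concrete, here is a minimal Python sketch of the idea, not a description of how iTerm2 works or should implement it. The functions to_nfd and match_entry are hypothetical names. It also assumes standard NFD is close enough: as far as I know, HFS+ uses its own variant of NFD that leaves some code point ranges undecomposed, so the result may differ from what is stored on disk for certain characters.

```python
import unicodedata
from typing import Iterable, Optional


def to_nfd(name: str) -> str:
    """Convert typed input to standard NFD (an approximation of the HFS+ form)."""
    return unicodedata.normalize("NFD", name)


def match_entry(typed: str, entries: Iterable[str]) -> Optional[str]:
    """Return the directory entry equal to the typed name once both are in NFD."""
    wanted = to_nfd(typed)
    for entry in entries:
        if to_nfd(entry) == wanted:
            return entry  # the name exactly as it is stored
    return None


if __name__ == "__main__":
    on_disk = ["notes.txt", "\u304f\u3099.txt"]  # second entry is ぐ.txt stored as NFD
    typed = "\u3050.txt"                         # what an input method produces (NFC)
    print(match_entry(typed, on_disk) == on_disk[1])
```

In real use, the entries could come from os.listdir(), so an NFC name typed by the user resolves to the name as stored.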

Shame on Apple for this bad decision.

For more on the NFD Unicode issue in macOS, see:

https://github.com/git/git/commit/76759c7dff53e8c84e975b88cb8245587c14c7ba

https://github.com/mpv-player/mpv/issues/4016