Backspace on UTF-8 chars removes a single byte only
By gnach... on September 06, 2010 06:12 (imported from Google Code)
Launch the "cat" command. Type an accented letter a few times, say the letter ö 10 times. Then press backspace a couple of times (say twice), and press Enter.
The first line (what you typed, as displayed by the terminal) will contain 10-2 = 8 ö letters. The second line, as echoed back by cat, will contain 9 ö's.
The reason is that while visually the backspace removes a whole character from the screen, sending a backspace to the kernel's tty driver (the code that handles line editing when in cooked mode) only removes a single byte from its line buffer. An ö is two bytes in UTF-8, so 10 of them are 20 bytes; two backspaces leave 18 bytes, and the cat command did actually receive 9 ö letters. And of course with an odd number of backspaces you easily end up with a truncated UTF-8 sequence too.
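To illustrate the difference, here is a minimal sketch of the two erase behaviours. This is not the kernel's code; the function names erase_byte and erase_utf8_char are made up for this example, and the UTF-8 variant is only a rough approximation of what a UTF-8-aware line editor does.

```c
#include <stddef.h>

/* Byte-wise erase: what happens when the tty does not know about UTF-8.
 * Returns the new length of the line buffer after one backspace. */
size_t erase_byte(size_t len) {
    return len > 0 ? len - 1 : 0;
}

/* Character-wise erase (approximation): skip trailing UTF-8 continuation
 * bytes (bit pattern 10xxxxxx), then drop the lead byte as well. */
size_t erase_utf8_char(const char *buf, size_t len) {
    while (len > 0 && ((unsigned char)buf[len - 1] & 0xC0) == 0x80)
        len--;
    return len > 0 ? len - 1 : 0;
}
```

With 10 ö's the buffer holds 20 bytes. Two calls to erase_byte leave 18 bytes (9 ö's), matching what cat prints, while two calls to erase_utf8_char would leave 16 bytes (8 ö's), matching what is on the screen.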
In order for backspace to work correctly, the kernel needs to know that the terminal is working in UTF-8 mode. This can be done by issuing the "stty iutf8" command, and verified by running "stty" on its own (or "stty -a" to list every flag). While you can do this from your .bashrc as a workaround, this is not the way to go (e.g. it should also work when a non-shell command is being executed by iTerm).
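For reference, the same check that "stty" performs can be sketched in C. The helper name tty_has_iutf8 is hypothetical; it only assumes the standard tcgetattr() call and a platform whose <termios.h> defines IUTF8.

```c
#include <termios.h>

/* Returns 1 if IUTF8 is set on the tty behind fd, 0 if it is not
 * (or if this platform does not define IUTF8), -1 if fd is not a tty
 * or tcgetattr() fails for another reason. */
int tty_has_iutf8(int fd) {
    struct termios tio;
    if (tcgetattr(fd, &tio) == -1)
        return -1;
#ifdef IUTF8
    return (tio.c_iflag & IUTF8) ? 1 : 0;
#else
    return 0;
#endif
}
```

Calling tty_has_iutf8(0) from a program started in the terminal would report the state of its standard input.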
iTerm itself should set this terminal flag for each terminal it allocates, if the charset is UTF-8. You'd need to set the IUTF8 flag in the c_iflag member of the termios structure that is passed to tcsetattr(). This should hardly be more than two lines of code. You can see most other Linux terminal emulators (e.g. vte) doing this, too; and apparently Mac's default Terminal also sets this mode correctly.
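Something along these lines should do. This is only a sketch and not taken from iTerm's source: the function name enable_iutf8 is made up, and I am assuming it gets called on the pty's file descriptor right after the terminal is allocated, and only when the session charset is UTF-8.

```c
#include <termios.h>

/* Turn on IUTF8 for the tty behind fd so that the kernel's line editing
 * erases whole UTF-8 characters. Returns 0 on success, -1 on error. */
int enable_iutf8(int fd) {
    struct termios tio;
    if (tcgetattr(fd, &tio) == -1)
        return -1;
#ifdef IUTF8
    tio.c_iflag |= IUTF8;   /* guarded: not every platform defines IUTF8 */
#endif
    return tcsetattr(fd, TCSANOW, &tio);
}
```

Reading the current settings first with tcgetattr() keeps all other flags untouched, so only the one bit changes.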
iTerm 0.10; Mac 10.5.8 with UTF-8 locale of course. Dupe of bug 2585211; that one is closed due to insufficient information and I cannot reopen.