Some Unicode characters are broken (into two separately coloured pieces)
If I apply highlight --out-format=xterm256 elise.txt (with or without --encoding=utf-8), where the file is:
Für Elise.
what I get as a result is the green text
Für Elise.
Piping the output into less makes it clear that the character “ü” has been broken into two pieces, “<C3>” and “<BC>”, each coloured separately:
ESC[38;5;28mFESC[38;5;28m<C3>ESC[38;5;28m<BC>ESC[38;5;28mrESC[38;5;28m EliseESC[38;5;28m.ESC[m
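To illustrate what seems to be happening, here is a simplified sketch in Python. It is not highlight's actual code: the function names are made up for the illustration, and it assumes that each byte of a non-ASCII character is treated as its own token. The escape sequences are the ones seen in the output above.

```python
# Hypothetical illustration only; this is NOT how highlight is implemented.
GREEN = "\x1b[38;5;28m"  # the colour escape seen in the less output
RESET = "\x1b[m"


def colour_per_byte(data: bytes) -> bytes:
    """Put a colour escape in front of every single byte.

    A multi-byte UTF-8 character ends up with an escape between its bytes.
    """
    green = GREEN.encode("ascii")
    return b"".join(green + bytes([b]) for b in data) + RESET.encode("ascii")


def colour_per_character(text: str) -> bytes:
    """Put a colour escape in front of every decoded character, then encode."""
    return ("".join(GREEN + ch for ch in text) + RESET).encode("utf-8")


word = "Für"
raw = word.encode("utf-8")

print(raw)                         # "ü" is the two bytes C3 BC
print(colour_per_byte(raw))        # C3 and BC are separated by an escape
print(colour_per_character(word))  # C3 BC stay adjacent
```

In the per-byte variant the terminal receives C3, then an escape sequence, then BC, so it can no longer decode the two bytes as “ü”. In the per-character variant the two bytes stay together.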
When the same command is applied to the file elise.sh:
# Für Elise.
Für Elise
(this is not a real command, but I do give names containing non-ASCII characters to scripts), the result is (the comment in light blue, the command in green):
# Für Elise.
Für Elise
By piping into less, I see that the comment is left alone, while in the command, the character “ü” is again split into two parts.
I suppose these are not serious problems in themselves, because I probably shouldn’t use non-ASCII characters to name scripts, and highlight is not meant to “highlight” plain text. But for a language like Agda, where it’s normal to use Unicode characters everywhere, all non-ASCII characters are broken in this way and the resulting highlighted text is unreadable.