Some Unicode characters are broken (in two separately coloured pieces)
If I run
highlight --out-format=xterm256 elise.txt
(with or without --encoding=utf-8), where the file is:
what I get as a result is the green text
Piping the output into less makes it clear that the character “ü” has been broken into two bytes, “<C3>” and “<BC>”, each coloured separately:
When the same command is applied to the file
# Für Elise. Für Elise
(this is not a real command, but I do give scripts names containing non-ASCII characters), the result is as follows, with the comment in light blue and the command in green:
# Für Elise.
By piping into less, I see that the comment is left alone, while in the command, the character “ü” is again split into two parts.
I suppose these are not serious problems in themselves: I probably shouldn’t use non-ASCII characters to name scripts, and highlight is not meant to “highlight” plain text. But for a language like Agda, where it’s normal to use Unicode characters everywhere, all such characters are broken in this way and the resulting highlighted text is unreadable.
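For what it's worth, the symptom looks like what one would expect if a colour escape sequence were emitted between the two bytes of a multi-byte UTF-8 character. The Python sketch below only illustrates that effect. It is not highlight's code, I have not checked that this is the actual cause, and the colour index and function names are made up for the illustration.

```python
# Illustration only: NOT highlight's implementation, just the suspected effect.
GREEN = "\x1b[38;5;2m"  # xterm-256 foreground escape; colour index 2 is an assumption
RESET = "\x1b[0m"


def colour_per_byte(text: str) -> bytes:
    """Wrap every UTF-8 byte in its own colour escape (the suspected behaviour)."""
    out = bytearray()
    for b in text.encode("utf-8"):
        out += GREEN.encode("ascii") + bytes([b]) + RESET.encode("ascii")
    return bytes(out)


def colour_per_char(text: str) -> bytes:
    """Wrap every whole character in a colour escape (the expected behaviour)."""
    return "".join(f"{GREEN}{ch}{RESET}" for ch in text).encode("utf-8")


if __name__ == "__main__":
    # "ü" is two bytes in UTF-8: C3 BC.
    print("ü".encode("utf-8"))  # b'\xc3\xbc'

    broken = colour_per_byte("Für Elise")
    intact = colour_per_char("Für Elise")

    # Per byte: an escape sequence lands between C3 and BC, so the pair is gone.
    print(b"\xc3\xbc" in broken)  # False
    # Per character: the two bytes stay adjacent and the terminal can render "ü".
    print(b"\xc3\xbc" in intact)  # True
```

With the per-byte version, less has no complete character to display and shows the stray bytes as “<C3>” and “<BC>”, which matches what I see.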