Unicode support
The parser, and the whole compiler, have to support Unicode.
A very fast UTF-8 decoder, built around a small deterministic finite automaton (DFA), seems to be available here:
https://github.com/hoehrmann/utf-8-misc
The documentation is here: https://web.archive.org/web/20190528032751/http://bjoern.hoehrmann.de/utf-8/decoder/dfa/
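To make the idea concrete, here is a minimal sketch of an incremental, byte-at-a-time UTF-8 decoder in C, assuming the lexer reads source files as raw bytes. It is not Höhrmann's table-driven DFA. Here the automaton state is kept in explicit fields instead of a lookup table. It follows the same pattern of feeding one byte at a time and ending in an accept or reject state, and it is intended to reject overlong forms, UTF-16 surrogates and values above U+10FFFF, following the well-formed byte ranges in the Unicode Standard. The names `utf8_decoder`, `utf8_step` and `utf8_decode` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes, loosely modelled on an ACCEPT/REJECT DFA. */
enum { UTF8_ACCEPT = 0, UTF8_REJECT = 1, UTF8_MORE = 2 };

typedef struct {
    uint32_t codep;   /* code point accumulated so far */
    int need;         /* continuation bytes still expected */
    uint8_t lo, hi;   /* allowed range for the next continuation byte */
} utf8_decoder;

static void utf8_init(utf8_decoder *d) {
    d->codep = 0;
    d->need = 0;
    d->lo = 0x80;
    d->hi = 0xBF;
}

/* Feed one byte. Returns UTF8_ACCEPT when d->codep holds a complete code
   point, UTF8_MORE when more bytes are needed, UTF8_REJECT on malformed
   input. After a reject the decoder is back in its initial state; the
   offending byte is consumed, not re-examined as a new lead byte. */
static int utf8_step(utf8_decoder *d, uint8_t b) {
    assert(d != NULL);
    if (d->need == 0) {
        d->lo = 0x80;
        d->hi = 0xBF;
        if (b <= 0x7F) { d->codep = b; return UTF8_ACCEPT; }
        if (b >= 0xC2 && b <= 0xDF) {
            d->need = 1; d->codep = b & 0x1F;
        } else if (b >= 0xE0 && b <= 0xEF) {
            d->need = 2; d->codep = b & 0x0F;
            if (b == 0xE0) d->lo = 0xA0;        /* would be overlong */
            else if (b == 0xED) d->hi = 0x9F;   /* would be a surrogate */
        } else if (b >= 0xF0 && b <= 0xF4) {
            d->need = 3; d->codep = b & 0x07;
            if (b == 0xF0) d->lo = 0x90;        /* would be overlong */
            else if (b == 0xF4) d->hi = 0x8F;   /* would exceed U+10FFFF */
        } else {
            return UTF8_REJECT;                 /* 80..C1 or F5..FF as lead */
        }
        return UTF8_MORE;
    }
    if (b < d->lo || b > d->hi) { utf8_init(d); return UTF8_REJECT; }
    d->lo = 0x80;                 /* only the first continuation byte */
    d->hi = 0xBF;                 /* has a narrowed range */
    d->codep = (d->codep << 6) | (b & 0x3F);
    return (--d->need == 0) ? UTF8_ACCEPT : UTF8_MORE;
}

/* Decode a whole buffer into code points. Returns the number written, or
   -1 if the input is malformed, truncated, or does not fit in `out`. */
static long utf8_decode(const uint8_t *s, size_t n, uint32_t *out, size_t cap) {
    assert(s != NULL || n == 0);
    utf8_decoder d;
    utf8_init(&d);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int r = utf8_step(&d, s[i]);
        if (r == UTF8_REJECT) return -1;
        if (r == UTF8_ACCEPT) {
            if (count == cap) return -1;
            out[count++] = d.codep;
        }
    }
    return d.need == 0 ? (long)count : -1;  /* a sequence left unfinished */
}
```

A lexer would more likely call `utf8_step` directly while scanning and decide per token what to do on `UTF8_REJECT` (for example report an error or substitute U+FFFD), rather than decoding the whole file up front.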