
UTF-8 codes up to 4 bytes

Created by: pixelglow

Graphviz's UTF-8 handling currently accepts sequences of at most 3 bytes, which covers only code points U+0000 to U+FFFF. As a result, characters outside that range, e.g. emoji in labels, are wrongly interpreted as Latin-1.

However, UTF-8 allows sequences of up to 4 bytes, covering code points U+0000 to U+10FFFF. I've implemented the wider representation and refactored away the hard-coded number of extra bytes.
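
For readers unfamiliar with the encoding, here is a minimal sketch of how the number of extra (continuation) bytes can be derived from the lead byte rather than hard-coded. It is not the code in this merge request: the names `utf8_extra_bytes` and `utf8_decode` are hypothetical, and the validation rules follow the general UTF-8 definition rather than anything specific to Graphviz.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Graphviz's API): number of continuation bytes
 * implied by a UTF-8 lead byte, or -1 if the byte cannot start a sequence. */
static int utf8_extra_bytes(unsigned char lead)
{
    if (lead < 0x80) return 0;                   /* 0xxxxxxx: ASCII */
    if (lead >= 0xC2 && lead <= 0xDF) return 1;  /* 110xxxxx: U+0080..U+07FF */
    if ((lead & 0xF0) == 0xE0) return 2;         /* 1110xxxx: U+0800..U+FFFF */
    if (lead >= 0xF0 && lead <= 0xF4) return 3;  /* 11110xxx: U+10000..U+10FFFF */
    return -1;  /* continuation byte, overlong lead (C0, C1), or above F4 */
}

/* Hypothetical helper: decode one code point from the n bytes at s.
 * Stores it in *cp and returns the number of bytes consumed,
 * or returns 0 if the sequence is truncated or malformed. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Payload mask for the lead byte and smallest legal value,
     * both indexed by the number of extra bytes. */
    static const unsigned char lead_mask[] = {0x7F, 0x1F, 0x0F, 0x07};
    static const uint32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};

    if (n == 0) return 0;
    int extra = utf8_extra_bytes(s[0]);
    if (extra < 0 || (size_t)extra >= n) return 0;  /* bad lead or truncated */

    uint32_t v = s[0] & lead_mask[extra];
    for (int i = 1; i <= extra; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;        /* must be 10xxxxxx */
        v = (v << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates, and values past U+10FFFF. */
    if (v < min_cp[extra] || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;

    *cp = v;
    return (size_t)extra + 1;
}
```

With this sketch, the 4-byte sequence `F0 9F 98 80` decodes to U+1F600, a code point that a decoder limited to 3 bytes cannot represent. Driving the loop from the computed `extra` count, instead of a fixed constant, is the kind of change the refactoring above describes.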

This fixes Instaviz issue 310.
