[Graph Libraries] agwrite sometimes breaks UTF-8 strings
Ported Issue from Mantis
Original ID: 2260
Reported By: camillo
SEVERITY: IMPORTANT
Submitted: 2013-02-17 09:04:30
UTF-8 mostly works with Graphviz (and the Python module pygraphviz relies on this), but agwrite sometimes corrupts UTF-8 strings when writing out a graph. This happens because _agstrcanon inserts a backslash and a line feed (LF) once the current line is longer than 80 characters: if that break lands in the middle of a multi-byte UTF-8 sequence, the output is no longer valid UTF-8.
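To illustrate the failure mode without needing Graphviz installed, here is a minimal Python sketch. It is a simplified model and not the actual _agstrcanon code: the function name naive_wrap and the exact counting rule are assumptions made for illustration. It inserts a backslash and LF purely by byte count, so a two-byte character straddling the limit gets split.

```python
def naive_wrap(data: bytes, limit: int = 80) -> bytes:
    """Hypothetical model: break after `limit` bytes, ignoring character boundaries."""
    out = bytearray()
    count = 0
    for byte in data:
        if count >= limit:
            out += b"\\\n"  # backslash + LF line continuation
            count = 0
        out.append(byte)
        count += 1
    return bytes(out)


# 79 ASCII bytes of padding, then "é" (bytes 0xC3 0xA9), so the 80-byte
# limit falls between the lead byte and its continuation byte.
label = "x" * 79 + "é" + "tail"
wrapped = naive_wrap(label.encode("utf-8"))

try:
    wrapped.decode("utf-8")
    broken = False
except UnicodeDecodeError:
    broken = True  # the lead byte 0xC3 is now followed by a backslash
```

In this model the continuation byte 0xA9 ends up on the next line, separated from its lead byte by the inserted backslash and LF.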
STEPS TO REPRODUCE
Use pygraphviz to create a graph, give it a Unicode label containing several non-ASCII characters, then write out the graph. Pad the string if necessary so that the 80-character point falls in the middle of a multi-byte sequence.
I am attaching a patch that fixes this issue by ensuring that multi-byte UTF-8 sequences are never split. There should be no impact on plain ASCII, and only minimal impact on other encodings (e.g. ISO Latin-1, assuming it is even supported) as long as there are no huge blocks of bytes with values > 127. Even then, breaking long lines is mostly a cosmetic issue.
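For readers who want to see the idea in isolation, the following Python sketch shows one way such a guard could work. It is not the attached patch: the names utf8_safe_wrap and is_continuation are hypothetical, and the rule used here (defer the break while the next byte is a UTF-8 continuation byte, i.e. matches the bit pattern 10xxxxxx) is an assumption about the approach.

```python
def is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


def utf8_safe_wrap(data: bytes, limit: int = 80) -> bytes:
    """Hypothetical model: break after `limit` bytes, but never inside a sequence."""
    out = bytearray()
    count = 0
    for byte in data:
        # Only break when the next byte starts a new character.
        if count >= limit and not is_continuation(byte):
            out += b"\\\n"  # backslash + LF line continuation
            count = 0
        out.append(byte)
        count += 1
    return bytes(out)


# Same layout as the reproduction: "é" (0xC3 0xA9) straddles the 80-byte limit.
label = "x" * 79 + "é" + "tail"
safe = utf8_safe_wrap(label.encode("utf-8"))
decoded = safe.decode("utf-8")  # does not raise
```

With this rule, pure ASCII input is wrapped exactly as before, and a line only grows past the limit by the few bytes needed to finish the current character. A long run of bytes in the range 0x80 to 0xBF in a non-UTF-8 encoding would delay the break, which is consistent with the caveat about other encodings above.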