Syntax Tests: Caret misaligned with multi-byte chars in UTF-8

While testing the new Hugo syntax I ran into some strange behaviour in syntax tests that contain multi-byte encoded characters. Basically, the test caret drifts out of alignment with the line above as soon as a multi-byte character is encountered.

My guess is that the test code assumes that each character in the line above maps to a single byte, and keeps counting columns without accounting for multi-byte characters in UTF-8 files.
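To illustrate the guess above, here is a minimal Python sketch of how a byte-based column count diverges from a character-based one once a multi-byte character appears. This is not the actual test code; the helper name `byte_column` and the sample line are hypothetical and only serve to show the mismatch.

```python
def byte_column(text: str, char_index: int, encoding: str = "utf-8") -> int:
    """Return the byte offset of the character at char_index (hypothetical helper)."""
    return len(text[:char_index].encode(encoding))


# Sample line (an assumption): a string containing the \´e escape sequence.
line = 'x = "\\´e and more"'

# Character-based column of the word "and".
char_col = line.index("and")

# Byte-based column of the same word, in each encoding.
utf8_col = byte_column(line, char_col, "utf-8")
latin1_col = byte_column(line, char_col, "iso-8859-1")

print(char_col, utf8_col, latin1_col)
```

In UTF-8 the byte offset is one larger than the character column for everything after the ´, which is the kind of drift a caret would show if columns were counted in bytes. In ISO-8859-1 the two counts agree.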

If you look at this test file:

https://gitlab.com/tajmone/highlight-test-suite/blob/master/hugo/syntax_test_strings.hug#L134

You'll notice that I had to limit the actual carets when dealing with the acute accent ´ because I was getting error reports mentioning columns where there wasn't actually a caret for testing.

This acute accent business is an edge case, and I had to struggle a bit to cover matching it in escape sequences (defined as Interpolation) because the character could be a single byte in ISO-8859-1 or two bytes in UTF-8.
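The two encodings of the acute accent can be checked with a short Python sketch (the variable names are just for illustration). Note that, strictly speaking, ´ (U+00B4) lies outside 7-bit ASCII, so the single-byte form only exists in ISO-8859-1 and similar 8-bit encodings.

```python
ACUTE = "\u00b4"  # the acute accent character ´

# One byte in ISO-8859-1, two bytes in UTF-8.
latin1_bytes = ACUTE.encode("iso-8859-1")
utf8_bytes = ACUTE.encode("utf-8")

# Plain 7-bit ASCII cannot represent the character at all.
try:
    ACUTE.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(latin1_bytes, utf8_bytes, ascii_ok)
```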

So I had to create an ASCII test file too for this:

https://gitlab.com/tajmone/highlight-test-suite/blob/master/hugo/syntax_test_interpolation-ascii.hug

In the syntax definition, I had to cover both the ASCII version of the accent and the UTF-8 version, because although Hugo sources are usually in ISO-8859-1, inside Asciidoctor documentation projects they'll be either pasted into UTF-8 documents or included as external files converted to UTF-8 (because Asciidoctor doesn't support ISO-encoded files).

--[[
NOTE: The RegEx below defines the acute accent (´) char twice because, depending
      on whether the source is in ASCII/ISO-8859-1 or UTF-8, its encoding will
      differ (the former is the expected encoding for Hugo sources, but the
      latter might be encountered in documentation projects).               --]]
  Interpolation = [=[ (?x)(\\(?:
    \xC2\xB4[a-zA-Z]  | # Acute accent (´) in UTF-8 docs will be $c2 $b4.
    [`´~\^:][a-zA-Z]  | # Note: acute accent in ASCII format also found here.
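As a rough check of the idea behind the two alternatives shown above, here is a Python sketch that applies an equivalent pattern to raw bytes. It is an approximation under stated assumptions, not the syntax definition itself: only the two alternatives visible in the excerpt are reproduced, the single-byte accent is written as `\xB4`, and the name `INTERPOLATION` is hypothetical.

```python
import re

# Byte-level approximation of the two alternatives from the excerpt:
#   \xC2\xB4 + letter          -> acute accent encoded as UTF-8
#   one of ` \xB4 ~ ^ : + letter -> accent characters as single bytes
INTERPOLATION = re.compile(rb"\\(?:\xC2\xB4[a-zA-Z]|[`\xB4~\^:][a-zA-Z])")

escape = "\\´e"  # backslash, acute accent, letter

utf8_match = INTERPOLATION.fullmatch(escape.encode("utf-8"))
latin1_match = INTERPOLATION.fullmatch(escape.encode("iso-8859-1"))
grave_match = INTERPOLATION.fullmatch(b"\\`a")
digit_match = INTERPOLATION.fullmatch(b"\\1")

print(bool(utf8_match), bool(latin1_match), bool(grave_match), bool(digit_match))
```

Both encodings of the same escape sequence match, which is why the accent has to appear twice in the pattern when the matching is done on bytes.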

Edited May 24, 2019 by Tristano Ajmone
