Syntax Tests: Caret misaligned with multi-byte chars in UTF-8

While testing the new Hugo syntax, I encountered strange behaviour in tests that contain multi-byte encoded characters. Basically, the test caret becomes misaligned with the line above as soon as a multi-byte char is encountered.

My guess is that the test code always assumes that each character in the line above maps to a single byte, and keeps counting columns without accounting for multi-byte characters in UTF-8 files.
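To make the suspected mismatch concrete, here is a minimal Python sketch. It is only an illustration of the hypothesis above, not the actual test code: the helpers `column_in_chars` and `column_in_bytes` and the sample line are hypothetical.

```python
# Hypothetical illustration: the column of a token when counted in
# characters versus in encoded bytes. Not taken from the real test runner.

def column_in_chars(line: str, target: str) -> int:
    """0-based column of `target`, counting characters (code points)."""
    return line.index(target)

def column_in_bytes(line: str, target: str, encoding: str = "utf-8") -> int:
    """0-based column of `target`, counting bytes in the given encoding."""
    return line.encode(encoding).index(target.encode(encoding))

# Made-up sample line: 'a', the acute accent (U+00B4), then 'b'.
line = "a\u00b4b"

chars_col = column_in_chars(line, "b")                    # what the editor shows
utf8_col = column_in_bytes(line, "b", "utf-8")            # accent takes two bytes
latin1_col = column_in_bytes(line, "b", "iso-8859-1")     # accent takes one byte

print(chars_col, utf8_col, latin1_col)
```

If the test runner counted bytes while the carets were placed by visual column, every token after the accent would be reported one column further right in a UTF-8 file, which would match the symptom described here.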

If you look at this test file:

https://gitlab.com/tajmone/highlight-test-suite/blob/master/hugo/syntax_test_strings.hug#L134

You'll notice that I had to limit the actual carets when dealing with the acute accent ´ because I was getting error reports mentioning columns where there wasn't actually a caret for testing.

This acute accent business is an edge case, and I had to struggle a bit to match it in escape sequences (defined as Interpolation) because the character can be a single byte in ISO-8859-1 or two bytes in UTF-8.
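The size difference is easy to check. The short Python sketch below encodes the acute accent both ways; the variable names are just for illustration.

```python
# The acute accent is U+00B4. It is written as an escape here so the
# example does not depend on the encoding of this file.
ACUTE = "\u00b4"

latin1_bytes = ACUTE.encode("iso-8859-1")  # one byte
utf8_bytes = ACUTE.encode("utf-8")         # two bytes

print(latin1_bytes.hex())  # b4
print(utf8_bytes.hex())    # c2b4

# Strictly speaking, the character lies outside 7-bit ASCII, so a pure
# ASCII codec cannot encode it at all.
try:
    ACUTE.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The single byte `b4` is what an ISO-8859-1 source contains, while the pair `c2 b4` is what the same character becomes once the file is converted to UTF-8.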

So I had to create an ASCII test file too for this:

https://gitlab.com/tajmone/highlight-test-suite/blob/master/hugo/syntax_test_interpolation-ascii.hug

In the syntax definition, I had to cover both the ASCII version of the accent and the UTF-8 version, because although Hugo sources are usually in ISO-8859-1, in Asciidoctor documentation projects they'll be either pasted into UTF-8 documents or included externally as files converted to UTF-8 (because Asciidoctor doesn't support ISO-encoded files).

--[[
NOTE: The RegEx below defines twice the acute accent (´) char because depending
      on whether the source is in ASCII/ISO-8859-1 or UTF-8 its encoding will
      differ (the former is the expected encoding for Hugo sources, but the
      latter might be encountered in documentation projects).               --]]
  Interpolation = [=[ (?x)(\\(?:
    \xC2\xB4[a-zA-Z]  | # Acute accent (´) in UTF-8 docs will be $c2 $b4.
    [`´~\^:][a-zA-Z]  | # Note: acute accent in ASCII format also found here.
Edited by Tristano Ajmone