Inconsistency Between Test and HTML Rendering

Ciao André,

I've added a dev branch to my Highlight test suite for testing changes to the Nim language definition:

https://gitlab.com/tajmone/highlight-test-suite/tree/nim-lang-dev

A couple of tests report errors that are inconsistent with the highlighted HTML output. For example, in the following test:

var
  bad_h1 = 0xDEADBEEF'f128
#          ^^^^^^^^^^^^ num
#                      ^^^ std

The 128 at the end of the line is rendered as normal text in the final HTML doc:

<span class="hl kwa">var</span>
  bad_h1 <span class="hl opt">=</span> <span class="hl num">0xDEADBEEF&apos;f</span>128
<span class="hl slc">#          ^^^^^^^^^^^^ num</span>
<span class="hl slc">#                      ^^^ std</span>

But the test logs an error because it sees the 128 as a num instead:

.\syntax_test_numerals.nim line 147, column 23: got num instead of std

So there is a discrepancy between how Highlight's tester sees that token and how it's actually highlighted in the HTML.

This is a bit of an edge case too: technically the 128 IS just another number following the hex number (where the 'f is a valid suffix, seen as part of the hex number), but normally there should be a separating space between the two for the compiler to parse the source correctly. Indeed, that line is not well-formed Nim code, but here I was more interested in understanding why the internal syntax tester and the Highlight renderer seem out of phase regarding the state of the leftover part of the numeral RegEx.

According to the RegEx that defines Digits in the Nim langDef, numerical constants should occur at word-boundary positions:

RE_suffx_int = [[(?:'i(?:8|16|32|64)|'u(?:8|16|32|64)?)?]]
RE_suffx_flt = [[(?:'f(?:32|64)?|'d)?]]
RE_exponent  = [[(?:e[\+\-]?\d[\d_]*)]]

Digits=[[(?xi)

  # ========== HEX / HEX FLOAT ================
  
    \b0x[\da-f][\d_a-f]*\b]]..RE_suffx_flt..[[

  # ========== OCTAL / OCTAL FLOAT ============
  
  | \b0(?-i:o)[0-7][0-7_]*\b]]..RE_suffx_flt..[[

  # ========== BINARY / BINARY FLOAT ==========
  
  | \b0b[01][01_]*\b]]..RE_suffx_flt..[[

  # ========== FLOATS =========================

  | \b\d[\d_]*(?:\.\d[\d_]*]]..RE_exponent..[[?|]]..RE_exponent..[[)]]..RE_suffx_flt..[[

  # ========== DECIMAL ========================

  | \b\d[\d_]*\b]]..RE_suffx_int..[[
  ]]

So it seems quite reasonable that the test reports the 128 as a number. The question is: why is it shown as normal text in the final HTML then? It seems that the rendering fails because there is no separating space between the two numbers (which makes sense too), as if the 128 were simply rejected by the highlighter while the tester reparses it and sees it as a number. Or it could be that the tester and the renderer revert to different internal states when dealing with RegEx leftovers.
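To probe the "reparsing" hypothesis outside Highlight, here is a minimal Python sketch. It assumes that Python's re module handles \b, (?-i:...) and extended mode closely enough to the engine Highlight uses, which I haven't verified; the names DIGITS, line and rest are only for illustration. It transcribes the Digits RegEx above and compares matching against the whole line with matching against only the leftover text after the first match.

```python
import re

# Transcription of the langDef fragments (assumed equivalent in Python's re).
RE_SUFFX_INT = r"(?:'i(?:8|16|32|64)|'u(?:8|16|32|64)?)?"
RE_SUFFX_FLT = r"(?:'f(?:32|64)?|'d)?"
RE_EXPONENT = r"(?:e[\+\-]?\d[\d_]*)"

DIGITS = re.compile(
    r"""
      # HEX / HEX FLOAT
        \b0x[\da-f][\d_a-f]*\b""" + RE_SUFFX_FLT + r"""
      # OCTAL / OCTAL FLOAT
      | \b0(?-i:o)[0-7][0-7_]*\b""" + RE_SUFFX_FLT + r"""
      # BINARY / BINARY FLOAT
      | \b0b[01][01_]*\b""" + RE_SUFFX_FLT + r"""
      # FLOATS
      | \b\d[\d_]*(?:\.\d[\d_]*""" + RE_EXPONENT + r"""?|""" + RE_EXPONENT + r""")""" + RE_SUFFX_FLT + r"""
      # DECIMAL
      | \b\d[\d_]*\b""" + RE_SUFFX_INT + r"""
    """,
    re.VERBOSE | re.IGNORECASE,
)

line = "  bad_h1 = 0xDEADBEEF'f128"

# 1) Scan the whole line: the engine can see the 'f' that precedes "128".
whole = [m.group(0) for m in DIGITS.finditer(line)]

# 2) Restart matching on the leftover text only, as if it were a new string.
first = DIGITS.search(line)
rest = line[first.end():]
restart = DIGITS.match(rest)

print("whole line :", whole)
print("leftover   :", repr(rest))
print("restarted  :", restart.group(0) if restart else None)
```

On the whole line, \b cannot match between the f and the 1 (both are word characters), so the 128 is not picked up as a number. Once matching restarts on the leftover 128 alone, \b matches at the start of the string and the decimal alternative accepts it. If the tester and the renderer differ in this way it would produce exactly this discrepancy, but that is only a guess on my part.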

Whatever the cause of the problem might be (a fault in the RegEx of digits, etc.), shouldn't the tests and the output still be consistent in how elements are tokenized?

As for the RegEx, I often end up having to do a lot of tweaking to make things work, especially ensuring that no groups are capturing (by adding ?: to every parenthesized group). I've never quite understood why that's necessary in a syntax definition; it seems that capturing groups get lost in the parser somehow. Also, using \b is often tricky, as it can have a huge impact on element definitions.

But here the RegEx seems fine.
