Extend Strings and Interpolation Options in LangDefs
Currently, escape sequences and interpolations also tend to show up in filepath strings, and interpolations show up in preprocessor strings too.
Preventing Escapes Outside Strings
To prevent escape sequences from being matched outside strings, one needs to use OnStateChange() hacks like this:
function OnStateChange(oldState, newState, token, kwgroup)
  --============================================================================
  -- #01 -- Ignore Escape Sequences Outside Strings
  --============================================================================
  if newState == HL_ESC_SEQ and          -- An escape seq. must follow either:
     oldState ~= HL_STRING and           --  * a string
     oldState ~= HL_ESC_SEQ and          --  * an escape sequence
     oldState ~= HL_INTERPOLATION then   --  * an interpolation
    return HL_REJECT                     -- otherwise, reject it.
  end
  return newState
end
Preventing Interpolations in Preprocessor Strings
Recently, for the Hugo syntax, I had to stop interpolations that rely on the \ delimiter (i.e. escape sequences for special characters) from showing up in preprocessor strings, by adding a further OnStateChange() hack:
function OnStateChange(oldState, newState, token, kwgroup)
  --============================================================================
  -- #02 -- Ignore Interpolations Inside Preprocessor Strings
  --============================================================================
  if newState == HL_INTERPOLATION and
     oldState == HL_PREPROC_STRING then
    return HL_REJECT
  end
  return newState
end
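Both hacks define the same OnStateChange() hook, and in Lua a later definition of a global function replaces the earlier one, so within a single langDef they have to be merged into one function. Below is a minimal sketch of the merged hook. The numeric values assigned to the HL_* constants are arbitrary placeholders (an assumption, only there so the snippet runs outside Highlight); inside a langDef the engine supplies the real ones.

```lua
-- Placeholder values so the sketch also runs outside Highlight.
-- Inside a langDef these constants are provided by the engine; the
-- numbers below are NOT Highlight's real values.
HL_STRING         = HL_STRING         or 1
HL_ESC_SEQ        = HL_ESC_SEQ        or 2
HL_PREPROC_STRING = HL_PREPROC_STRING or 3
HL_INTERPOLATION  = HL_INTERPOLATION  or 4
HL_REJECT         = HL_REJECT         or 99

function OnStateChange(oldState, newState, token, kwgroup)
  -- #01 -- Ignore escape sequences outside strings
  if newState == HL_ESC_SEQ
     and oldState ~= HL_STRING
     and oldState ~= HL_ESC_SEQ
     and oldState ~= HL_INTERPOLATION then
    return HL_REJECT
  end
  -- #02 -- Ignore interpolations inside preprocessor strings
  if newState == HL_INTERPOLATION and oldState == HL_PREPROC_STRING then
    return HL_REJECT
  end
  -- Anything else passes through unchanged.
  return newState
end
```

With this arrangement rule #01 also rejects escapes that follow a preprocessor string, since HL_PREPROC_STRING is not among the accepted previous states.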
See Hugo syntax development and tests on:
Difficulties in Tracking States
In some syntaxes I had to use Interpolation to handle some escape-like sequences for special characters (e.g. to represent non-ASCII characters or function parameter substitutions inside strings). This is a good solution in languages that have many escape-like special substitutions inside strings, because it allows distinguishing between them by color (they are often found side by side inside strings).
The problem is that it's currently very difficult to prevent escapes and interpolations from showing up outside strings, in preprocessor strings and in filepath strings using OnStateChange() hacks.
Especially hard is trying to implement a tracking system that distinguishes between text strings (for print/output statements) and filepath strings (for commands dealing with external files). Windows path strings often end up with false-positive escapes because the backslash is the directory separator.
I've tried to implement a string type tracker via OnStateChange(), so that tokens which are always followed by a filepath string set an isPathStr variable to true. The variable is then checked whenever newState is an escape or an interpolation, and the match is discarded if it is true.
The problem here was that I also had to reset isPathStr to false whenever newState was anything other than a string, an escape, an interpolation or one of the triggering tokens, so that normal strings would still work.
The hack only seemed to work for a limited time (e.g. ignoring just the first couple of escapes and interpolations), so the variable was probably being reset accidentally, for reasons I couldn't identify.
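For reference, the tracker described above can be sketched roughly as follows. This is a reconstruction from the description, not the original code: the PATH_TOKENS names are hypothetical examples of tokens that are always followed by a filepath string, and the numeric HL_* values are arbitrary placeholders so the snippet runs outside Highlight.

```lua
-- Placeholder values for running outside Highlight (NOT the real ones).
HL_STANDARD      = HL_STANDARD      or 0
HL_STRING        = HL_STRING        or 1
HL_ESC_SEQ       = HL_ESC_SEQ       or 2
HL_INTERPOLATION = HL_INTERPOLATION or 3
HL_KEYWORD       = HL_KEYWORD       or 4
HL_REJECT        = HL_REJECT        or 99

-- Hypothetical tokens that are always followed by a filepath string.
local PATH_TOKENS = { include = true, link = true }

-- True while the next (or current) string is assumed to be a filepath.
local isPathStr = false

function OnStateChange(oldState, newState, token, kwgroup)
  -- A triggering token arms the tracker.
  if PATH_TOKENS[token] then
    isPathStr = true
    return newState
  end
  -- Escapes and interpolations are discarded while the tracker is armed.
  if newState == HL_ESC_SEQ or newState == HL_INTERPOLATION then
    if isPathStr then return HL_REJECT end
    return newState
  end
  -- Any state other than a string disarms it, so normal strings still work.
  if newState ~= HL_STRING then
    isPathStr = false
  end
  return newState
end
```

Note that this reset rule would disarm the tracker too early if the engine reported any intermediate state between the triggering token and the opening delimiter, or inside the string itself. Whether Highlight actually does so is not verified here.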
The point is that tracking custom states across multiple contexts can be very tricky and bug-prone. Introducing some extra options to handle at least escapes outside strings and interpolations inside preprocessor strings would reduce the number of cases the hacks have to cover and make it simpler to implement custom states.
Extending Highlight Syntax Options
Since Strings, Escape and Interpolation are special Highlight elements in their own right, I think it might be worth introducing some new options in syntax definitions to control how they interact with each other.
In the current state of Highlight:
- Interpolations are never matched outside strings, but they are matched inside preprocessor strings.
- Escape sequences can occur inside strings or on the loose, but not in preprocessor strings.
Since only some syntaxes allow escapes outside strings, it might be worth adding an option to automatically discard escapes outside strings, something like:
- EscapeInsideStringsOnly (boolean)
To prevent Interpolation in preprocessor strings, it would be useful to have an option like:
- InterpolationInPreProcessorStrings (boolean)
If the former defaulted to false, and the latter to true, these should be non-breaking changes.
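To make the proposal concrete, this is roughly how the two options might look in a langDef. Neither option exists in Highlight today; placing them inside the Strings table is an assumption, and the three regular expressions are placeholder patterns chosen only for illustration.

```lua
Strings = {
  Delimiter     = [["]],
  Escape        = [[\\.]],      -- placeholder pattern
  Interpolation = [[\$\w+]],    -- placeholder pattern

  -- PROPOSED options, not implemented in Highlight:
  EscapeInsideStringsOnly            = true,   -- proposed default: false
  InterpolationInPreProcessorStrings = false,  -- proposed default: true
}
```

Set this way, the two options would replace hacks #01 and #02 respectively, while a langDef that omits them would keep the current behaviour.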
Raw Strings
It's not clear to me how Raw Strings are currently handled by Highlight: whether RawPrefix prevents Escape and Interpolation matching inside strings, and whether it also affects preprocessor strings.
It would be useful, though, if there were a way to tell the Highlight engine to arbitrarily treat the current string as a raw string. This would make it possible to prevent escapes in filepath strings, by enabling it after encountering one of the tokens that are followed by path strings. But then Highlight would need to resume the normal behaviour once the string is finished, and I'm not sure that's possible, since Highlight apparently sees a new string after each escape/interpolation and doesn't have (or offer) a notion of the whole string.
If Highlight were to keep track of where strings begin and end, i.e. if it considered escapes and interpolations inside strings as an extension of the string context, so that the user could query at any point of the parsing process whether the current context is inside a string, things would be much simpler.
Maybe it's just that I lack full insight into how all these elements are actually broken down during parsing, but there is surely a need to be able to implement custom state tracking in a simpler and more efficient manner.
I'm sure that Highlight could benefit from a couple of new options to handle in-string matching, and that exposing a few new internal state variables might simplify custom tracking too.
Ultimately, all these problems and hacks result from the fact that Highlight doesn't nest contexts (e.g. escapes are not really inside strings). While this is a good choice in terms of how elements are tagged, having some pseudo-nesting internal variables to track when strings really begin and end would greatly improve the situation.
After all, Highlight is aware of the string delimiters, so it knows when a string begins and ends, even if the string spans multiple lines. The occurrence of escapes and interpolations (which should be the only two elements found inside strings) shouldn't affect tracking the begin/end state of a string.
So far I haven't managed to successfully exploit HL_STRING_END for this, so I deduced that it doesn't track the actual end of a string (i.e. encountering the closing delimiter) but only the point where a string tag is closed to allow an escape/interpolation tag to begin.
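One way to check this deduction would be to record every transition the hook receives while highlighting a short test file, and then see where HL_STRING_END actually appears. A minimal sketch follows; the transitions buffer and the DumpTransitions() helper are hypothetical names, and writing to io.stderr assumes the io library is available to langDef scripts (the write is skipped if it is not).

```lua
-- Hypothetical buffer collecting one line per state transition.
local transitions = {}

function OnStateChange(oldState, newState, token, kwgroup)
  local line = string.format("%s -> %s [%s]",
    tostring(oldState), tostring(newState), tostring(token))
  transitions[#transitions + 1] = line
  -- Also echo to stderr when the io library is available (assumption).
  if io and io.stderr then
    io.stderr:write(line, "\n")
  end
  -- Never alter the state: this hook only observes.
  return newState
end

-- Hypothetical helper returning the whole log as one string.
function DumpTransitions()
  return table.concat(transitions, "\n")
end
```

Comparing the logged numbers against the HL_* constants for a string containing one escape should show whether an end-of-string event is reported before the escape or only at the closing delimiter.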