Highlight v3.45

Highlight v3.48

# Language Definitions References

# Highlight Languages Definitions

Useful references for creating your own Highlight syntax definition files.

Wiki pages on the topic:

-----

**Table of Contents**

<!-- MarkdownTOC autolink="true" bracket="round" autoanchor="false" lowercase="only_ascii" uri_encoding="true" levels="1,2,3" -->

- [Introduction](#introduction)
    - [Mandatory Elements](#mandatory-elements)
    - [Highlight Defaults](#highlight-defaults)
    - [Regular Expressions](#regular-expressions)
- [Syntax Elements](#syntax-elements)
    - [Keywords](#keywords)
    - [Comments](#comments)
    - [Strings](#strings)
        - [Escape](#escape)
        - [Interpolation](#interpolation)
        - [RawPrefix](#rawprefix)
    - [PreProcessor](#preprocessor)
    - [NestedSections](#nestedsections)
    - [Description](#description)
    - [Digits](#digits)
    - [Identifiers](#identifiers)
    - [Operators](#operators)
    - [EnableIndentation](#enableindentation)
    - [IgnoreCase](#ignorecase)
- [Related Wiki Pages](#related-wiki-pages)
- [External Links](#external-links)
    - [Highlight Documentation](#highlight-documentation)
    - [Regular Expressions References](#regular-expressions-references)
    - [Programming Languages References](#programming-languages-references)

<!-- /MarkdownTOC -->

-----

# Introduction

Language definitions are Lua files with the "`.lang`" extension, stored in Highlight's "`langDefs`" subfolder.

Language definitions are documented in Highlight's [`README`][README] file, under section "__[3.2 LANGUAGE DEFINITIONS][README 3.2]__". The [`README_PLUGINS`][README_PLUGINS] file contains useful information too.

## Mandatory Elements

The bare minimum definitions required for a langDef file to be valid are:

- [`Description`][Description]
- [`Keywords`][Keywords]

If a langDef file doesn't provide these definitions, Highlight will raise an
error. All other definitions are optional.
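
As a minimal sketch of the above, a langDef file that provides only the two mandatory definitions might look like the following. The language name and the keyword list are hypothetical, chosen purely for illustration:

``` lua
-- mylang.lang: hypothetical minimal language definition (illustrative only)

Description = "MyLang"   -- mandatory: syntax description

Keywords = {             -- mandatory: at least one keyword group
  { Id = 1,
    List = { "if", "else", "while", "return" },
  },
}
```

Every other element is left out here, so it would fall back to Highlight's built-in default or stay empty/false.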

## Highlight Defaults

Highlight provides a default definition for the following syntax elements:

- [`Identifiers`][Identifiers]
- [`Digits`][Digits]
- [`Escape`][Escape]

All other definitions are empty/false by default.

## Regular Expressions

Creating a language definition requires knowledge of regular expressions (aka RegExs). Highlight uses [Boost]'s "[xpressive]" library for regular expressions. You can find links to RegEx resources in the _[Regular Expressions References]_ section.

# Syntax Elements

This section covers every supported syntax element, one by one, giving its [official documentation description][README 3.2] followed by comments and examples.

| ELEMENT                                  | TYPE           | REQUIREMENT |
|------------------------------------------|----------------|-------------|
| [`Comments`][Comments]                   | table          |             |
| [`Description`][Description]             | string         | mandatory   |
| [`Digits`][Digits]                       | string (RegEx) |             |
| [`EnableIndentation`][EnableIndentation] | boolean        |             |
| [`Identifiers`][Identifiers]             | string (RegEx) |             |
| [`IgnoreCase`][IgnoreCase]               | boolean        |             |
| [`Keywords`][Keywords]                   | table          | mandatory   |
| [`NestedSections`][NestedSections]       | table          |             |
| [`Operators`][Operators]                 | string (RegEx) |             |
| [`PreProcessor`][PreProcessor]           | table          |             |
| [`Strings`][Strings]                     | table          |             |


## Keywords

```
Keywords = { Id, List|Regex, Group? }

  Id:    Integer, keyword group id (values 1-4, can be reused for several keyword
         groups)
  List:  List, list of keywords
  Regex: String, regular expression
  Group: Integer, capturing group id of regular expression, defines part of regex
         which should be returned as keyword (optional; if not set, the match
         with the highest group number is returned (counts from left to right))
```


> __IMPORTANT__ — If you set [`IgnoreCase = true`][IgnoreCase], remember to keep all keywords in lowercase, otherwise they will never match a token. This is due to the fact that for case insensitive comparison, all parsed tokens are converted to lower case before trying to match them against the entries in the keywords lists.
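
To illustrate the `List` and `Regex` variants side by side, here is a sketch of a `Keywords` table for a hypothetical C-like language. The keyword choices and the RegEx are assumptions made for this example; the whitespace padding inside `[=[ … ]=]` simply follows the style of the other snippets on this page:

``` lua
Keywords = {
  -- Group 1: a plain list of reserved words.
  { Id = 1,
    List = { "if", "else", "for", "while", "return" },
  },
  -- Group 2: a list of built-in type names.
  { Id = 2,
    List = { "int", "char", "float", "void" },
  },
  -- Group 3: a RegEx matching any identifier followed by "(";
  -- Group = 1 asks for the captured name only, without the parenthesis.
  { Id = 3,
    Regex = [=[ (\w+)\s*\( ]=],
    Group = 1,
  },
}
```

Each entry carries either a `List` or a `Regex`, never both; `Group` only makes sense alongside `Regex`.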

## Comments

```
Comments = { {Block, Nested?, Delimiter={Open, Close?} } }

  Block:     Boolean, true if comment is a block comment
  Nested:    Boolean, true if block comments can be nested (optional)
  Delimiter: List, contains open delimiter regex (line comment) or open and close
             delimiter regexes (block comment)
```
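
A sketch of how this might look for a language with C-style comments follows. The delimiters are assumptions for the sake of the example, not taken from any specific shipped definition:

``` lua
Comments = {
  -- Line comment: only an opening delimiter is needed.
  { Block = false,
    Delimiter = { [=[ \/\/ ]=] },
  },
  -- Block comment: opening and closing delimiters; nesting disabled.
  { Block = true,
    Nested = false,
    Delimiter = { [=[ \/\* ]=], [=[ \*\/ ]=] },
  },
}
```

Note that `Comments` is a list of tables, one per comment style, which is why the outer braces wrap each entry.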

## Strings

```
Strings = { Delimiter|DelimiterPairs={Open, Close, Raw?}, Escape?, Interpolation?,
            RawPrefix?, AssertEqualLength? }

  Delimiter:         String, regular expression which describes string delimiters
  DelimiterPairs:    List, includes open and close delimiter expressions if not
                     equal, includes optional Raw flag as boolean which marks
                     delimiter pair to contain a raw string
  Escape:            String, regex of escape sequences (optional)
  Interpolation:     String, regex of interpolation sequences (optional)
  RawPrefix:         String, defines raw string indicator (optional)
  AssertEqualLength: Boolean, set true if delimiters must have the same length
```


For examples of strings in different programming languages, see this excellent page by Pascal Rigaux:

- [Syntax Across Languages] » [Strings » strings]
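
As a rough sketch, a `Strings` table for a language with Lua-like string syntax could combine quoted strings with long-bracket raw strings. The patterns are illustrative assumptions, and the example also assumes that `Delimiter` and `DelimiterPairs` may appear together in one table:

``` lua
Strings = {
  -- Single or double quotes open and close a string.
  Delimiter = [=[ "|' ]=],
  -- Long-bracket strings with any number of "=" signs, treated as raw.
  DelimiterPairs = {
    { Open = [==[ \[=*\[ ]==], Close = [==[ \]=*\] ]==], Raw = true },
  },
  -- Opening and closing long brackets must have the same length.
  AssertEqualLength = true,
  -- Backslash escapes such as \n, \t, \\ and \".
  Escape = [=[ \\[ntr\\'"] ]=],
}
```

The level-2 brackets (`[==[ … ]==]`) around the pair patterns are only there so that the `]` characters inside the RegEx cannot end the Lua string early.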

### Escape

    Escape: String, regex of escape sequences (optional)

- [Wikipedia: Escape sequences in C]

If the language at hand supports escape sequences, define a RegEx pattern to capture them.

#### HL Default Value

Highlight's default built-in `Escape` definition:

    Escape=[=[ \\u[[:xdigit:]]{4}|\\\d{3}|\\x[[:xdigit:]]{2}|\\[ntvbrfa\\\?'"] ]=],

#### Restricting Escape Sequences to Strings

Escape sequences are not restricted to strings: they are matched anywhere in the source code (some languages, like Perl and Bash, allow their use anywhere). Usually this isn't a problem, but in some languages this unconstrained behaviour might cause false-positive matches; in such cases you'll need to restrict escape sequences to the inside-strings context by implementing a custom hook via the `OnStateChange()` function:

``` lua
function OnStateChange(oldState, newState, token, kwgroup)
  if newState==HL_ESC_SEQ and oldState~=HL_STRING then
    return HL_STANDARD
  end
  return newState
end
```


### Interpolation

    Interpolation: String, regex of interpolation sequences (optional)

- [Wikipedia: String interpolation]

To understand interpolation, let's take JavaScript as an example:

``` javascript
var apples = 6;
console.log(`There are ${apples} apples in the basket!`);
```

… which will output:

    There are 6 apples in the basket!

The `Interpolation` definition for JavaScript is:

``` lua
Interpolation=[=[ \$\{.+?\} ]=],
```


For examples of string interpolation in different programming languages, see this excellent page by Pascal Rigaux:

- [Syntax Across Languages] » [Strings » strings]

### RawPrefix

    RawPrefix: String, defines raw string indicator (optional)
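
For instance, a language with Python-style raw strings, where an `r` placed before the opening quote switches off escape processing, might be sketched as follows. The delimiter pattern is an illustrative assumption:

``` lua
Strings = {
  Delimiter = [=[ "|' ]=],
  RawPrefix = "r",   -- in r"C:\new" the \n is not an escape sequence
}
```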

## PreProcessor

```
PreProcessor = { Prefix, Continuation? }

  Prefix:       String, regular expression which describes open delimiter
  Continuation: String, contains line continuation character (optional).
```

The Highlight parser treats this element much like a single-line comment: it swallows everything from the matching Prefix up to the end of the line. Unlike comment lines (which can't contain further syntax elements), though, the parser still looks in the current line for the syntax elements that might reasonably occur within preprocessor directives, i.e. strings and comments (and, within strings, escape sequences and interpolation, if supported). Once these elements are dealt with, the parser resumes the PreProcessor state and carries on parsing the rest of the line.

Furthermore, the `Continuation` character allows this element to span multiple lines (without needing an opening and closing pair, as multiline comments do).

Here is an example of a C/C++ `PreProcessor` definition:

``` lua
PreProcessor = {      -- C/C++ PreProcessor example:
  Prefix=[=[ # ]=],   -- Hash char ('#') marks beginning of preprocessor line
  Continuation="\\",  -- Backslash ('\') marks continuation of preprocessor line
}
```

## NestedSections

```
NestedSections = {Lang, Delimiter= {} }

  Lang:      String, name of nested language
  Delimiter: List, contains open and close delimiters of the code section
```
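
As an illustration, an HTML-like host language that embeds stylesheets and scripts could be sketched like this. The delimiter patterns are assumptions, as is the premise that `css` and `js` name existing language definitions:

``` lua
NestedSections = {
  -- Everything between <style ...> and </style> is parsed as CSS.
  { Lang = "css",
    Delimiter = { [=[ <style[^>]*> ]=], [=[ <\/style> ]=] },
  },
  -- Everything between <script ...> and </script> is parsed as JavaScript.
  { Lang = "js",
    Delimiter = { [=[ <script[^>]*> ]=], [=[ <\/script> ]=] },
  },
}
```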

## Description

```
Description: String, Defines syntax description
```

## Digits

```
Digits: String, Regular expression which defines digits (optional)
```
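
Where the built-in default is not a good fit, a custom pattern can be supplied. The following sketch, for a hypothetical language, accepts hexadecimal literals as well as decimal numbers with an optional fraction and exponent:

``` lua
-- Hex literals (0x1F) or decimals with optional fraction and exponent (3.14e-2).
Digits = [=[ 0[xX][0-9a-fA-F]+|\d+(?:\.\d+)?(?:[eE][+\-]?\d+)? ]=]
```

The hexadecimal alternative comes first so that `0x1F` is not cut short after matching the leading `0` as a decimal.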

## Identifiers

```
Identifiers: String, Regular expression which defines identifiers
             (optional)
```


`Identifiers` is for the parser's internal use, and it has no corresponding syntax element in the final output (i.e. it doesn't produce any `<span>` or class in HTML).

The definition of `Identifiers` is the RegEx pattern used by the parser to identify eligible tokens to be evaluated against the `Keywords` lists; that is, only tokens matching the `Identifiers` pattern are searched for in these lists. It doesn't affect RegEx-defined `Keywords`, though.

This is an important point to grasp because `Keywords` have priority over other elements; therefore an overly broad `Identifiers` pattern might prevent some tokens from being passed on to the parsing of lower-priority elements (it seems that tokens matching `Identifiers` are consumed even if they don't actually match any keywords).

__HL Default Value__

Highlight’s default built-in `Identifiers` definition:

``` lua
Identifiers=[=[ [a-zA-Z_]\w* ]=]
```

For examples of regular expressions definitions for identifiers of different programming languages, see this excellent page by Pascal Rigaux:

- [Syntax Across Languages] » [Various » tokens]
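
By way of illustration, consider a hypothetical language whose identifiers may contain hyphens, as in many Lisp dialects. Going by the description above, the default pattern would stop at the first hyphen, so hyphenated entries in a keyword list should never be found; a broader pattern along these lines is one possible remedy (both the pattern and the keywords are assumptions for this sketch):

``` lua
-- Letters, digits, underscores and hyphens; must not start with a digit or hyphen.
Identifiers = [=[ [a-zA-Z_][\w\-]* ]=]

Keywords = {
  { Id = 1,
    List = { "define-syntax", "call-with-values" },
  },
}
```

Bear in mind the caveat above: the broader the pattern, the more tokens are consumed before lower-priority elements get a chance to match.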

## Operators

```
Operators: String, Regular expression which defines operators
```
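
A sketch of an `Operators` definition for a hypothetical C-like language is shown below: a single RegEx listing each operator character as an alternative. The exact set of characters is an assumption for the example:

``` lua
-- One alternative per operator or punctuation character.
Operators = [=[ \(|\)|\[|\]|\{|\}|\,|\;|\.|\:|&|<|>|\!|\=|\/|\*|\%|\+|\-|\~|\||\^|\? ]=]
```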

## EnableIndentation

```
EnableIndentation: Boolean, set true if syntax may be reformatted and indented
```

## IgnoreCase

```
IgnoreCase: Boolean, set true if keyword case should be ignored
```

This setting determines whether source files should be treated case-sensitively or not.

> __IMPORTANT__ — If you set `IgnoreCase = true`, remember to keep all keywords in lowercase, otherwise they will never match a token. This is due to the fact that for case insensitive comparison, all parsed tokens are converted to lower case before trying to match them against the entries in the keywords lists.
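
A minimal sketch of a case-insensitive definition, for a hypothetical Pascal-like language, might look like this (the keyword list is an assumption for the example):

``` lua
IgnoreCase = true

Keywords = {
  { Id = 1,
    -- All lowercase: intended to match BEGIN, Begin and begin alike.
    List = { "begin", "end", "if", "then", "else" },
  },
}
```

Writing `"BEGIN"` in the list instead would, per the note above, never match, because tokens are lowercased before the lookup.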

# Related Wiki Pages

- [Debugging Language Definitions][LangDefs-Debugging]
- [Languages Definitions Syntax Elements][LangDefs-Element]
- [Testing Language Definitions][LangDefs-Testing]
- [Lua Basics][Lua-Basics]
- [Parser States][Parser-States]

# External Links

## Highlight Documentation

- [`README`][README]:
    - "[3.2 LANGUAGE DEFINITIONS][README 3.2]"
    - [List of HL internal states variables][README HL States]
- [`README_PLUGINS`][README_PLUGINS]:
    - "[3. SYNTAX CHUNK ELEMENTS][README_PLUGINS 3]"

## Regular Expressions References

- [Boost] » [xpressive] — the RegEx library used by Highlight.
- [Regular-Expressions.info] — excellent reference and learning website on RegExs.

## Programming Languages References

- [Syntax Across Languages] — by Pascal Rigaux «Pixel». A comprehensive comparative document listing the various syntax elements of numerous languages.
    + [Various » tokens] — this section contains RegEx definitions of identifiers in various languages.
    + [Strings » strings] — section on strings in various languages.
- [Syntax Across Languages (sorted by languages)]
- [Parser Workflow][Parser-States]


<!-----------------------------------------------------------------------------
                               REFERENCE LINKS
------------------------------------------------------------------------------>

[Boost]: https://www.boost.org/ "Visit Boost website"
[xpressive]: https://www.boost.org/doc/libs/1_46_1/doc/html/xpressive/user_s_guide.html "Go to xpressive's online documentation"
[Regular-Expressions.info]: https://www.regular-expressions.info/ "Visit Regular-Expressions.info"

<!------------------------- Syntax Across Languages -------------------------->

[Syntax Across Languages]: http://rigaux.org/language-study/syntax-across-languages.html "Visit the 'Syntax Across Languages' page"

[Various » tokens]: http://rigaux.org/language-study/syntax-across-languages/Vrs.html#VrsTkns "View the 'tokens' section of the 'Syntax Across Languages' page"

[Strings » strings]: http://rigaux.org/language-study/syntax-across-languages/Strng.html#StrngStrng "View the 'strings' section of the 'Syntax Across Languages' page"

[Syntax Across Languages (sorted by languages)]: http://rigaux.org/language-study/syntax-across-languages-per-language/ "Visit the 'Syntax Across Languages' page (sorted by languages version)"

<!------------------------- Highlight Documentation -------------------------->

[README]: https://github.com/tajmone/highlight/blob/master/README.adoc "View Highlight README on my GitHub fork"

[README 3.2]: https://github.com/tajmone/highlight/blob/master/README.adoc#language-definitions "Highlight README » Section 3.2 LANGUAGE DEFINITIONS"

[README HL States]: https://github.com/tajmone/highlight/blob/master/README.adoc#global-variables "Highlight README » List of internal highlighting states variables"

<!-- README_PLUGINS -->

[README_PLUGINS]: https://github.com/tajmone/highlight/blob/master/README_PLUGINS.adoc "Highlight README_PLUGINS"

[README_PLUGINS 3]: https://github.com/tajmone/highlight/blob/master/README_PLUGINS.adoc#syntax-chunk-elements "Highlight README_PLUGINS » 3. SYNTAX CHUNK ELEMENTS"


<!--------------------------------- WikiPedia -------------------------------->

[Wikipedia: Escape sequences in C]: https://en.wikipedia.org/wiki/Escape_sequences_in_C "Wikipedia: Escape sequences in C"

[Wikipedia: String interpolation]: https://en.wikipedia.org/wiki/String_interpolation "Wikipedia: String interpolation"

<!-------------------------------- Wiki Pages -------------------------------->

[LangDefs-Debugging]: ./LangDefs-Debugging "See Wiki page 'Debugging Language Definitions'"
[LangDefs-Element]: ./LangDefs-Element "See Wiki page on 'Languages Definitions: Syntax Elements'"
[LangDefs-Testing]: ./LangDefs-Testing "See Wiki page 'Testing Language Definitions'"

[Lua-Basics]: ./Lua-Basics "See Wiki page 'Lua Basics'"
[Parser-States]: ./Parser-States "See Wiki page 'Parser Workflow' for insights into Highlight's parser workflow"

<!------------------------- Wiki Page Cross Reference ------------------------>

[Comments]: #comments
[Description]: #description
[Digits]: #digits
[EnableIndentation]: #enableindentation
[Escape]: #escape "Strings { Escape? }"
[Identifiers]: #identifiers
[IgnoreCase]: #ignorecase
[Interpolation]: #interpolation "Strings { Interpolation? }"
[Keywords]: #keywords
[NestedSections]: #nestedsections
[Operators]: #operators
[PreProcessor]: #preprocessor
[RawPrefix]: #rawprefix "Strings { RawPrefix? }"
[Strings]: #strings

[Regular Expressions References]: #regular-expressions-references "Jump to the External Links section on RegExs References"

<!-- EOF -->