|
|
# What is the difference between a parse tree and an AST?
|
|
|
|
|
|
The parse tree is corresponding to the grammar, e.g. EBNF grammar or the ANTLR4 grammar. The parse tree corresponds 1:1 to the code, e.g. one tree corresponds to
|
|
|
```fortran
|
|
|
if (a == 1) print *, "OK"
|
|
|
```
|
|
|
and another tree corresponds to the equivalent code:
|
|
|
```fortran
|
|
|
if (a == 1) then
|
|
|
print *, "OK"
|
|
|
end if
|
|
|
```
|
|
|
There are multiple ways how these expression can be parsed, one example is:
|
|
|
```
|
|
|
if_statement
|
|
|
: if_cond statement # if_single_line
|
|
|
| if_block 'end' 'if' # if_multi_line
|
|
|
;
|
|
|
|
|
|
if_cond: 'if' '(' expr ')' ;
|
|
|
|
|
|
if_block
|
|
|
: if_cond 'then' NEWLINE+ statements ('else' (if_block | (NEWLINE+ statements)))?
|
|
|
;
|
|
|
```
|
|
|
another one is:
|
|
|
```
|
|
|
if_statement
|
|
|
: 'if' '(' expr ')' statement # if_single_line
|
|
|
| 'if' '(' expr ')' 'then' NEWLINE+ statements else_block? 'end' 'if' # if_multi_line
|
|
|
;
|
|
|
|
|
|
else_block
|
|
|
: 'else' (('if' '(' expr ')' 'then' NEWLINE+ statements else_block?) | (NEWLINE+ statements))
|
|
|
;
|
|
|
```
|
|
|
The parse tree then has nodes for each rule in the grammar. In both grammars the first code will be represented using a `if_single_line` node, while the second code will be represented by the `if_multi_line` node. However, the `if_multi_line` node has different children: `if_block -> statements` in the first grammar, but `statements` in the second grammar. The else branch also has different node structure in each grammar.
|
|
|
|
|
|
As such the parse tree depends on the variants of the "if" statement in the code as well as on how the grammar rules are constructed.
|
|
|
|
|
|
The next step is to take the given parse tree and produce the Abstract Syntax Tree (AST). The AST for the above example is unique:
|
|
|
```
|
|
|
If(expr test, stmt* body, stmt* orelse)
|
|
|
```
|
|
|
Both `if_single_line` and `if_multi_line` and both grammars (i.e., 4 combinations) all produce exactly the same AST node. The AST lost information about which of the two ways the if statement was written, as well as how the code was formatted (white space, parentheses, semicolons, ...).
|
|
|
|
|
|
We still have freedom in how exactly we define the AST, but it is a lot more unique than the multiple equivalent ways things can be expressed in Fortran code as well as multiple equivalent ways the grammar (and thus the parse tree) can be constructed to parse the code.
|
|
|
|
|
|
As an example, there is pretty much just one natural way to define the `If` AST node (see above), but there are still a few different ways to define e.g. how `a(5)` should be represented in AST, one option is `ArrayOrFunctionCall(arg1, arg2, ...)`, the other is to try to figure out from the context and choose `Array(arg1, ...)` and `FunctionCall(arg1, ...)`. One solution is to perhaps do `ArrayOrFunctionCall(arg1, arg2, ...)`, since that can be directly inferred from the parse tree. And then in the semantic analysis figure out if it's an array or a function and change the AST nodes to be more detailed.
|
|
|
|
|
|
For Python, here is the [parse tree](https://docs.python.org/3/reference/grammar.html) specification and here is the [AST](https://docs.python.org/3/library/ast.html#abstract-grammar) specification.
|
|
|
|
|
|
For C: [parse tree](https://www.lysator.liu.se/c/ANSI-C-grammar-y.html) and [AST](https://www.cs.utah.edu/flux/flick/current/doc/guts/gutsch6.html).
|
|
|
|
|
|
For Fortran: [parse tree](http://docs.cray.com/books/007-3694-003/html-007-3694-003/faxalchri.html) (see also the Appendix D in the [Fortran 2008 Standard](http://www.j3-fortran.org/doc/year/10/10-007.pdf)) and [AST]().
|
|
|
|
|
|
The parse tree is the most detailed tree which corresponds 1:1 with how exactly the code is written (usually it ignores white space, but that's about it). AST on the other hand is the most abstract tree that ignores as much about the code as it can, the only condition it has is that the code generated from AST, while inevitably different than the original code, must produce exactly the same results as the original when compiled (by some other compiler that, say, generates machine code directly from the parse tree, skipping AST). ASTs however are still able to represent different algorithms that produce the same results, such as array vs element-wise operations. |
|
|
\ No newline at end of file |