overview.org 10 KB
Newer Older
gilmi's avatar
gilmi committed
1
* Giml
gilmi's avatar
gilmi committed
2
3
** Features
*** Variants
4
5
6
Variants are data constructors with some data attached to them.

They tag the data so it can be distinguished from different variants
gilmi's avatar
gilmi committed
7
in the data definition.
8
9
10
11
12
13

For example in Haskell:

#+BEGIN_SRC haskell
data Definition
  = Variable Var Expr
14
  | Function Var [Var] Block
15
16
#+END_SRC

gilmi's avatar
gilmi committed
17
18
Here we define a new data type named ~Definition~ which has two possible representations,
either a ~Variable~ with two specific types attached to it, or a ~Function~ with three.
19

gilmi's avatar
gilmi committed
20
In Giml variants must have exactly one type attached to them.
21
22
23

We will compile variants to javascript using objects. A variants schema is going to be:

gilmi's avatar
gilmi committed
24
#+BEGIN_SRC haskell
25
{ _constr: <String>, _field: <Expr> }
gilmi's avatar
gilmi committed
26
#+END_SRC
27
28
29

We prefix with underscore to avoid name collisions with records.

gilmi's avatar
gilmi committed
30
31
When we pattern match on variants we will check the tag to see if we're looking at the right variant,
and then check its field.
32

gilmi's avatar
gilmi committed
33
*** Pattern matching
gilmi's avatar
gilmi committed
34
Pattern matching is a way to implement control flow.
35

gilmi's avatar
gilmi committed
36
We examine the shape of a value and decide what to do accordingly.
37
38
39
40
41
42
43
44

In Haskell it looks like this:

#+BEGIN_SRC haskell
case mydef of
  Variable var expr ->
    JS.Variable var (translateExpr expr)
  Function var args body ->
45
    JS.Function var args (translateBlock body)
46
47
48
49
#+END_SRC

Pattern matching can be represented using:

gilmi's avatar
gilmi committed
50
1. An expression on which we will match on
51
52
53
54
55
56
57
2. A list of patterns and their respected result

A pattern can be one of the following:

- A wildcard, denoted as underscore, will succeed for any expression and will not bind anything in the scope of the result
- A variable, will succeed for any expression and will bind the data in place to that variable in the result
- A literal, will succeed if the expression in the same place is a literal of the same value
gilmi's avatar
gilmi committed
58
- A variant, which will succeed if the matched data is of the same tag and the pattern it holds matches as well
59
60
- A record, which will succeed if the matched data is a record which has all of the fields in the pattern and they match.

gilmi's avatar
gilmi committed
61
62
We will compile pattern matching into several if statements that if succeeds will call a function
with the matched variables and return from there.
63
64

We will also need to generate fresh variables for the variables we match on.
65
*** Records
gilmi's avatar
gilmi committed
66
Records in Giml are anonymous and first class.
67
68
69
70

Implementing them is straightforward because JavaScript already has objects.

Selecting a specific field in a record is simple, just use .field syntax.
gilmi's avatar
gilmi committed
71
Giml will also use this syntax.
72

gilmi's avatar
gilmi committed
73
Record extension is slightly more tricky - since records in giml are immutable,
74
75
we need to first copy the record, mutate it, and return that.

gilmi's avatar
gilmi committed
76
Also, changing the type of a field is OK is giml.
77
78
79
80
81
82
83
84
85
86
87
88
89

- ~{ x = 7 }.x~ should result in ~7~

#+BEGIN_SRC haskell
let r = { x = 7, y = 0.0 }
{ x = "hello" | r }
#+END_SRC

should result in ~{ x = "hello", y = 0.0 }~.
*** Built-ins
We want to be able to run some native JavaScript operations,
We can use the ~EFfi~ constructor to generate normal javascript functions,
But we need a new constructor for binary operations.
gilmi's avatar
gilmi committed
90
*** Type inference
gilmi's avatar
gilmi committed
91
Giml is statically typed, and has global type inference.
gilmi's avatar
gilmi committed
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
Statically typed means any expression has a type, known before the program runs.
Type inferred means we don't need to write type annotations.

Our type inference algorithm works in stages:

First, we *elaborate* the AST and collect *constraints*.
Then, we solve these constraints and get back a *substitution*.
We go over the AST again, replacing the type variables we added in the elaboration stage
with concrete types.
**** 1. Elaboration and Constraints Collection
*Elaboration* means annotating the AST with types, if we can deduce the type
from the tree structure, we annotate that AST node with the type we know.
If we don't, we generate a new type variable name and annotate the tree with that.
Usually in those cases that we don't know the type of the expression, we can sort of
have a rough idea what it might be, for example: a ~EFunCall f args~ tells us that
~f~ /should be/ a function.

A *constraint* means some restriction about a type. Our most common constraint
is going to be ~Equality~ - two types should represent the same type.

After going over the AST, generating and assigning types for each node and collecting
all of the constraints on those types, we *solve* the constraints.
**** Constraints solving
We go over the constraints and replace type variables with the types they represent
according to the constraints, and check that the types match.

For example, if we see ~Equality (TypeVar "t1") (TypeCon "Int")~, we can go over the
rest of the constraints and replace ~t1~ with ~Int~.

And if we see ~Equality (TypeCon "Int") (TypeCon "String")~, we throw a type error,
because the two types do not match.

We also collect all of these replacements (or substitutions) we did into a mapping
from type variables to types. This mapping is called *a substitution*.
**** Substitute
After we've successfully created a substitution without generating an error, we apply
this substitution back into our AST and get our correctly type annotated AST.



132
133
134
135
136
137
138
**** Let Polymorphism
Let polymorphism gives us the ability to use a generic function in more contexts.

For example, ~id~ is a function that can work for ~x~ of any type. But our algorithm
collects constraints globally, including that:

#+BEGIN_SRC haskell
gilmi's avatar
gilmi committed
139
id = \x -> x
140

gilmi's avatar
gilmi committed
141
one = id 1          -- constrain that the type of id *is equal to* the type [Int] -> tN
142

gilmi's avatar
gilmi committed
143
hello = id "hello"  -- constrain that the type of id *is equal to* the type [String] -> tM
144
145
146
#+END_SRC

We need to invent a new constraint that will define the relationship between the type of id
147
and the arguments passing to it as an *Instance of* relationship.
148
149
150
151
152
153
154
155
156
157
158

~InstanceOf t1 t2~ relationship means that ~t1~ is an *instantiation* of ~t2~.
What we'll do is copy the type of ~t2~, generate new type variables in place of all type variables
inside of it, and then say that this new type ~t3~ has an equality relationship with ~t1~.

It's important to solve the equality constraints for each function before solving the InstanceOf
constraints, so that when we instantiate we already have the final type of the function.

We will highjack the ~Ord~ instance deriving (constructors defined later are bigger)
and the fact that ~Set~ is ordered to accomplish that.

159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
*** Extensible records
*** Polymorphic Variants
Polymorphic variants are built in a very similar manner to extensible records, but with some extra
machinary and different roles.

**** Types

We separate variants into three kinds:

***** Variants (denoted ~[ <Tag> : <Type> | ... ]~)
A variant describe that the type of the expression is one of the mentioned constrs+types.

Two normal variants unify by matching (for equality) all of the types of the matching constrs.
If the variants have different labels, the unification fails.

***** Upper bounded polymorphic variants (denoted ~[< <Tag> : <Type> | ... ]~)
Upper bounded polymorphic variants are generated from pattern matching a specific variant, and mean that the value passed as the expression in the pattern matching must be *at most* the types described in the upper bound.

so in order to match ~t~ with ~[< IntTag : Int | BoolTag : Int ]~, a value of type ~t~ must be either an ~#IntTag <integer>~ or a ~#BoolTag <boolean>~.

So ~t~ is *smaller* than ~[< <Tag> : <Type> | ... ]~ (which is why we have that ~<~ there)

***** Lower bounded polymorphic variants (denoted ~[> <Tag> : <Type> | ... ]~)
Lower bounded polymorphic variants are generated by open variant expressions and by ~Capture~ (variable) pattern matching (as a special form of ~[<]~ which mean any variant).

Lower bounded polymorphic variants mean that a given type has *at least* all of these types.

**** Implementation

Polymorphic variants have the same representation as regular variants.

***** Type inference

| Generate    | Normal | LB           | UB           |
|-------------+--------+--------------+--------------|
| Patterns    |        | PVar (~[>]~) | POpenVariant |
| Expressions |        | OpenVariant  |              |

| Unify  | Normal              | LB                                | UB                                      | Other                              |
|--------+---------------------+-----------------------------------+-----------------------------------------+------------------------------------|
| Normal | match Constrs/types | L must contain R                  | Like normals                            | No                                 |
| LB     | R must contain L    | match+match diff (new LB) with tv | Like normals                            | Only ~[>]~, behaves like ~TypeVar~ |
| UB     | Like normals        | Like normals                      | if same tv then merge else like normals | No                                 |


|             | Normal     | LB  | UB  |
|-------------+------------+-----+-----|
| Instantiate | No typevar | Yes | Yes |

|            | LB    | UB                                                       |
|------------+-------+----------------------------------------------------------|
| Substitute | merge | if substitute is typevar then replace typevar else merge |

**** System Limitations

As described [[https://www.youtube.com/watch?v=I5e8lONU67A][here (polymorphic variants in OCaml)]].
gilmi's avatar
gilmi committed
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
** Compiler architecture
*** Logging
Currently when we want to look at the inner workings of the compiler,
we need to add traces to the code and recompile it.

I'd like to be able to view what's going on without recompiling the code.

For example, if I'd like to see the order and grouping of definitions in the REPL,
I'll only need to switch a flag.

For that we'll need to do some refactoring.

Since the current interface returns ~Either~ everywhere, we need to change that to something a bit more
involved, so we can add logging capabilities.

Things we want to log:

- File
- Stages (pre infer, post infer, etc.)
- Specific rewrite/process (parsing, grouping definitions, elaboration, constraint solving)
- Verbosity level (only constraints generated, each constraint, etc.)
- Message/Info

Features we want:

- Log anywhere in the compiler, should work with different monad stacks
- Both pure and IO backends (pure can do nothing instead)
- Set files/stages hierarchically