N.B. Opening a separate issue for this since it's going to be a wild ride, implementation-wise, so it needs its own tracking ticket. Also making sure we know what we're doing.
Since the compiler has no type information to work with, we end up emitting (expensive!) type checks, which make up about 11% of compiler runtime (!). Other functions that are also big pain points in compiler performance right now are:

- `car` (which, unsurprisingly, spends most of its time doing type checks)
- `n` (which, again, is slow because we switch on the type of the value)
- `nth` (which is slow due to the conditional for negative indices)

Together with `type`, these functions make up a whopping 23% of compiler runtime - a whole 2 seconds!
Constraint propagation addresses this issue by maintaining a set¹ of true conditionals in scope at the current control-flow path. For example:
```
(cond
  [(>= x 0) (print! "positive")]  ; branch 1
  [(< x 0)  (print! "negative")]) ; branch 2
```
Since this conditional has 2 branches, we have two divergent sets of constraints in this one simple expression, though they all mention the variable `x`. As an example of why this optimisation would help performance, let's expand a bit on branch 1.
```
[(>= x 0) (print! (nth '(1 2 3) x))] ; branch 1
```
First, we apply the "mother of all optimisations" and inline `nth`:

```
(print! (cond ; branch 1
          [(>= x 0) (.< '(1 2 3) x)]
          [true (.< '(1 2 3) (+ (n '(1 2 3)) 1 x))]))
```
But wait - we have a constraint that mentions `x` in scope at branch 1, and it just so happens to be `(>= x 0)`. Armed with this knowledge, the compiler can fold the conditional away, leaving us with just an index operation (cheap):

```
(print! (.< '(1 2 3) x))
```
We elided the branch, removed a conditional, and opened up an opportunity for constant-folding (if the value of `x` is known). Neat!
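As a rough illustration of the folding step, here's a toy Python model (not the actual compiler code - `fold_cond` and the string-valued tests are invented for this sketch; a real implementation would compare expression trees, not strings):

```python
# Toy model: fold a `cond` using the set of constraints known to be
# true (or false) on the current control-flow path.

def fold_cond(branches, known_true, known_false=frozenset()):
    """branches: list of (test, body) pairs; returns the pruned list."""
    kept = []
    for test, body in branches:
        if test in known_true:
            # Test is provably true: this branch always fires, so all
            # later branches are unreachable and can be discarded.
            kept.append(("true", body))
            break
        if test in known_false:
            # Test is provably false: this branch can never fire.
            continue
        kept.append((test, body))
    return kept

# Branch 1 of the running example: inside `[(>= x 0) ...]` the
# constraint `(>= x 0)` is in scope, so the inlined check folds away.
branches = [("(>= x 0)", "(.< '(1 2 3) x)"),
            ("true", "(.< '(1 2 3) (+ (n '(1 2 3)) 1 x))")]
print(fold_cond(branches, known_true={"(>= x 0)"}))
# → [('true', "(.< '(1 2 3) x)")]
```

A `cond` left with a single always-true branch then collapses to just its body, which is the bare index operation shown above.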
As if you needed more motivation, but anyway. With some quick and dirty "statistics" on the only representative code base I have to analyse (the compiler itself), I was able to conclude this would be damn awesome.
- 412 direct calls to `(nth)` (i.e., matches of the corresponding regular expression)
- 403 direct calls to `(n)` (i.e., matches of the corresponding regular expression)
- 209 direct calls to `(car)` (i.e., matches of the corresponding regular expression)
Not to mention, of course, the flame graph profiling, which provides much more accurate results (which, unfortunately, are a pain to analyse). I've profiled the compiler as it currently is (July 23rd), and uploaded it here.
A rough estimation shows us that, since these functions and `(type)` together make up a fifth of compiler runtime and will potentially be sped up by this optimisation, it's fair to hope that implementing this comprehensively will remove ~2s of compile time in total, bringing the time to compile the compiler from 9 seconds down to 7. Hopefully we'll also see a drop at urn-stats, too!
- Collection of constraints from conditionals
- Deduplication of constraints
- Inference of constraints from literals of all kinds
To expand a bit:
- Number literals give rise to a myriad of possible constraints (specifying sign, parity, range, and so on), and it'd be nicer to check these lazily instead of generating all possible constraints up front (which is, in general, impossible).
- String literals can possibly give rise to constraints regarding Lua patterns, but this is a bit of blue-sky thinking.
- All literals have a defined type; for instance, we can conclude that `x` is a number in `(let* [(x 10)] ...)`.
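The "lazy" approach for number literals could look something like this hypothetical Python sketch (the `literal_entails` function and its constraint encoding are made up for illustration, not real compiler machinery): instead of enumerating every constraint a literal satisfies, answer queries about a specific constraint on demand.

```python
# Toy model: answer constraint queries about a numeric literal lazily,
# rather than generating the (unbounded) set of constraints it implies.

def literal_entails(value, constraint):
    """Does the numeric literal `value` satisfy `constraint`?"""
    op, arg = constraint
    if op == "type":
        return arg == "number"
    if op == ">=":
        return value >= arg      # sign / range lower bound
    if op == "<":
        return value < arg       # range upper bound
    if op == "parity":
        return value % 2 == arg  # 0 = even, 1 = odd
    raise ValueError(f"unknown constraint {op!r}")

# In (let* [(x 10)] ...), queries about x can be answered on demand:
assert literal_entails(10, ("type", "number"))
assert literal_entails(10, (">=", 0))      # sign
assert literal_entails(10, ("parity", 0))  # even
assert not literal_entails(10, ("<", 0))
```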
Constraint propagation across function calls
If we stash constraints in a set, copying only relevant constraints is cheap (i.e., only copy the related sets, instead of traversing the set to find relevant constraints)
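A minimal sketch of what that bookkeeping might look like (the `ConstraintEnv` class and its methods are invented for illustration; see also the footnote about keying by variable):

```python
# Toy model: store constraints in per-variable sets, so propagating
# them across a call site only copies references to the sets for the
# variables actually involved, with no traversal of one big set.
from collections import defaultdict

class ConstraintEnv:
    def __init__(self):
        self.by_var = defaultdict(set)  # variable name -> constraint set

    def add(self, var, constraint):
        self.by_var[var].add(constraint)

    def restrict(self, variables):
        """Build the environment visible inside a call by sharing the
        sets for the relevant variables (cheap: no set traversal)."""
        out = ConstraintEnv()
        for v in variables:
            if v in self.by_var:
                out.by_var[v] = self.by_var[v]  # shared reference
        return out

env = ConstraintEnv()
env.add("x", "(>= x 0)")
env.add("y", "(string? y)")
callee = env.restrict(["x"])  # only x's constraints cross the call
print(sorted(callee.by_var))  # → ['x']
```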
Panic-edge code motion
By this I mean lifting the statements after an `if` whose `else` branch is a call to a panicking function into the `then` branch, as in this example. This allows us to transform linear function bodies with assertions into trees, and then discard the panic edges when appropriate.
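To sketch the transformation (a toy Python model over a made-up statement representation, not the compiler's actual IR):

```python
# Toy model of panic-edge code motion: a linear body containing
# assertions is rewritten as a tree of conditionals whose `else`
# edges are the panic calls. Constraint propagation can then prune
# those edges when an assertion is provably satisfied.

def lift_panic_edges(body):
    """body: list of ('assert', test, panic_call) tuples and plain
    statements. Returns the body with each assertion turned into an
    ('if', test, then, else) node whose then-branch holds the rest."""
    for i, stmt in enumerate(body):
        if isinstance(stmt, tuple) and stmt[0] == "assert":
            _, test, panic = stmt
            # Everything after the assertion moves under the then-edge;
            # the panic call becomes the else-edge.
            return body[:i] + [("if", test,
                                lift_panic_edges(body[i + 1:]),
                                panic)]
    return body

body = [("assert", "(number? x)", "(error! \"not a number\")"),
        "(print! x)"]
print(lift_panic_edges(body))
# → [('if', '(number? x)', ['(print! x)'], '(error! "not a number")')]
```

After this, a known-true test (here, a constraint proving `(number? x)`) lets the compiler drop the else-edge entirely and flatten the tree back out.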
The most important optimisation that adding constraint propagation brings: removing provably unreachable conditionals from the generated code.
¹ It might be better to maintain a map of sets of constraints, with the map keyed by the variables the set of constraints applies to.