Closed · Opened Jul 23, 2017 by hydraz (@hydraz) · 0 of 6 tasks completed

Constraint Propagation

N.B. Opening a separate issue for this since it's going to be a wild ride implementation-wise, so it needs its own tracking ticket. This also makes sure we know what we're doing.

Since the compiler has no type information to work with, we end up emitting (expensive!) type checks - which make up about 11% of compiler runtime (!). Other functions that are also big pain points in compiler performance right now are

  • car (which, unsurprisingly, spends most of its time calling type)
  • n (which, again, is slow because we switch on the type of the value)
  • nth (which is slow due to the conditional for negative indices)

Together with type, these functions make up a whopping 23% of compiler runtime - a whole 2 seconds!

Constraint propagation addresses this issue by maintaining a set[1] of conditionals known to be true along the current control-flow path. For example:

Explanation

(cond
  [(>= x 0)
   (print! "positive")] ; branch 1
  [(< x 0)
   (print! "negative")]) ; branch 2

Since this conditional has 2 branches, we have two divergent sets of constraints in this one simple expression, though they all mention the variable x. As an example of why this optimisation would help performance, let's expand a bit on branch 1.

  [(>= x 0)
   (print! (nth '(1 2 3) x))] ; branch 1

First, we apply the "mother of all optimisations" and inline nth.

  (print! (cond ; branch 1
            [(>= x 0)
             (.< '(1 2 3) x)]
            [true (.< '(1 2 3) (+ (n '(1 2 3)) 1 x))]))

But wait - We have a constraint that mentions x in scope at branch 1, and it just so happens to be (>= x 0). Armed with this knowledge, the compiler can fold the conditional away, leaving us with just an index operation (cheap):

  (print! (.< '(1 2 3) x))

We elided the branch, removed a conditional, and opened up an opportunity for constant-folding (if the value of x is known). Neat!
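To make the folding step concrete, here is a minimal, hypothetical Python sketch of how such a pass might collapse a cond once a branch's test is entailed by an in-scope constraint. The tuple shapes and the equality-based notion of "provable" are assumptions for illustration only; a real pass would do actual entailment (e.g. (>= x 1) implies (>= x 0)).

```python
# Hypothetical sketch (not Urn's actual implementation): fold a (cond ...)
# when leading tests are provably true or false under in-scope constraints.
# Constraints are (op, var, bound) tuples; "provable" is modelled as plain
# set membership, where a real pass would do entailment.

def fold_cond(branches, known, refuted=frozenset()):
    """branches: list of (test, body) pairs, tried in order.
    known: constraints true on this control-flow path.
    refuted: constraints known false, whose branches can be dropped."""
    remaining = [(t, b) for t, b in branches if t not in refuted]
    if remaining:
        test, body = remaining[0]
        if test is True or test in known:
            return body  # the cond collapses to a single body
    return ('cond', remaining)

# Branch 1 of the running example: (>= x 0) is in scope, so the
# fast index expression survives and the conditional disappears.
known = {('>=', 'x', 0)}
branches = [(('>=', 'x', 0), 'fast-index'), (True, 'slow-path')]
print(fold_cond(branches, known))  # prints fast-index
```

Note that folding stops at the first test that is neither known true nor known false: a branch we cannot decide might still be taken at runtime, so everything after it must be kept.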

Statistics

As if you needed more motivation, but anyway. With some quick and dirty "statistics" on the only representative code base I have to analyse (the compiler itself), I was able to conclude this would be damn awesome.

We have:

  • 412 direct calls to (nth) (i.e., matches of the regular expression \(nth)
  • 403 direct calls to (n) (i.e., matches of the regular expression \(n)
  • 209 direct calls to (car) (i.e., matches of the regular expression \(car )

Not to mention, of course, the flame graph profiling, which provides much more accurate results (which, unfortunately, are a pain to analyse). I've profiled the compiler as it currently is (July 23rd), and uploaded it here.

A rough estimation shows that, since (nth), (n), (car) and (type) together make up a fifth of compiler runtime and will potentially be sped up by this optimisation, it's fair to hope that implementing this comprehensively will shave ~2s off compile time, total, bringing the time to compile the compiler from 9 seconds to 7. Hopefully we'll also see a drop at urn-stats, too!

Roadmap

  • Collection of constraints from (cond) branches.
    • Deduplication of constraints
  • Inference of constraints from literals of all kinds
    To expand a bit:
    • Number literals give rise to a myriad of possible constraints (sign, parity, range), and it'd be nicer to check these lazily instead of generating all possible constraints up front (which is, in general, impossible).
    • String literals can possibly give rise to constraints regarding Lua patterns, but this is a bit of blue-sky thinking.
    • All literals have a defined type, for instance we can conclude (number? x) in (let* [(x 10)] ...).
  • Constraint propagation across function calls
    If we stash constraints in a set, copying only relevant constraints is cheap (i.e., only copy the related sets, instead of traversing the set to find relevant constraints)
  • Panic-edge code motion
    By this I mean lifting the statements after an if whose else branch is a call to error! into the then branch, as in this example. This allows us to transform linear function bodies with assertions into trees, and then discard the panic edges when appropriate.
  • cond folding
    The most important optimisation that adding constraint propagation brings, removing provably unreachable conditionals from (cond).
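The panic-edge code motion described above can be sketched like this in a hypothetical Python model; the statement shapes are invented for illustration and are not Urn's real AST:

```python
# Hypothetical sketch of panic-edge code motion. In a flat statement list,
# an (if test then (error! ...)) means everything after it only runs when
# the test held, so those trailing statements can be sunk into the then
# branch, turning a linear body into a tree with explicit panic edges.

def sink_past_panics(stmts):
    for i, stmt in enumerate(stmts):
        if stmt[0] == 'if' and stmt[3][0] == 'error!':
            _, test, then, panic = stmt
            # Everything after this assertion belongs to the then branch.
            rest = sink_past_panics(stmts[i + 1:])
            return stmts[:i] + [('if', test, then + rest, panic)]
    return stmts

body = [
    ('if', ('list?', 'x'), [], ('error!', 'not a list')),
    ('print!', 'x'),
]
print(sink_past_panics(body))
```

Once the body is a tree, the panic edges can be discarded whenever a constraint in scope proves the assertion holds.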
  1. It might be better to maintain a map of sets of constraints, with the map keyed by the variables the set of constraints applies to.
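The footnote's map-of-sets idea might look something like this minimal Python sketch (names and shapes are assumptions, not the compiler's actual data structures): keying constraints by variable makes "copy only the relevant constraints" a handful of dict lookups rather than a scan of one flat set.

```python
# Hypothetical sketch of the footnote's idea: a constraint store keyed by
# variable. Propagating constraints into a function call then copies only
# the sets for the variables the call mentions, instead of walking one
# flat set of every constraint in scope.

from collections import defaultdict

class ConstraintStore:
    def __init__(self):
        self.by_var = defaultdict(set)  # variable name -> {(op, bound), ...}

    def add(self, var, op, bound):
        self.by_var[var].add((op, bound))

    def relevant(self, mentioned):
        """Cheaply copy just the constraint sets for the named variables."""
        return {v: set(self.by_var[v]) for v in mentioned if v in self.by_var}

store = ConstraintStore()
store.add('x', '>=', 0)
store.add('y', '<', 10)
print(store.relevant(['x']))  # only x's constraints are copied
```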

Edited Jul 23, 2017 by hydraz

Labels: enhancement, help wanted

Reference: urn/urn#27