Replace tracing garbage collection with move-semantics and ownership, using the Immix allocator
In the last couple of days I've been looking at ways to introduce some form of (im)mutability to Inko. This is needed to make constants safe. Take this code for example:
let NUMBERS = Array.new(10, 20, 30)
Here we define a constant NUMBERS
, and assign it an array of three integers.
Constants are allocated on the permanent heap, and are never garbage collected.
Inko's processes in turn don't share memory with each other, but they can
access this permanent heap. This is necessary so they can access types, methods,
etc.
But this introduces a problem: it's possible for one process to modify these constants (e.g. by pushing values into the array), while another is reading from it. This can introduce race conditions, memory corruptions, etc.
To prevent this from happening, I looked into the possibility of introducing (im)mutable data types. Similar to for example Rust, you'd declare if your types/values are mutable or not. Constants then would be declared immutable, preventing them from being modified (concurrently).
Unfortunately, this has proven to be tricky. Take this code for example:
object Person {
@name: String
def init(name: String) {
@name = name
}
}
Person.new('Alice')
Here we create a Person
object/class, and its constructor (init
) assigns the
given name to the @name
field (@name
is basically syntax sugar for
self.name
). Let's say we introduce mutability, and specifically make it
transitive. That is, if a type T is mutable, all of its fields and sub-types are
also mutable. This creates a problem: the argument name
is immutable, but the
field assigned to (@name
) is mutable. Since you can't replace a mutable type
with an immutable one (= you wouldn't be able to mutate it), this will produce a
compiler error.
Another problem comes with return types. When writing a method, we may not know
if the returned type should be mutable or not. For some types this may not
matter, but when returning collections it might. Imagine you have a partition
method that returns a tuple in the form (A, B)
. Should this tuple and its
child values be mutable, or immutable? If we make it immutable, what happens if
a user wants to modify A
or B
? This could result in the following:
- Developers simply mark all their return types as mutable, somewhat defeating the purpose of introducing immutability (even if users can turn these types into immutable ones).
- You end up with duplicate methods:
partition
for an immutable return type, andpartition_mut
for a mutable one.
Neither is ideal. Combined with the issue of constructors, and the setup is likely to cause more frustration than benefits.
With this in mind I thought of the following: what if we introduce the concept of ownership and move-semantics, kind of like Rust. This would solve several problems:
- For the constructor we would take ownership of the values. This way we don't end up trying to assign immutable fields to mutable ones.
- When returning types, we just return them as owned; leaving the decision of mutability up to the user.
Then I thought more about this, and realised this opens up the way to several other optimisations:
- We can switch from tracing garbage collection to reference counting, making memory management more deterministic.
- Relying on move-semantics means most reference counting can be removed.
- We can introduce destructors that are triggered when the object is released,
instead of using an approach like Go's
defer
expression. - We can prevent developers from using already closed objects. So if you close a file, you can no longer write to it again (currently this produces a panic).
- We could remove a substantial amount of code currently dedicated to the garbage collector, the write barrier, the remembered set, etc.
- If we retain the Immix allocator, we would still have fast allocations; without the usual problems that a free-list allocator brings.
- Since Inko processes don't share memory, reference counting updates don't need to be atomic; reducing overhead.
There are several challenges though. Fragmentation is an issue we need to deal with, preferably a way that does not involve scanning all live objects and moving objects; that would be too expensive. We'd also have to decide that do with closures that move values, and what happens when you return a closure that captures a reference but not the value it refers to. The ideas I have are currently as follows:
- Closures of type
do
can't move values, and can be called many times. - Closures of type
do once
can move values, but are consumed when called. - Closures capturing references can't escape their surrounding scope, enforced using some form of escape analysis.
Single ownership of memory is also frustrating at times, and especially for a more high-level language like Inko I think it's too limiting. Instead, I'm thinking of a multi-ownership approach. The rough idea is as follows:
Given an owned type T, you can obtain either an immutable or a mutable borrow. Passing these around does not involve reference counting changes, but you can't move the pointed-to value when these borrows exist. This is nothing new, and Rust does this.
References in turn can't be stored in types, as that would require explicit lifetime annotations (or a very clever compiler); introducing too much complexity in the language. Instead, you'd "clone" the owned T and use that. I say "cloning" because all this would do is increment its reference count.
For cyclic data we would take two approaches:
- Rely on compiler analysis to prevent the creation of cyclic types
- Use weak pointers where we need cyclic types (the compiler would suggest this where possible).
Types such as iterators would be implemented using shared ownership (= a ref count change). This comes with some overhead, but also brings benefits:
- It doesn't require references in types and lifetime annotations.
- You don't need different iterators for the reference types you yield (e.g. an iterator for mutable values, immutable values, etc); instead you yield owned values and let the user decide.
- Using escape analysis we could likely further cut down the amount of
reference counting. For example, if we know a value
X
never escapes we can just allocate it as dead.
Permanent types such as string literals and float literals are passed as owned
values, but their reference counts are never changed. This means we don't need a
separate type for string literals (e.g. &'static str
in Rust), making things
easier for developers.
All of this makes this a very lucrative change. If done well, Inko could be one of the few languages (if not the only one) that has deterministic reference counting, without the runtime overhead, and without the complexity that coalesced/deferred reference counting brings. Of course it does require some work from the programmer, but without lifetime annotations being needed I think this overhead is minor.
With that said, I don't think this will be implemented before the self-hosting compiler is in use. Adding just (im)mutability to the Ruby compiler proves tricky, as it was never meant to be more than just a quick and dirty bootstrapping compiler. Instead, this is something we'd likely focus on after that; though we'd take it into account as we implement the self-hosting compiler.
Summary
So to recap, the plan is to:
- Investigate the possibility of introducing move-semantics and multi-ownership of data.
- Replace the tracing GC with a reference counting one.
- Eliminate reference counting as much as possible by relying on move-semantics and escape analysis. There won't be any runtime assistance such as coalescing or deferring of counts.
- Have the compiler insert reference counting increments/decrements, instead of the VM deciding when to do this. This gives the compiler greater control.
Remaining challenges/questions
Some are already discussed, but I'm putting them here for readability:
- Fragmentation, and specifically the moving of objects. We don't want to stop and scan all live objects, just so we can move a few around.
- Do we still need separate generations for objects in this setup, or can we just dump everything into a single generation?
- How would the compiler run destructors of types with sub-types that have destructors, and in what order? Likely the compiler would generate inline code to call each destructor.
- How do we do all of this without making Inko too foreign of a language? The target audience are people writing code in the likes of Ruby, Python, Java, etc. How do we keep things simple enough so these people actually want to try Inko, instead of saying "Well if I have to go through this trouble, why wouldn't I just use Rust?"
- What would the syntax look like? Probably
T
for owned,mut T
for mutable borrows,ref T
for immutable ones, andweak T
for weak pointers.