Use a single heap for both local and external allocations
Summary
Inko currently uses a separate heap for messages received from external processes. This allows the target process to keep running while receiving messages, without being blocked by the copying of those messages.
We should investigate the possibility of using a single heap for both external and local allocations, using a mixture of atomic operations and locks. This would allow the removal of the mailbox heap, reducing memory usage of processes.
Motivation
The mailbox heap structures can take up a sizable portion of the total memory used by a process (excluding the memory needed for Immix blocks). This reduces the maximum number of processes we can run concurrently, depending on the memory available on the host system.
Implementation
All allocations would use the current Bucket::allocate() method, which supports concurrent allocations. This method only uses locking in two cases:
- When encountering a fragmented block, we will lock once we find a non-fragmented block.
- When requesting a new block from the global allocator.
Locking here is mostly done to prevent concurrently running threads from requesting more blocks than necessary. In most cases only one thread (the mutator) will acquire the lock. Even when many threads compete for it, the synchronised operations are fast, so the impact should be minimal.
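The fast and slow paths could look roughly like the following. This is a heavily simplified sketch: the real Bucket manages Immix blocks, holes, and fragmentation, while here a "block" is just a counter of remaining slots, and BLOCK_SIZE is a made-up constant.

```rust
use std::sync::Mutex;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical block size; real Immix blocks are larger (e.g. 32 KB).
const BLOCK_SIZE: usize = 1024;

/// Simplified bucket: the current block is modelled as a counter of
/// remaining slots, claimed using atomic operations.
struct Bucket {
    remaining: AtomicUsize,
    /// Held only when a new block must be requested from the global
    /// allocator, so concurrent threads don't over-request blocks.
    block_request_lock: Mutex<()>,
}

impl Bucket {
    fn new() -> Bucket {
        Bucket {
            remaining: AtomicUsize::new(BLOCK_SIZE),
            block_request_lock: Mutex::new(()),
        }
    }

    /// Fast path: claim a slot with a compare-and-swap. Slow path: take
    /// the lock and "request" a fresh block, then retry.
    fn allocate(&self) -> bool {
        loop {
            let remaining = self.remaining.load(Ordering::Acquire);

            if remaining > 0 {
                if self
                    .remaining
                    .compare_exchange(
                        remaining,
                        remaining - 1,
                        Ordering::AcqRel,
                        Ordering::Acquire,
                    )
                    .is_ok()
                {
                    return true;
                }

                continue; // Lost the race against another thread; retry.
            }

            let _guard = self.block_request_lock.lock().unwrap();

            // Another thread may have installed a new block while we
            // waited for the lock; only request one if still needed.
            if self.remaining.load(Ordering::Acquire) == 0 {
                self.remaining.store(BLOCK_SIZE, Ordering::Release);
            }
        }
    }
}
```

Only threads that hit an exhausted block ever touch the lock, which matches the claim that in the common case the mutator is the sole lock user.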
Bucket::allocate_for_mutator() and Block::request_pointer_for_mutator() would be removed.
The Mailbox structure would have to be restructured a bit. The new structure would consist of:
- A read-write lock (parking_lot::RwLock).
- A concurrent queue of sorts, used for pushing messages into. This could probably just be a crossbeam channel, depending on its memory overhead.
- An atomic integer used to track the number of pending messages, so we don't need to count all messages in the queue.
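A sketch of this structure, using standard-library stand-ins (std::sync::RwLock and std::sync::mpsc in place of parking_lot::RwLock and a crossbeam channel, and String as a stand-in for a message):

```rust
use std::sync::RwLock;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender, TryRecvError};

/// Sketch of the reworked Mailbox; field names are illustrative.
struct Mailbox {
    /// Senders take the read side, the garbage collector the write side.
    heap_lock: RwLock<()>,
    sender: Sender<String>,
    receiver: Receiver<String>,
    /// Pending message count, so we never have to count the queue itself.
    pending: AtomicUsize,
}

impl Mailbox {
    fn new() -> Mailbox {
        let (sender, receiver) = channel();

        Mailbox {
            heap_lock: RwLock::new(()),
            sender,
            receiver,
            pending: AtomicUsize::new(0),
        }
    }

    /// Called by other processes: copy the message under the read lock,
    /// then enqueue the copy.
    fn send_external(&self, message: &str) {
        let _read = self.heap_lock.read().unwrap();

        // A real implementation would deep copy onto the receiving
        // process' heap here; cloning the String models that copy.
        self.sender.send(message.to_string()).unwrap();
        self.pending.fetch_add(1, Ordering::AcqRel);
    }

    /// Called by the owning process: no locks, only the queue's own
    /// atomic operations.
    fn receive(&self) -> Option<String> {
        match self.receiver.try_recv() {
            Ok(message) => {
                self.pending.fetch_sub(1, Ordering::AcqRel);
                Some(message)
            }
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => None,
        }
    }
}
```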
When a process sends a message to another process, it first acquires a "read" on the RwLock. Once acquired, it deep copies the message into the target process' heap. Once copied, it pushes the copy onto the message queue, then releases the read lock.
When a process sends a message to itself, it also acquires a "read" on the RwLock, but instead of copying it pushes the message directly onto the queue.
When a process begins garbage collection, it acquires a "write" on the RwLock. A write is only allowed when there are no readers, and readers are blocked while there is a write lock. When garbage collection finishes, the write lock is released.
Receiving messages does not acquire any locks, and only uses the atomic operations necessary for the message queue.
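The whole protocol can be exercised with plain threads. In this sketch, std types again stand in for parking_lot and crossbeam, and run_protocol with its parameters is made up for illustration: many senders share the read side of the lock, a simulated collection takes the write side, and receiving just drains the queue.

```rust
use std::sync::{mpsc, Arc, RwLock};
use std::thread;

/// Spawns `senders` threads, each sending `messages_per_sender` messages
/// under the read lock, runs a simulated garbage collection under the
/// write lock, then drains the queue. Returns the number received.
fn run_protocol(senders: usize, messages_per_sender: usize) -> usize {
    let heap_lock = Arc::new(RwLock::new(()));
    let (tx, rx) = mpsc::channel::<String>();
    let mut handles = Vec::new();

    for id in 0..senders {
        let lock = Arc::clone(&heap_lock);
        let tx = tx.clone();

        handles.push(thread::spawn(move || {
            for number in 0..messages_per_sender {
                // Sending: read lock, deep copy (modelled by format!),
                // then enqueue. Many senders can hold the read side.
                let _read = lock.read().unwrap();

                tx.send(format!("{}:{}", id, number)).unwrap();
            }
        }));
    }

    drop(tx);

    {
        // "Garbage collection": the write lock excludes all senders
        // while it is held, so the heap is quiescent.
        let _write = heap_lock.write().unwrap();
    }

    for handle in handles {
        handle.join().unwrap();
    }

    // Receiving never touches the lock; it drains the queue directly.
    rx.iter().count()
}
```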
This setup allows many processes to send messages concurrently, while the receiving process keeps consuming them. The write lock taken during garbage collection prevents external processes from mutating the heap while it is being collected.
This allows us to remove the mailbox heap and greatly simplify the mailbox structure.
Drawbacks
The atomic allocation procedures may at times be a little more expensive than the non-atomic ones.
A sending process can also be blocked if the receiver is garbage collecting. Usually this should be fine, but it could be a problem if the garbage collector has a lot of work to perform. We could handle this by returning some kind of flag when the lock can not be acquired, rescheduling the sending process for future execution. The downside of this is that the process may end up being rescheduled repeatedly, until the garbage collector is finished.
We could work around this by pushing the sending process into a queue for the receiving process. The garbage collector, when finished, would reschedule all these processes in this queue. This does mean that we would need memory for this queue, which might be wasteful depending on how often it is used. Alternatively we just don't deal with it, as this is only really a problem when:
- Processes are waiting for one really slow process
- There are no other processes to run
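The flag-based approach could be sketched as follows. Here try_send, SendResult, and the use of try_read are illustrative assumptions, not part of the current design, and std types again stand in for parking_lot and crossbeam.

```rust
use std::sync::mpsc::Sender;
use std::sync::RwLock;

#[derive(Debug, PartialEq)]
enum SendResult {
    Sent,
    /// The receiver is being garbage collected; the scheduler should
    /// reschedule the sending process and retry later.
    Reschedule,
}

/// Hypothetical non-blocking send: instead of waiting for the collector
/// to release the write lock, report back to the scheduler.
fn try_send(heap_lock: &RwLock<()>, queue: &Sender<String>, message: &str) -> SendResult {
    match heap_lock.try_read() {
        Ok(_read) => {
            // Deep copying into the target heap would happen here.
            queue.send(message.to_string()).unwrap();
            SendResult::Sent
        }
        // The write lock (garbage collection) is held; don't block.
        Err(_) => SendResult::Reschedule,
    }
}
```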