The old scheduler was a little over two years old, and due for a
rewrite. While it worked, it was not very efficient and many features
were bolted on top; process pinning being an example.
The new scheduler relies less heavy on locking, only using mutexes
paired with condition variables to wake up sleeping threads. This will
allow it to scale much better as the number of threads goes up.
Another big benefit is clearer code. The old scheduler's code was a
mess, largely because we focused more on getting a proof of concept out
instead of building a scheduler for the next few years.
== Suspending and rescheduling processes
As part of this rewrite, the way timeouts and rescheduling of processes
is handled is also rewritten. When a process is suspended and receives a
message, the sender will try to reschedule it immediately. This makes
sending messages a little bit more expensive, but allows for much faster
rescheduling of processes. This also removes the need for a separate
thread to perform a linear scan over a list of processes to determine
which ones need to be rescheduled.
Processes that suspend themselves with a timeout are stored in a binary
heap, managed by a separate thread. Communication with this thread is
done using a channel, offloading most of the work to the separate
timeout thread. When a process with a timeout is rescheduled, its entry
in the heap is marked as invalid instead of being removed. This makes
the operation a constant time operation, at the cost of the binary heap
getting fragmented. To combat fragmentation, the timeout thread will
periodically remove invalid entries from the heap.
Rescheduling processes is done entirely using atomic operations, instead
of using mutexes. This requires some careful coding to take into account
multiple threads trying to reschedule the same process, but should allow
all of this to scale much better.
The new approach of suspending and rescheduling processes requires one
additional word of memory per process. This memory is used to mark the
process as suspended, and to optionally store a pointer to its timeout
(if one was used).
== Message counts
The number of messages in a mailbox is now stored explicitly using an
atomic integer, instead of obtaining this from the synchronised
data structures internal to a mailbox. This requires one word of extra
memory per process, but makes it much cheaper to check if a process has
messages. This is important, because when rescheduling a process such
checks are performed several times.
== Asynchronous IO and further improvements
While this commit does not add support for asynchronous IO operations,
the rewrite will make it easier to do so in future commits. The process
lookup table also remains unchanged, but we're currently investigating
if we can get rid of PIDs and the lookup table entirely; potentially
speeding up process spawning by quite a bit.