Consider moving away from a linked list in BlockList
The BlockList type is a linked list, storing the "next" blocks in the Immix blocks' headers. Since this space is already reserved, this doesn't require additional space. Unfortunately, iterating over a linked list is not very efficient, and in some of my testing turns out to be at least twice as slow compared to iterating over a vector. Worse, we can't parallelize this very well as we need to iterate the entire linked list before we can split it. I was hoping Rayon had some clever way of scheduling work while iterating over the list, but I have been unable to find any information on the matter.
Using a Vec would speed up iteration, and allow us to parallelise some parts of preparing and reclaiming collections better. The downside is that we can not iterate a Vec while another thread is concurrently adding a new block (e.g. when garbage collecting), as this may result in the Vec being resized. A chunked list would provide a balance, but may not be as efficient to be worth the effort.
Some pros and cons and notes for both approaches that I took:
Using a linked list for blocks makes it easy to add new blocks, without having to copy. We also don't have the memory bloat that you may get when using a Vec with a capacity greater than its number of values.
Iterating a linked list is about twice as slow as iterating over a Vec, and its difficult to perform this in parallel efficiently because of that. A Vec can be split easily, and Rayon actually speeds up the GC when doing this; instead of slowing it down when using a linked list.
The Vec bloat is something we can handle by compacting it or replacing it with a new Vec at the end of a GC. This does mean some time may be spent deallocating the Vec if it is large.
Linked List:
👍 No additional memory required per block, since we reuse the block header👍 Adding blocks is fast, and never requires resizing and copying👍 Bulk adding a list is easy: just chain the two together👎 Traversal of around 3000 nodes takes around 600 microseconds👎 We need to implement a linked list👎 Can not be iterated in parallel efficiently👎 Not very cache friendly, as the blocks can be all over the placeVec:
👍 Requires resizing when adding blocks, if the capacity is not great enough👎 Bulk adding requires copying and resizing values👎 Finding the next block in the list requires that we store an index, instead of just using the "next" pointer. We could keep the "next" pointers, but this then becomes a bit redundant.👎 Can't iterate the Vec while concurrently adding a new block👍 Traversal of around 3000 nodes takes around 300 microseconds👍 We can drop most of the BlockList code, if not everything of it👍 Parallel iteration with Rayon is easy👍 Significantly improves GC timings of the /tmp/gc.inko test (by about 2.5x)👍 Probably much more cache friendly👍 Vector bloat can be dealt with by replacing/compacting them after a GCA 1 GB heap would produce 1 MB worth of block pointers. This overhead is minor for a 1 GB heap.