[HEAP.INC TIER 2 ALT. EVO] Change RbTree to a FastMM-like two-level bitfield.
Changes RbTree
*sigh* to an algorithm similar to the one from FastMM4
and its clones.
Main change from FastMM4
is progressive quantization of the medium sizes. I’ve sort of reinvented the FastMM5
approach with this one, but FastMM5
uses 3 size classes (256, 512, and 1024-byte steps; FastMM4
uses only 256), while I use steps from 32 = 2⁵ to 16.384 = 2¹⁴ which should give a smooth transition from the fixed sizes (smoother than the right end of the current fixed sizes sequence :D).
Advantages: it is much faster (I consider it a dubious advantage because they were already “fast enough”, e.g. for me, allocating 1.5 Kb was as fast as FillChar
’ing them, allocating 15 Kb was 10× faster than FillChar
, and so on) and simpler (which I find more important, as this code is included into each application).
Disadvantages: it uses 320 thread-local pointers and gives up to +3% of the memory overhead. These two numbers can be balanced differently, e.g. you can have 640 pointers and up to +1.5% of the memory overhead, or 160 pointers and up to +6%, or a more different approach with different behavior; 320 & +3% is the balance point I’ve chosen. Also, it introduces a hard limit on the medium chunk sizes; larger chunks are bound to be “huge”. RbTree
de-facto used a limit, too, but fundamentally supported any sizes. Presently the limit is around 1023.5 Kb, which is only reachable by customizing GrowHeapSize2
to 4+ Mb (default is 1 Mb), so in practice you can reduce the amount of pointers to 256 (each 32 pointers double the limit) and nothing will change with default GrowHeapSize2
.
Win64 code size: −1.3 Kb (now only 320 bytes larger than oldheap.inc
).
Randomized benchmark: MediumBenchmark.pas. My results:
trunk MR mormot.core.fpcx64mm oldheap.inc 💀
GetMem + FreeMem: 358 ns/call 183 ns/call 158 ns/call 22 µs/call
heap used / size: 260.1 / 268.8 Mb 261.7 / 271.3 Mb <270.5 Mb> 258.8 / 308.6 Mb
FPC compiling itself:
trunk MR
AllocVar 37.8 ms, 189 ns/call 21.6 ms, 108 ns/call
FreeVar 28.3 ms, 142 ns/call 16.5 ms, 83 ns/call
MaxHeapUsed/Size 259.9 / 264.2 Mb 260.3 / 264.1 Mb