Skip to content

[HEAP.INC TIER 2 ALT. EVO] Change RbTree to a FastMM-like two-level bitfield.

Changes RbTree *sigh* to an algorithm similar to the one from FastMM4 and its clones.

Main change from FastMM4 is progressive quantization of the medium sizes. I’ve sort of reinvented the FastMM5 approach with this one, but FastMM5 uses 3 size classes (256, 512, and 1024-byte steps; FastMM4 uses only 256), while I use steps from 32 = 2⁵ to 16.384 = 2¹⁴ which should give a smooth transition from the fixed sizes (smoother than the right end of the current fixed sizes sequence :D).

Advantages: it is much faster (I consider it a dubious advantage because they were already “fast enough”, e.g. for me, allocating 1.5 Kb was as fast as FillChar’ing them, allocating 15 Kb was 10× faster than FillChar, and so on) and simpler (which I find more important, as this code is included into each application).

Disadvantages: it uses 320 thread-local pointers and gives up to +3% of the memory overhead. These two numbers can be balanced differently, e.g. you can have 640 pointers and up to +1.5% of the memory overhead, or 160 pointers and up to +6%, or a more different approach with different behavior; 320 & +3% is the balance point I’ve chosen. Also, it introduces a hard limit on the medium chunk sizes; larger chunks are bound to be “huge”. RbTree de-facto used a limit, too, but fundamentally supported any sizes. Presently the limit is around 1023.5 Kb, which is only reachable by customizing GrowHeapSize2 to 4+ Mb (default is 1 Mb), so in practice you can reduce the amount of pointers to 256 (each 32 pointers double the limit) and nothing will change with default GrowHeapSize2.

Win64 code size: −1.3 Kb (now only 320 bytes larger than oldheap.inc).

Randomized benchmark: MediumBenchmark.pas. My results:

                         trunk              MR        mormot.core.fpcx64mm   oldheap.inc 💀
GetMem + FreeMem:     358 ns/call       183 ns/call       158 ns/call         22 µs/call
heap used / size:   260.1 / 268.8 Mb  261.7 / 271.3 Mb     <270.5 Mb>      258.8 / 308.6 Mb

FPC compiling itself:

                             trunk                      MR
AllocVar              37.8 ms, 189 ns/call      21.6 ms, 108 ns/call
FreeVar               28.3 ms, 142 ns/call      16.5 ms,  83 ns/call
MaxHeapUsed/Size        259.9 / 264.2 Mb          260.3 / 264.1 Mb
Edited by Rika

Merge request reports

Loading