Useless cclasses.TFPHashList revamp
In the last few years, the following hash table implementation became somewhat popular among runtime libraries: open-addressed array of indices that reference dense array of items.
indices = [2, -, −, 0, 3, −, T, 1]
items = [item0, item1, item2, item3]
This saves memory, as sizeof(item) > sizeof(index)
. Even if sizeof(index)
is at its natural limit, sizeof(SizeUint)
, current cutils.THashItem
worths three to four indices. Moreover, by adding a level of indirection index type can be made switchable, so that hash table with 256−2 elements (−2 are representations of empty cell −
and tombstone T
) could accomplish everything it needs with 1-byte indices, etc. It turns out that the gain just from using less memory fully compensates for the initial indirection cost, and then you can get yourself halfway closer back to the previous memory usage but have sparser table with less collisions.
Doing the same with cutils.TFPHashList
is tricky because of different semantics regarding duplicates or removes on which other parts of the compiler have relied on (see e.g. comments on #23501 (closed)). Still, I tried to adapt that indices thing while fully preserving said semantics, and I have an impression that it works slightly but measurably faster, speeding up the compilation by 0,5% or so. But this might be false and I didn’t test it much (in particular if it works in symansistr
mode at all), just tried to compile the compiler itself and my application for x86-64
.
I also have changed the way how these strange permanent storages for strings attached to every TFPHashList
work: instead of a single moveable block, they are allocated using non-moveable memory regions. For example, initially such a region contains (say) one 64-byte chunk. When it becomes too full to fit the next allocation request, another chunk of (say) 96 bytes is allocated and linked to it, and so on. This allows for hash records and TFPHashObject
to conveniently store direct pointers to shortstrings instead of indices to FStrs
, as their data will never be moved, only destroyed with the hash table itself.
Hypothetically, this can waste memory; imagine two allocations of 48 bytes, first comes into the first 64-byte chunk, and second allocates a new one, forever locking remaining 64−48=16 tail bytes of the first, as for the sake of speed and simplicity memory region will never consider them anymore (though it could). But in practice this problem does not exist: when compiling my application, max hash table size reaches three or four megabytes (not counting its string pool but counting everything else; string pool is a separate three or four megabytes, too). Meanwhile, its memory region wastes on these tails (not counting its last, active chunk) around 120 bytes. Of course, this is vastly less than potential amount of garbage from the fact that strings can be deleted from the hash table but are never deleted from the pool (I think I have seen around 30 Kb), or from the fact that reallocating a memory chunk requires a free space that is a (100+k)% multiplier of its size.
What is slightly less useless, while working on it I also found a minor bug and a suspicious place in the code.
Minor bug is that in scanner.pas
, hash table of ignoredirectives
is supposed to prevent repeated warnings about unknown compiler directives, but such directives are remembered by associating nil
pointers with them. Even if this was later checked “correctly” with ignoredirectives.FindIndexOf(hs)>=0
instead of ignoredirectives.find(hs)<>nil
, TFPHashList
explicitly treats elements with associated nil
s as non-existing and will return −1
even from FindIndexOf
. I have replaced nil
s with pointer(1)
.
Suspicious place is in ogbase.pas
here. (F)ExeSectionList
is not just a TFPHashList
of pointers but TFPHashObjectList
of objects, and is constructed with OwnsObjects=True
. The thing is, assigning an item of TFPHashObjectList
that OwnsObjects
frees the old item. Right after this FExeSectionList
is freed itself, so either this nil
ifying loop can be removed (if freeing old FExeSectionList
contents is the desirable behaviour) or must be replaced with FExeSectionList.OwnsObjects:=false
(it this is an error).