Faster string concatenations.
-
Existing
fpc_ansistr_concat
has several small (or not so small) inefficiencies that add up:..._intern_charmove
refuses to inline,PAnsiChar
casts (andlength
calls on strings known to be non-nil
) introduce redundant jumps,SetLength
withRef <> 1
is a guaranteedly wasted copying because the entire string is going to get overwritten, and so on. I suspect its purpose is to be reused by the JVM target? Still, since the improvement is nontrivial, I propose to move the existing version tojastrings.inc
and use pointers freely. -
fpc_ansistr_concat_multi
now prefers to reallocate any destination withRefCount = 1
(likefpc_ansistr_concat
already did... indirectly, by usingSetLength
) regardless of its occurences among the inputs. I originally just wanted to save an atomic (that explicitincr_ref(DestS)
), but as it turns out you can do quite more...
On my computer, this MR speeds up cases 2 and 3 of tests/bench/bansi1.pp
(increased cTimes
to 5M) and my own highly synthetic ConcatBenchmark.pas as follows:
x86-64 before after
Test 1: 5000000 done in 2.065 sec 2.075 sec
Test 2: 5000000 done in 1.901 sec 1.550 sec
Test 3: 5000000 done in 1.790 sec 1.530 sec
s := 'a' + s ×10: 330 ns/call 219 ns/call
s := 'a' + s + 'a' ×10: 767 ns/call 349 ns/call
i386
Test 1: 5000000 done in 2.032 sec 2.020 sec
Test 2: 5000000 done in 2.919 sec 2.591 sec
Test 3: 5000000 done in 1.940 sec 1.530 sec
s := 'a' + s ×10: 328 ns/call 255 ns/call
s := 'a' + s + 'a' ×10: 864 ns/call 464 ns/call
Which I find rather spectacular! s
is one of the inputs or not.)
As an ideological point, concat_multi
now should completely match the decisions that concat
would make. Before that, for example, concat_multi(b ← [a, b])
was frightened by the non-first b
occurence and always created a new string, while concat(b ← [a, b])
knew how to reallocate b
.
Possible further tweak is avoiding ReallocMem
on small shrinks the same way SetLength
does (or just bringing back SetLength
, which will slightly complicate concat_multi
because it relies on the old length in the header)... but the memory manager is probably already not too bad with such shrinks.
Note that I didn’t touch most of the Length <> 0
checks.