Faster string concatenations. (!493) · Merge requests · FPC / FPC / FPC Source

Rika requested to merge runewalsh/source:concat into main Sep 19, 2023

Existing fpc_ansistr_concat has several small (or not so small) inefficiencies that add up: ..._intern_charmove refuses to inline, PAnsiChar casts (and length calls on strings known to be non-nil) introduce redundant jumps, SetLength with Ref <> 1 is a guaranteedly wasted copying because the entire string is going to get overwritten, and so on. I suspect its purpose is to be reused by the JVM target? Still, since the improvement is nontrivial, I propose to move the existing version to jastrings.inc and use pointers freely.
fpc_ansistr_concat_multi now prefers to reallocate any destination with RefCount = 1 (like fpc_ansistr_concat already did... indirectly, by using SetLength) regardless of its occurences among the inputs. I originally just wanted to save an atomic (that explicit incr_ref(DestS)), but as it turns out you can do quite more...

On my computer, this MR speeds up cases 2 and 3 of tests/bench/bansi1.pp (increased cTimes to 5M) and my own highly synthetic ConcatBenchmark.pas as follows:

x86-64                        before          after
Test 1: 5000000 done in     2.065 sec       2.075 sec
Test 2: 5000000 done in     1.901 sec       1.550 sec
Test 3: 5000000 done in     1.790 sec       1.530 sec
s := 'a' + s ×10:             330 ns/call     219 ns/call
s := 'a' + s + 'a' ×10:       767 ns/call     349 ns/call

i386
Test 1: 5000000 done in     2.032 sec       2.020 sec
Test 2: 5000000 done in     2.919 sec       2.591 sec
Test 3: 5000000 done in     1.940 sec       1.530 sec
s := 'a' + s ×10:             328 ns/call     255 ns/call
s := 'a' + s + 'a' ×10:       864 ns/call     464 ns/call

Which I find rather spectacular! 😎 (And btw, this does not really depend on whether s is one of the inputs or not.)

As an ideological point, concat_multi now should completely match the decisions that concat would make. Before that, for example, concat_multi(b ← [a, b]) was frightened by the non-first b occurence and always created a new string, while concat(b ← [a, b]) knew how to reallocate b.

Possible further tweak is avoiding ReallocMem on small shrinks the same way SetLength does (or just bringing back SetLength, which will slightly complicate concat_multi because it relies on the old length in the header)... but the memory manager is probably already not too bad with such shrinks.

Note that I didn’t touch most of the Length <> 0 checks.

Edited Sep 19, 2023 by Rika

Faster string concatenations.

Merge request reports