Faster string concatenations.
-
Existing
fpc_ansistr_concathas several small (or not so small) inefficiencies that add up:..._intern_charmoverefuses to inline,PAnsiCharcasts (andlengthcalls on strings known to be non-nil) introduce redundant jumps,SetLengthwithRef <> 1is a guaranteedly wasted copying because the entire string is going to get overwritten, and so on. I suspect its purpose is to be reused by the JVM target? Still, since the improvement is nontrivial, I propose to move the existing version tojastrings.incand use pointers freely. -
fpc_ansistr_concat_multinow prefers to reallocate any destination withRefCount = 1(likefpc_ansistr_concatalready did... indirectly, by usingSetLength) regardless of its occurences among the inputs. I originally just wanted to save an atomic (that explicitincr_ref(DestS)), but as it turns out you can do quite more...
On my computer, this MR speeds up cases 2 and 3 of tests/bench/bansi1.pp (increased cTimes to 5M) and my own highly synthetic ConcatBenchmark.pas as follows:
x86-64 before after
Test 1: 5000000 done in 2.065 sec 2.075 sec
Test 2: 5000000 done in 1.901 sec 1.550 sec
Test 3: 5000000 done in 1.790 sec 1.530 sec
s := 'a' + s ×10: 330 ns/call 219 ns/call
s := 'a' + s + 'a' ×10: 767 ns/call 349 ns/call
i386
Test 1: 5000000 done in 2.032 sec 2.020 sec
Test 2: 5000000 done in 2.919 sec 2.591 sec
Test 3: 5000000 done in 1.940 sec 1.530 sec
s := 'a' + s ×10: 328 ns/call 255 ns/call
s := 'a' + s + 'a' ×10: 864 ns/call 464 ns/call
Which I find rather spectacular! s is one of the inputs or not.)
As an ideological point, concat_multi now should completely match the decisions that concat would make. Before that, for example, concat_multi(b ← [a, b]) was frightened by the non-first b occurence and always created a new string, while concat(b ← [a, b]) knew how to reallocate b.
Possible further tweak is avoiding ReallocMem on small shrinks the same way SetLength does (or just bringing back SetLength, which will slightly complicate concat_multi because it relies on the old length in the header)... but the memory manager is probably already not too bad with such shrinks.
Note that I didn’t touch most of the Length <> 0 checks.