Use ERMS in all eligible cases, again.
In !563 (merged, namely 8310b169), I accidentally tainted the logic that chooses the best branch. The resulting imperfection affects large moves (larger than Move_NtThreshold + ε = 256 KB + 16 bytes) that still don't qualify for NT, that is, moves whose distance is smaller than 256 KB. In practice this means most deletions from, and certain insertions at, the beginning of an array larger than 256 KB. Such moves fall back to the regular loop instead of taking the ERMS branch. This MR speeds HugeMoveBenchmark.pas (back) up to:
| | before | after |
|---|---|---|
| Move 500,000 bytes by +20,000 | 16.8 mcs/call | 13.2 mcs/call |
| Move 500,000 bytes by -20,000 | 16.4 mcs/call | 12.7 mcs/call |
(Simply reverting 8310b169 would have the same effect with fewer LoC, but its approach is more future-compatible.)
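For readers who want the intended branch choice spelled out, here is a rough Pascal sketch of the decision described above. It is not the actual code, and it only covers the large-move case this MR is about. `TMoveBranch`, `ChooseLargeMoveBranch`, `NtEpsilon` and the `HasErms` flag are hypothetical names made up for illustration, and the exact comparison operators (`<=`, `>=`) are assumptions; only `Move_NtThreshold`, its 256 KB value and the 16-byte ε come from the description.

```pascal
program MoveBranchSketch;
{$mode objfpc}

const
  // Values taken from the MR description.
  Move_NtThreshold = 256 * 1024;
  NtEpsilon = 16; // hypothetical name for the "ε" in the description

type
  // Hypothetical labels for the branches mentioned in the description.
  TMoveBranch = (mbNotLarge, mbNonTemporal, mbErms, mbRegularLoop);

// Count is the number of bytes to move; Distance is the byte offset between
// destination and source (only its magnitude matters in this sketch).
// HasErms stands in for whatever "eligible" means at run time (assumption).
function ChooseLargeMoveBranch(Count, Distance: PtrInt; HasErms: Boolean): TMoveBranch;
begin
  if Count <= Move_NtThreshold + NtEpsilon then
    Exit(mbNotLarge);          // smaller moves are outside this sketch
  if Abs(Distance) >= Move_NtThreshold then
    Exit(mbNonTemporal);       // far enough apart to qualify for NT
  if HasErms then
    Exit(mbErms);              // the case this MR routes to ERMS again
  Result := mbRegularLoop;     // fallback when ERMS is not available
end;

begin
  // The two benchmark cases from the table: 500,000 bytes by +/-20,000.
  WriteLn(ChooseLargeMoveBranch(500000, 20000, True));
  WriteLn(ChooseLargeMoveBranch(500000, -20000, True));
end.
```

Under these assumptions, both benchmark cases are large but have a distance well below 256 KB, so they land in the ERMS branch rather than the regular loop.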
Edited by Rika