Fill* for x64, physically sharing half of the code with FillChar.
I tried to do FillWord / DWord / QWord for x64, and since they turned out to be even more trivial modifications of FillChar than my preliminary reflections suggested, I thought it a good idea to also factor out their large common part as the procedure FillXxxx_MoreThanTwoXmms. I’d prefer it to be a part of FillChar, with everything else jumping into the middle of FillChar, but is this even possible?.. All of my knowledge of the FPC-flavored assembler comes from other people’s code, and I have never seen anyone doing cross-procedure jumps.
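To illustrate why the wider fills are such small modifications of a byte fill, here is a minimal plain-Pascal sketch, not the assembler from this MR. It assumes one way of doing it: replicate the value into a wider pattern, then hand the pattern to a shared bulk routine. The names `BroadcastWord`, `BroadcastDWord`, `FillPattern8` and `SketchFillWord` are hypothetical, and an 8-byte pattern stands in for the XMM register the real code would use.

```pascal
program FillPatternSketch;
{$mode objfpc}

{ Hypothetical helpers: replicate a narrow value into every lane of a QWord. }
function BroadcastWord(v: Word): QWord;
begin
  Result := QWord(v) * QWord($0001000100010001);
end;

function BroadcastDWord(v: DWord): QWord;
begin
  Result := QWord(v) * QWord($0000000100000001);
end;

{ Hypothetical shared part. byteLen must be >= 8 and a multiple of the
  element size, so every store offset keeps the pattern in phase. }
procedure FillPattern8(p: PByte; byteLen: SizeUInt; pattern: QWord);
var
  ofs: SizeUInt;
begin
  ofs := 0;
  while ofs + 8 <= byteLen do
  begin
    Move(pattern, p[ofs], 8);      { alignment-agnostic 8-byte store }
    Inc(ofs, 8);
  end;
  { Overlapping store that covers whatever tail the loop left. }
  Move(pattern, p[byteLen - 8], 8);
end;

{ Only this thin wrapper differs per element size. }
procedure SketchFillWord(var x; count: SizeInt; value: Word);
var
  i: SizeInt;
begin
  if count <= 0 then
    exit;
  if count < 4 then
    for i := 0 to count - 1 do
      PWord(@x)[i] := value        { too short for one 8-byte store }
  else
    FillPattern8(PByte(@x), SizeUInt(count) * 2, BroadcastWord(value));
end;

var
  a, b: array[0..36] of Word;
  n, i: SizeInt;
  ok: Boolean;
begin
  ok := True;
  for n := 0 to 37 do
  begin
    FillChar(a, SizeOf(a), 0);
    FillChar(b, SizeOf(b), 0);
    SketchFillWord(a, n, $1234);
    FillWord(b, n, $1234);         { RTL routine as the reference }
    for i := 0 to High(a) do
      if a[i] <> b[i] then
        ok := False;
  end;
  if ok then
    WriteLn('ok')
  else
    WriteLn('MISMATCH');
  if not ok then
    Halt(1);
end.
```

A DWord or QWord wrapper would differ only in the short-count threshold, the byte-length multiplier and the broadcast helper, which is the sense in which the bulk part can be shared.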
I planned to compare with #32637 (closed), but it looks incomplete. (Considering it dates from 2017, I think closing it on accepting my version will not make the author sad, but rather give him a sigh of relief.)
Benchmark: FillXxxxBenchmark.pas.
My results 👀:
| Call | New | Existing | Note |
|---|---|---|---|
| FillChar(2) | 2.8 ns/call | 2.9 ns/call | |
| FillChar(7) | 2.9 ns/call | 2.7 ns/call | |
| FillChar(17) | 2.7 ns/call | 3.1 ns/call | |
| FillChar(50) | 3.4 ns/call | 3.6 ns/call | |
| FillChar(100) | 3.4 ns/call | 3.6 ns/call | just to show it didn’t hurt FillChar |
| FillChar(500) | 8.3 ns/call | 8.5 ns/call | |
| FillChar(1000) | 16 ns/call | 16 ns/call | |
| FillChar(10000) | 149 ns/call | 149 ns/call | |
| FillChar(100000) | 1931 ns/call | 1924 ns/call | |
| FillWord(2) | 3.2 ns/call | 3.9 ns/call | |
| FillWord(7) | 3.1 ns/call | 4.8 ns/call | |
| FillWord(17) | 3.6 ns/call | 7.6 ns/call | |
| FillWord(50) | 3.8 ns/call | 8.3 ns/call | |
| FillWord(100) | 4.8 ns/call | 11 ns/call | generic |
| FillWord(500) | 16 ns/call | 40 ns/call | |
| FillWord(1000) | 31 ns/call | 70 ns/call | |
| FillWord(10000) | 295 ns/call | 601 ns/call | |
| FillWord(100000) | 3726 ns/call | 6720 ns/call | |
| FillDWord(2) | 2.2 ns/call | 4.2 ns/call | |
| FillDWord(7) | 2.4 ns/call | 5.0 ns/call | |
| FillDWord(17) | 3.3 ns/call | 7.7 ns/call | |
| FillDWord(50) | 4.8 ns/call | 11 ns/call | |
| FillDWord(100) | 7.4 ns/call | 17 ns/call | generic |
| FillDWord(500) | 31 ns/call | 71 ns/call | |
| FillDWord(1000) | 60 ns/call | 129 ns/call | |
| FillDWord(10000) | 772 ns/call | 1337 ns/call | |
| FillDWord(100000) | 8417 ns/call | 14306 ns/call | |
| FillQWord(2) | 2.4 ns/call | 3.7 ns/call | |
| FillQWord(7) | 3.5 ns/call | 4.5 ns/call | |
| FillQWord(17) | 4.0 ns/call | 7.7 ns/call | |
| FillQWord(50) | 7.4 ns/call | 15 ns/call | |
| FillQWord(100) | 13 ns/call | 32 ns/call | generic |
| FillQWord(500) | 60 ns/call | 126 ns/call | |
| FillQWord(1000) | 121 ns/call | 246 ns/call | |
| FillQWord(10000) | 1534 ns/call | 2634 ns/call | |
| FillQWord(100000) | 24143 ns/call | 28889 ns/call | |