Fill* for x64, physically sharing half of the code with FillChar.

Rika requested to merge runewalsh/source:fillxxxx-x64 into main

I tried implementing FillWord / FillDWord / FillQWord for x64, and since they turned out to be even more trivial modifications of FillChar than my preliminary thinking suggested, I thought it a good idea to also factor out their large common part into a procedure, FillXxxx_MoreThanTwoXmms. I’d prefer it to be a part of FillChar, with everything else jumping into the middle of FillChar, but is this even possible? All of my knowledge of FPC-flavored assembler comes from other people’s code, and I have never seen anyone do cross-procedure jumps.
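To illustrate why the wider fills are such close relatives of FillChar, here is a minimal C sketch. It is not the assembler from this merge request: the names `fill_pattern64`, `fill_byte`, `fill_word`, `fill_dword` and `fill_qword` are hypothetical, and it uses plain 8-byte stores where the real code presumably uses XMM registers. The only point it shows is that each entry function just broadcasts its value into a wider pattern and converts the element count to bytes, after which one shared routine does all the work.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shared part (hypothetical stand-in for the common fill body):
   writes `bytes` bytes at dst by repeating the 8-byte pattern.
   `bytes` must be a multiple of the element size that was broadcast,
   so the tail always consists of whole elements. */
static void fill_pattern64(void *dst, size_t bytes, uint64_t pattern)
{
    unsigned char *p = dst;
    size_t i = 0;
    for (; i + 8 <= bytes; i += 8)
        memcpy(p + i, &pattern, 8);      /* unaligned 8-byte store */
    memcpy(p + i, &pattern, bytes - i);  /* tail: 0..7 bytes, pattern phase is unchanged */
}

/* Entry points differ only in the broadcast and in the count scaling.
   Overflow of count * element_size is not checked in this sketch. */
void fill_byte(void *dst, size_t count, uint8_t v)
{
    fill_pattern64(dst, count, v * UINT64_C(0x0101010101010101));
}

void fill_word(void *dst, size_t count, uint16_t v)
{
    fill_pattern64(dst, count * 2, v * UINT64_C(0x0001000100010001));
}

void fill_dword(void *dst, size_t count, uint32_t v)
{
    fill_pattern64(dst, count * 4, v * UINT64_C(0x0000000100000001));
}

void fill_qword(void *dst, size_t count, uint64_t v)
{
    fill_pattern64(dst, count * 8, v);
}
```

In the sketch the shared routine is an ordinary function call. Whether the entry points could instead jump into the middle of one procedure is a separate question about the FPC assembler, which this example does not answer.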

I planned to compare with #32637 (closed), but it looks incomplete. (Considering it dates from 2017, I think closing it once my version is accepted will not sadden the author, but rather give him a sigh of relief.)

Benchmark: FillXxxxBenchmark.pas.

My results 👀.
                            New             Existing

FillChar(2):            2.8 ns/call        2.9 ns/call
FillChar(7):            2.9 ns/call        2.7 ns/call
FillChar(17):           2.7 ns/call        3.1 ns/call
FillChar(50):           3.4 ns/call        3.6 ns/call
FillChar(100):          3.4 ns/call        3.6 ns/call    ← just to show it didn’t hurt FillChar
FillChar(500):          8.3 ns/call        8.5 ns/call
FillChar(1000):          16 ns/call         16 ns/call
FillChar(10000):        149 ns/call        149 ns/call
FillChar(100000):      1931 ns/call       1924 ns/call

FillWord(2):            3.2 ns/call        3.9 ns/call
FillWord(7):            3.1 ns/call        4.8 ns/call
FillWord(17):           3.6 ns/call        7.6 ns/call
FillWord(50):           3.8 ns/call        8.3 ns/call
FillWord(100):          4.8 ns/call         11 ns/call    ← generic
FillWord(500):           16 ns/call         40 ns/call
FillWord(1000):          31 ns/call         70 ns/call
FillWord(10000):        295 ns/call        601 ns/call
FillWord(100000):      3726 ns/call       6720 ns/call

FillDWord(2):           2.2 ns/call        4.2 ns/call
FillDWord(7):           2.4 ns/call        5.0 ns/call
FillDWord(17):          3.3 ns/call        7.7 ns/call
FillDWord(50):          4.8 ns/call         11 ns/call
FillDWord(100):         7.4 ns/call         17 ns/call    ← generic
FillDWord(500):          31 ns/call         71 ns/call
FillDWord(1000):         60 ns/call        129 ns/call
FillDWord(10000):       772 ns/call       1337 ns/call
FillDWord(100000):     8417 ns/call      14306 ns/call

FillQWord(2):           2.4 ns/call        3.7 ns/call
FillQWord(7):           3.5 ns/call        4.5 ns/call
FillQWord(17):          4.0 ns/call        7.7 ns/call
FillQWord(50):          7.4 ns/call         15 ns/call
FillQWord(100):          13 ns/call         32 ns/call    ← generic
FillQWord(500):          60 ns/call        126 ns/call
FillQWord(1000):        121 ns/call        246 ns/call
FillQWord(10000):      1534 ns/call       2634 ns/call
FillQWord(100000):    24143 ns/call      28889 ns/call
Edited by Rika
