Specialized IndexQWord for i386.
Specialized i386 IndexQWord, assuming the qword argument is always passed on the stack, as it is now, and that a function is allowed to overwrite its stack arguments...
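For reviewers who want something to check the specialized routine against, here is a minimal portable sketch of the behaviour it is expected to reproduce. `RefIndexQWord` is a hypothetical name used only for illustration and is not part of this patch. The sketch assumes the RTL declaration is `IndexQWord(const buf; len: SizeInt; b: QWord): SizeInt`, returning the zero-based index of the first match or -1, and it does not cover negative lengths.

```pascal
program IndexQWordSketch;
{$mode objfpc}

{ Hypothetical reference implementation, for comparison only:
  returns the zero-based index of the first QWord equal to Value
  among the first Len elements at Buf, or -1 if there is none. }
function RefIndexQWord(const Buf; Len: SizeInt; Value: QWord): SizeInt;
var
  P: PQWord;
  I: SizeInt;
begin
  P := @Buf;
  for I := 0 to Len - 1 do
    if P[I] = Value then
      Exit(I);
  Result := -1;
end;

const
  Data: array[0..3] of QWord = (10, 20, 30, 20);
var
  Needle: QWord;
begin
  { Compare the reference against the RTL routine (assumed signature)
    for values that are present, duplicated, and absent. }
  for Needle in [QWord(10), QWord(20), QWord(30), QWord(99)] do
    if RefIndexQWord(Data, Length(Data), Needle) <>
       IndexQWord(Data, Length(Data), Needle) then
    begin
      WriteLn('Mismatch for value ', Needle);
      Halt(1);
    end;
  WriteLn('Reference and RTL IndexQWord agree');
end.
```

A check of this kind only exercises results, not the calling-convention assumption itself; whether overwriting stack arguments is safe still has to be argued from the i386 ABI that the compiler uses.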
Benchmark: IndexQWordBenchmark.pas.
My results:

                            New            Generic
IndexQWord(#0 .. #1):       3.2 ns/call    4.6 ns/call
IndexQWord(#0 .. #2):       3.2 ns/call    4.7 ns/call
IndexQWord(#0 .. #3):       3.3 ns/call    4.9 ns/call
IndexQWord(#0 .. #4):       3.4 ns/call    5.1 ns/call
IndexQWord(#0 .. #14):      4.9 ns/call    7.6 ns/call
IndexQWord(#0 .. #15):      5.0 ns/call    7.9 ns/call
IndexQWord(#0 .. #16):      5.3 ns/call    8.4 ns/call
IndexQWord(#0 .. #49):      12 ns/call     20 ns/call
IndexQWord(#0 .. #99):      25 ns/call     36 ns/call
IndexQWord(#0 .. #999):     134 ns/call    249 ns/call
On my computer I could not make the speedup go away, even by rewriting the generic version so that it generates exactly the same code for the main loop body, so part of the difference might be some sort of measurement artifact. Even so, the preparation code that runs before the loop is simpler in the new version.
Edited by Rika