Unroll IndexDWords by 2×.

Rika requested to merge runewalsh/source:idw into main

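Speeds up IndexDWords by por(pcmpeq(A, Pattern), pcmpeq(A + 16, Pattern))-style unrolling: each iteration compares two 16-byte vectors against the pattern and checks the ORed result once. In the benchmark below, every case except the shortest takes roughly 14–30% less time per call on both targets; the shortest case (0 ~ 5 / 10) gets 0.3–0.6 ns slower.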
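
For readers who want the idea outside of assembly, here is a minimal C sketch of the same por(pcmpeq, pcmpeq) pattern using SSE2 intrinsics. It is an illustration only, not the code from this merge request: the function name `index_dword_sketch`, its signature, and the scalar tail loop are assumptions made for the example.

```c
#include <emmintrin.h>  /* SSE2: _mm_cmpeq_epi32, _mm_or_si128, _mm_movemask_epi8 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the RTL routine): returns the index of the first
   element equal to value, or -1 if there is none. */
static ptrdiff_t index_dword_sketch(const uint32_t *buf, size_t len, uint32_t value)
{
    const __m128i pattern = _mm_set1_epi32((int)value);
    size_t i = 0;

    /* Unrolled by 2: two 16-byte vectors (8 dwords) per iteration. */
    for (; i + 8 <= len; i += 8) {
        __m128i eq0 = _mm_cmpeq_epi32(
            _mm_loadu_si128((const __m128i *)(buf + i)), pattern);
        __m128i eq1 = _mm_cmpeq_epi32(
            _mm_loadu_si128((const __m128i *)(buf + i + 4)), pattern);

        /* por(pcmpeq, pcmpeq): a single test covers both vectors. */
        if (_mm_movemask_epi8(_mm_or_si128(eq0, eq1)) != 0) {
            /* Hit somewhere in these 8 dwords. Build a 32-bit mask with
               4 bits per dword, lower vector in the low half. */
            unsigned mask = (unsigned)_mm_movemask_epi8(eq0)
                          | ((unsigned)_mm_movemask_epi8(eq1) << 16);
            size_t k = 0;
            while ((mask & 1u) == 0) {  /* mask is nonzero here */
                mask >>= 4;
                k++;
            }
            return (ptrdiff_t)(i + k);
        }
    }

    /* Scalar tail for the remaining 0..7 elements. */
    for (; i < len; i++) {
        if (buf[i] == value)
            return (ptrdiff_t)i;
    }
    return -1;
}
```

The trade-off visible in the sketch matches the numbers below in spirit: the common no-match path does one branch per 8 dwords instead of one per 4, while a hit costs a little extra work to find which of the two vectors matched.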

IndexDWordBenchmark.pas

i386                              New          Trunk
IndexDWord(0 ~ 5 / 10):       3.5 ns/call   2.9 ns/call
IndexDWord(10 ~ 20 / 30):     5.2 ns/call   7.4 ns/call
IndexDWord(20 ~ 40 / 50):     8.5 ns/call    12 ns/call
IndexDWord(0 ~ 99 / 100):      13 ns/call    17 ns/call
IndexDWord(0 ~ 999 / 1000):    69 ns/call    81 ns/call
IndexDWord(0 ~ 4999 / 5000):  404 ns/call   471 ns/call

x86-64                            New          Trunk
IndexDWord(0 ~ 5 / 10):       3.2 ns/call   2.9 ns/call
IndexDWord(10 ~ 20 / 30):     5.2 ns/call   7.2 ns/call
IndexDWord(20 ~ 40 / 50):     9.1 ns/call    12 ns/call
IndexDWord(0 ~ 99 / 100):      14 ns/call    18 ns/call
IndexDWord(0 ~ 999 / 1000):    69 ns/call    91 ns/call
IndexDWord(0 ~ 4999 / 5000):  402 ns/call   493 ns/call
Edited by Rika