Unroll IndexDWords by 2×.
Slightly speeds up IndexDWords by por(pcmpeq(A, Pattern), pcmpeq(A + 16, Pattern))-style unrolling.
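
For readers unfamiliar with the pattern, here is a minimal C sketch of this style of unrolling using SSE2 intrinsics. It is an illustration only, not the assembly from this change: the function name index_dword_unrolled, its signature, and the scalar tail loop are assumptions made for the example, and it assumes an x86 target with SSE2. Two 16-byte compares are OR-ed together so that a single test covers 8 dwords per iteration.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: pcmpeqd, por, pmovmskb */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name and signature are assumptions for this sketch):
   returns the index of the first dword equal to `pattern` in buf[0..len),
   or -1 if there is none. */
static ptrdiff_t index_dword_unrolled(const uint32_t *buf, size_t len,
                                      uint32_t pattern)
{
    assert(buf != NULL || len == 0);

    const __m128i pat = _mm_set1_epi32((int)pattern);
    size_t i = 0;

    /* Main loop: two 16-byte vectors (8 dwords) per iteration. */
    for (; i + 8 <= len; i += 8) {
        __m128i a0 = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i a1 = _mm_loadu_si128((const __m128i *)(buf + i + 4));
        __m128i eq0 = _mm_cmpeq_epi32(a0, pat); /* pcmpeq(A, Pattern)      */
        __m128i eq1 = _mm_cmpeq_epi32(a1, pat); /* pcmpeq(A + 16, Pattern) */

        /* por(...): one combined test covers both vectors. */
        if (_mm_movemask_epi8(_mm_or_si128(eq0, eq1)) != 0) {
            /* Byte mask: low 16 bits from eq0, high 16 bits from eq1. */
            unsigned mask = (unsigned)_mm_movemask_epi8(eq0) |
                            ((unsigned)_mm_movemask_epi8(eq1) << 16);
            /* Locate the lowest set bit; each dword contributes 4 bits. */
            size_t bit = 0;
            while (((mask >> bit) & 1u) == 0)
                bit++;
            return (ptrdiff_t)(i + bit / 4);
        }
    }

    /* Scalar tail for the remaining 0..7 dwords. */
    for (; i < len; i++)
        if (buf[i] == pattern)
            return (ptrdiff_t)i;

    return -1;
}
```

The combined por test keeps the common no-match path to one branch per 32 bytes; the exact match position is only worked out once a match is known to exist.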

i386                               New            Trunk
IndexDWord(0 ~ 5 / 10):            3.5 ns/call    2.9 ns/call
IndexDWord(10 ~ 20 / 30):          5.2 ns/call    7.4 ns/call
IndexDWord(20 ~ 40 / 50):          8.5 ns/call    12 ns/call
IndexDWord(0 ~ 99 / 100):          13 ns/call     17 ns/call
IndexDWord(0 ~ 999 / 1000):        69 ns/call     81 ns/call
IndexDWord(0 ~ 4999 / 5000):       404 ns/call    471 ns/call

x86-64                             New            Trunk
IndexDWord(0 ~ 5 / 10):            3.2 ns/call    2.9 ns/call
IndexDWord(10 ~ 20 / 30):          5.2 ns/call    7.2 ns/call
IndexDWord(20 ~ 40 / 50):          9.1 ns/call    12 ns/call
IndexDWord(0 ~ 99 / 100):          14 ns/call     18 ns/call
IndexDWord(0 ~ 999 / 1000):        69 ns/call     91 ns/call
IndexDWord(0 ~ 4999 / 5000):       402 ns/call    493 ns/call

Edited by Rika