SSE4.1 IndexQWord for i386 and x86-64.
Attempts to vectorize IndexQWord on i386 and x86-64 with less drastic means than AVX2 and 256-bit vectors. Seems slower for tiny searches on i386... :\ (Though it is rarely useful there, unlike x86-64, where IndexQWord is IndexPointer.)
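For readers unfamiliar with the approach, here is a rough sketch in C intrinsics of what a 128-bit SSE4.1 search over 64-bit elements can look like. It is not the code from this patch: the name index_qword_sketch, the two-elements-per-iteration loop, and the scalar tail are assumptions made for illustration. It returns the index of the first match or -1 if there is none, as IndexQWord does.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h> /* SSE4.1 intrinsics, including _mm_cmpeq_epi64 (PCMPEQQ) */

/* Hypothetical helper, not the RTL routine: index of the first element
   equal to `value`, or -1 if it is not present. */
__attribute__((target("sse4.1")))
ptrdiff_t index_qword_sketch(const uint64_t *buf, ptrdiff_t len, uint64_t value)
{
    const __m128i needle = _mm_set1_epi64x((long long)value);
    ptrdiff_t i = 0;

    /* Compare two 64-bit elements per iteration with one 128-bit PCMPEQQ. */
    for (; i + 2 <= len; i += 2) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        __m128i eq = _mm_cmpeq_epi64(chunk, needle);
        int mask = _mm_movemask_epi8(eq); /* 16 bits: low 8 = lane 0, high 8 = lane 1 */
        if (mask != 0)
            return (mask & 0x00FF) ? i : i + 1;
    }

    /* Scalar tail for an odd trailing element. */
    for (; i < len; i++)
        if (buf[i] == value)
            return i;
    return -1;
}
#else
/* Portable fallback so the sketch still builds on non-x86 targets. */
ptrdiff_t index_qword_sketch(const uint64_t *buf, ptrdiff_t len, uint64_t value)
{
    for (ptrdiff_t i = 0; i < len; i++)
        if (buf[i] == value)
            return i;
    return -1;
}
#endif
```

The per-function target attribute (GCC and Clang) lets the sketch build without -msse4.1. A real routine would also have to decide how to handle CPUs without SSE4.1, which the sketch ignores.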
Benchmark: IndexQWordBenchmark.pas.
My results:
x86-64                          SSE4.1         Plain (trunk)
IndexQWord(0 ~ 5 / 10):         3.0 ns/call    2.8 ns/call
IndexQWord(10 ~ 20 / 30):       5.7 ns/call    14 ns/call
IndexQWord(20 ~ 40 / 50):       11 ns/call     22 ns/call
IndexQWord(0 ~ 99 / 100):       17 ns/call     27 ns/call
IndexQWord(0 ~ 999 / 1000):     123 ns/call    159 ns/call

i386
IndexQWord(0 ~ 5 / 10):         7.1 ns/call    3.1 ns/call
IndexQWord(10 ~ 20 / 30):       8.4 ns/call    14 ns/call
IndexQWord(20 ~ 40 / 50):       13 ns/call     22 ns/call
IndexQWord(0 ~ 99 / 100):       20 ns/call     29 ns/call
IndexQWord(0 ~ 999 / 1000):     122 ns/call    156 ns/call
Edited by Rika