
SSE4.1 IndexQWord for i386 and x86-64.

Rika requested to merge runewalsh/source:iq-sse41 into main

Attempts to vectorize IndexQWord on i386 and x86-64 by less drastic means than AVX2 and 256-bit vectors: SSE4.1 and 128-bit vectors. It seems slower for tiny searches on i386... :\ (Though IndexQWord is rarely useful there, unlike on x86-64, where IndexQWord is IndexPointer.)
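For readers unfamiliar with the technique, here is a minimal C sketch of a 128-bit search for a 64-bit value, next to a plain scalar loop. It is not the code from this merge request: the function names (`index_qword_plain`, `index_qword_sse41`, `index_qword`) are hypothetical, the use of compiler intrinsics and the runtime dispatch via `__builtin_cpu_supports` are assumptions (GCC or Clang on x86), and it only assumes the usual IndexQWord contract of returning the index of the first match or -1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC or Clang targeting i386 or x86-64. Elsewhere only the
   scalar loop is compiled. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <smmintrin.h>
#define HAVE_SSE41_SKETCH 1
#endif

/* Scalar reference: index of the first element equal to value, or -1. */
static ptrdiff_t index_qword_plain(const uint64_t *buf, ptrdiff_t len,
                                   uint64_t value)
{
    for (ptrdiff_t i = 0; i < len; i++)
        if (buf[i] == value)
            return i;
    return -1;
}

#ifdef HAVE_SSE41_SKETCH
/* Hypothetical SSE4.1 variant: compares two 64-bit elements per iteration. */
__attribute__((target("sse4.1")))
static ptrdiff_t index_qword_sse41(const uint64_t *buf, ptrdiff_t len,
                                   uint64_t value)
{
    /* Broadcast the searched value into both 64-bit lanes. */
    const __m128i needle = _mm_set1_epi64x((long long)value);
    ptrdiff_t i = 0;

    for (; i + 2 <= len; i += 2) {
        /* Unaligned 128-bit load of buf[i] and buf[i + 1]. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        /* PCMPEQQ (SSE4.1): a lane becomes all ones where it matches. */
        __m128i eq = _mm_cmpeq_epi64(chunk, needle);
        /* Collect the top bit of each 64-bit lane into a 2-bit mask. */
        int mask = _mm_movemask_pd(_mm_castsi128_pd(eq));
        if (mask != 0)
            return i + ((mask & 1) ? 0 : 1);
    }

    /* At most one element is left over when len is odd. */
    if (i < len && buf[i] == value)
        return i;
    return -1;
}
#endif

/* Picks the SSE4.1 path when the CPU reports support for it. */
static ptrdiff_t index_qword(const uint64_t *buf, ptrdiff_t len,
                             uint64_t value)
{
    assert(len >= 0);
#ifdef HAVE_SSE41_SKETCH
    if (__builtin_cpu_supports("sse4.1"))
        return index_qword_sse41(buf, len, value);
#endif
    return index_qword_plain(buf, len, value);
}
```

The vector loop handles two elements per iteration, so the setup (broadcast, mask extraction) is a fixed cost that pays off only once the search covers more than a few elements.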

Benchmark: IndexQWordBenchmark.pas.

My results:

x86-64                         SSE4.1      Plain (trunk)
IndexQWord(0 ~ 5 / 10):      3.0 ns/call    2.8 ns/call
IndexQWord(10 ~ 20 / 30):    5.7 ns/call     14 ns/call
IndexQWord(20 ~ 40 / 50):     11 ns/call     22 ns/call
IndexQWord(0 ~ 99 / 100):     17 ns/call     27 ns/call
IndexQWord(0 ~ 999 / 1000):  123 ns/call    159 ns/call

i386                           SSE4.1      Plain (trunk)
IndexQWord(0 ~ 5 / 10):      7.1 ns/call    3.1 ns/call
IndexQWord(10 ~ 20 / 30):    8.4 ns/call     14 ns/call
IndexQWord(20 ~ 40 / 50):     13 ns/call     22 ns/call
IndexQWord(0 ~ 99 / 100):     20 ns/call     29 ns/call
IndexQWord(0 ~ 999 / 1000):  122 ns/call    156 ns/call
Edited by Rika
