SSE2 IndexDWord for x64.
Something cleverer could surely be done, or adapted from the existing clever implementations for bytes and words, but that requires an unbearable amount of thinking, so I wrote a dull MOVDQU loop.
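For readers who do not want to read the assembly, the general shape of such a loop can be sketched in C with SSE2 intrinsics. This is only an illustration under assumptions, not the submitted code: the name `index_dword_sketch` is hypothetical, and the scalar tail loop and the lane-scanning step are my guesses at one reasonable way to finish the job. Each iteration loads four dwords unaligned (MOVDQU), compares them with the broadcast needle (PCMPEQD), and turns the result into a bit mask (PMOVMSKB).

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical C counterpart of IndexDWord: returns the index of the first
   element equal to `value`, or -1 if there is none. */
static ptrdiff_t index_dword_sketch(const uint32_t *buf, ptrdiff_t len, uint32_t value)
{
    ptrdiff_t i = 0;
#if defined(__SSE2__) || defined(_M_X64)
    /* Broadcast the needle into all four 32-bit lanes. */
    const __m128i needle = _mm_set1_epi32((int)value);
    for (; i + 4 <= len; i += 4) {
        /* MOVDQU: unaligned 16-byte load of four dwords. */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        /* PCMPEQD: matching lanes become all ones. */
        __m128i eq = _mm_cmpeq_epi32(chunk, needle);
        /* PMOVMSKB: one bit per byte, so four bits per dword lane. */
        int mask = _mm_movemask_epi8(eq);
        if (mask != 0) {
            /* Report the lowest matching lane. */
            for (int lane = 0; lane < 4; lane++)
                if (mask & (0xF << (4 * lane)))
                    return i + lane;
        }
    }
#endif
    /* Scalar tail for the last 0..3 elements (or everything without SSE2). */
    for (; i < len; i++)
        if (buf[i] == value)
            return i;
    return -1;
}
```

Called as `index_dword_sketch(data, count, needle)`, it uses the same "index or -1" convention as IndexDWord. A real implementation would more likely use a bit scan on the mask than the inner lane loop.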
Benchmark: IndexDWordBenchmark.pas.
My results:
System.IndexDWord
#0 / 1: 2.0 ns/call
#3 / 4: 2.8 ns/call
#6 / 7: 4.1 ns/call
#31 / 32: 9.9 ns/call
#99 / 100: 39 ns/call
#999 / 1000: 257 ns/call
IndexDWordAsm
#0 / 1: 2.4 ns/call
#3 / 4: 2.2 ns/call
#6 / 7: 3.6 ns/call
#31 / 32: 4.4 ns/call
#99 / 100: 11 ns/call
#999 / 1000: 101 ns/call