
Use XMMs in x64 Move.

Rika requested to merge runewalsh/source:move-x64 into main

On my machine, the new Move is faster for both small and large sizes:

  • Small cases (≤32 bytes) are handled by selecting the appropriate branch and doing two unaligned reads + two writes. This is considerably faster, though people whose crucial structures cross page boundaries might disagree. (I also tried 4 reads + 4 writes for 32 < size ≤ 64. It looked much better in that it takes far fewer jumps, but the resulting speedup was a bit dubious, like 3.0 → 2.0 ns.)

  • Large cases use XMM transfers. The original author may have had a sound reason to avoid them, for example MOVDQU and even MOVDQA being slower for him than the equivalent two 8-byte transfers, but logically, and on my computer, XMM transfers are better.
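To illustrate the small-case idea, here is a minimal sketch in C rather than the Pascal/assembly of the actual patch. It assumes one plausible set of branch boundaries (16..32, 8..15, 4..7, 2..3, 1 bytes); the function name `small_move` and the `chunk16` type are hypothetical and do not come from the patch. Each branch reads the first and the last chunk of the range (the two may overlap) and only then writes both, which also keeps overlapping source and destination safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte chunk; a compiler targeting x64 will typically
   turn the fixed-size memcpy calls below into single unaligned moves. */
typedef struct { unsigned char b[16]; } chunk16;

/* Sketch: copy n <= 32 bytes using two (possibly overlapping) unaligned
   reads followed by two writes. Both reads happen before any write,
   so the result is correct even if src and dst overlap. */
static void small_move(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    if (n >= 16) {                      /* 16..32 bytes */
        chunk16 head, tail;
        memcpy(&head, s, 16);
        memcpy(&tail, s + n - 16, 16);
        memcpy(d, &head, 16);
        memcpy(d + n - 16, &tail, 16);
    } else if (n >= 8) {                /* 8..15 bytes */
        uint64_t head, tail;
        memcpy(&head, s, 8);
        memcpy(&tail, s + n - 8, 8);
        memcpy(d, &head, 8);
        memcpy(d + n - 8, &tail, 8);
    } else if (n >= 4) {                /* 4..7 bytes */
        uint32_t head, tail;
        memcpy(&head, s, 4);
        memcpy(&tail, s + n - 4, 4);
        memcpy(d, &head, 4);
        memcpy(d + n - 4, &tail, 4);
    } else if (n >= 2) {                /* 2..3 bytes */
        uint16_t head, tail;
        memcpy(&head, s, 2);
        memcpy(&tail, s + n - 2, 2);
        memcpy(d, &head, 2);
        memcpy(d + n - 2, &tail, 2);
    } else if (n == 1) {                /* single byte */
        d[0] = s[0];
    }                                   /* n == 0: nothing to do */
}
```

The point of the design is that a whole size range is served by one branch with no loop: for example, a 20-byte move reads bytes 0..15 and 4..19 and writes them back at the same offsets, with the 12 overlapping bytes simply written twice.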

Benchmark: MoveBenchmark.pas.

My results🍍
                            New            Existing

Move(1~8):              2.0 ns/call      3.3 ns/call
Move(10~30):            1.4 ns/call      4.1 ns/call
Move(20~100):           2.8 ns/call      6.0 ns/call
Move(50~300):           7.1 ns/call       10 ns/call
Move(100~1000):          18 ns/call       25 ns/call
Move(1000~10000):       167 ns/call      231 ns/call
Move(10000~100000):    1605 ns/call     2307 ns/call
