
AVX2 CompareByte for i386, sharing branches with SSE2 version.

Rika requested to merge runewalsh/source:cb-i386-avx2 into main

This might even be a tiny bit less of a joke than !391: a CPU dispatcher already existed, so the only extra cost is 300 bytes of code for CompareByte_AVX2 plus 80 for AVX2Support in each application. (But I also shortened the SSE2 version by 50 bytes or so.)
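For readers unfamiliar with the dispatcher idea mentioned above, here is a minimal sketch in C, not the RTL's actual Pascal/assembly code. It assumes the usual pattern of a function pointer that initially targets a resolver, which picks an implementation on the first call and patches the pointer. All names (`compare_bytes`, `compare_bytes_impl`, `has_avx2`, and so on) are hypothetical, and both "implementations" are plain byte loops standing in for the real SSE2 and AVX2 routines.

```c
#include <stddef.h>

typedef int (*compare_fn)(const unsigned char *a, const unsigned char *b, size_t len);

/* Stand-in for the SSE2 routine: returns <0, 0 or >0 like CompareByte. */
static int compare_bytes_baseline(const unsigned char *a, const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    return 0;
}

/* Stand-in for the AVX2 routine; a real version would compare 32 bytes per step. */
static int compare_bytes_wide(const unsigned char *a, const unsigned char *b, size_t len)
{
    return compare_bytes_baseline(a, b, len);
}

/* Hypothetical feature check (the role AVX2Support plays in the description).
   Uses the GCC/Clang builtin on x86 and reports "no AVX2" everywhere else. */
static int has_avx2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

static int compare_bytes_dispatch(const unsigned char *a, const unsigned char *b, size_t len);

/* The pointer starts at the resolver, so the first call pays for detection. */
static compare_fn compare_bytes_impl = compare_bytes_dispatch;

static int compare_bytes_dispatch(const unsigned char *a, const unsigned char *b, size_t len)
{
    /* Pick once, patch the pointer, then forward the current call. */
    compare_bytes_impl = has_avx2() ? compare_bytes_wide : compare_bytes_baseline;
    return compare_bytes_impl(a, b, len);
}

/* Public entry point: after the first call this is a single indirect call. */
int compare_bytes(const unsigned char *a, const unsigned char *b, size_t len)
{
    return compare_bytes_impl(a, b, len);
}
```

With a pointer like this already in place for the SSE2 path, adding a second candidate only grows the resolver and the binary; the per-call path stays the same indirect call.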

Benchmark: CompareByteI386AVX2Benchmark.pas.

My results:

                                AVX2          SSE2

CompareByte(#0 / 1):         2.0 ns/call   1.8 ns/call
CompareByte(#6 / 7):         2.7 ns/call   2.3 ns/call
CompareByte(#19 / 20):       2.7 ns/call   2.6 ns/call
CompareByte(#39 / 40):       2.9 ns/call   3.3 ns/call
CompareByte(#1 / 100):       2.4 ns/call   2.1 ns/call
CompareByte(#50 / 100):      2.7 ns/call   3.7 ns/call
CompareByte(#99 / 100):      3.5 ns/call   4.9 ns/call
CompareByte(#100 / 200):     3.7 ns/call   4.9 ns/call
CompareByte(#199 / 200):     5.2 ns/call   7.5 ns/call
CompareByte(#999 / 1000):     15 ns/call    27 ns/call
CompareByte(#5000 / 10000):  109 ns/call   138 ns/call
CompareByte(#9999 / 10000):  208 ns/call   264 ns/call
Edited by Rika
