SIMD Math (x64).
Vectorized implementations of most Math array functions. The ones that return float are enabled only if sizeof(float) = 8 (so only on Windows, I guess?); they can be enabled everywhere at the cost of 10 bits of precision.
(Actually, I’d prefer the exact opposite: callback versions like Sum(getValue: function(param: pointer): float), now that we have anonymous functions... But this is pure fantasy while the array interface is already there.)
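For illustration only, here is a minimal sketch of what such a callback-based variant could look like, assuming Free Pascal in objfpc mode. It uses a plain function pointer rather than an anonymous function so that it does not depend on newer compiler features. The names SumBy, TGetValue, TPoint3 and GetY, as well as the extra index parameter, are hypothetical; none of them is part of Math or of this change.

```pascal
program CallbackSumSketch;
{$mode objfpc}

uses
  Math; // only for the float type alias

type
  // Hypothetical callback: returns the value of the element at `index`,
  // with `param` carrying whatever context the caller needs.
  TGetValue = function(index: SizeInt; param: pointer): float;

// Hypothetical callback-based sum: no array of floats has to exist up front.
function SumBy(count: SizeInt; getValue: TGetValue; param: pointer): float;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 0 to count - 1 do
    Result := Result + getValue(i, param);
end;

type
  TPoint3 = record
    x, y, z: single;
  end;
  PPoint3 = ^TPoint3;

// Extracts the y field of the index-th record behind `param`.
function GetY(index: SizeInt; param: pointer): float;
begin
  Result := PPoint3(param)[index].y;
end;

var
  pts: array[0 .. 2] of TPoint3;
begin
  pts[0].y := 1;
  pts[1].y := 2;
  pts[2].y := 3;
  // Sums the y fields directly, without copying them into a separate array.
  WriteLn(SumBy(Length(pts), @GetY, @pts[0]):0:1);
end.
```

The appeal of this shape is that values embedded in larger records can be summed in place; the cost is one indirect call per element, which is hard to vectorize, so it would complement the array versions rather than replace them.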
Benchmark: ArrayMathBenchmark.pas.
My results:
                                 before        after
Sum(single x 10000):             10 us/call    2.2 us/call
Sum(double x 10000):             6.0 us/call   1.7 us/call
SumOfSquares(single x 10000):    8.1 us/call   2.1 us/call
SumOfSquares(double x 10000):    7.8 us/call   2.0 us/call
SumsAndSquares(single x 10000):  13 us/call    2.8 us/call
SumsAndSquares(double x 10000):  9.5 us/call   2.5 us/call
MinValue(single x 10000):        9.7 us/call   2.5 us/call
MinValue(double x 10000):        9.7 us/call   4.9 us/call
MaxValue(single x 10000):        9.6 us/call   2.5 us/call
MaxValue(double x 10000):        9.8 us/call   4.9 us/call
MinValue(int32 x 10000):         4.9 us/call   1.9 us/call
MaxValue(int32 x 10000):         4.9 us/call   1.9 us/call
SumInt(int32 x 10000):           3.1 us/call   1.6 us/call
SumInt(int64 x 10000):           3.3 us/call   1.4 us/call