Supposedly faster i386 int() and frac().
int and frac for i386 that work by bit twiddling (zeroing the mantissa bits after the point) instead of changing the FPU control word.
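For readers unfamiliar with the trick, here is a minimal sketch of the idea in Python. It is not the assembly from this change: it operates on a 64-bit IEEE 754 double (52 explicit mantissa bits) rather than the 80-bit tword, and the name trunc_bits is made up for the illustration.

```python
import struct


def trunc_bits(x: float) -> float:
    """Truncate toward zero by clearing the mantissa bits after the binary point.

    Illustrative only: works on a binary64 double, not the 80-bit extended type.
    """
    # Reinterpret the double as a 64-bit unsigned integer.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    exponent = ((bits >> 52) & 0x7FF) - 1023  # unbiased exponent

    if exponent < 0:
        # abs(x) < 1 (including zero and subnormals): keep only the sign bit.
        bits &= 1 << 63
    elif exponent < 52:
        # The low (52 - exponent) mantissa bits are fractional; zero them.
        bits &= ~((1 << (52 - exponent)) - 1)
    # exponent >= 52: the value is already integral, or is infinity/NaN.

    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

No rounding mode is consulted anywhere, which is the point: the result is a truncation regardless of how the FPU control word is set.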
Not as good as it sounds; my frac(x) is no faster than the existing one in the common case of 1 ≤ abs(x) ≤ probably-something-around-High(int64). (fisttp could make it simpler and faster but turns out to be an SSE3 instruction.) Still, int(x) seems to be improved in every case, and frac(x) with abs(x) < 1 can occasionally be no less common; imagine repeated fx += step; ix += int(fx); fx := frac(fx) with step = 0.001.
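To make the last claim concrete, here is a rough sketch that replays that accumulator loop and counts how often the argument of frac is below 1. It is written in Python purely for illustration; math.trunc and a subtraction stand in for int and frac, and count_frac_inputs is a made-up name.

```python
import math


def count_frac_inputs(step: float, iterations: int):
    """Replay fx += step; ix += int(fx); fx := frac(fx).

    Returns (ix, below_one), where below_one is the number of iterations
    in which frac would have been called with abs(fx) < 1.
    """
    fx = 0.0
    ix = 0
    below_one = 0
    for _ in range(iterations):
        fx += step
        whole = math.trunc(fx)   # stand-in for int(fx)
        ix += whole
        if abs(fx) < 1.0:
            below_one += 1       # frac(fx) would take the abs(x) < 1 path
        fx -= whole              # stand-in for fx := frac(fx)
    return ix, below_one


ix, below_one = count_frac_inputs(0.001, 10_000)
print(f"integer steps: {ix}, frac calls with abs(x) < 1: {below_one}")
```

With step = 0.001, fx only reaches 1 about once per thousand iterations, so nearly every frac call in such a loop has abs(x) < 1.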
Benchmark and tests against the existing implementation in some internal edge cases: IntFracBenchmark.pas. Note that noise2(scale = 1) is another case that unintentionally falls on the fast path of frac(x).
My results:
                       before (func1)    after (func2)
int(cos(i)):           12 ns/call        4.1 ns/call
int(100 * cos(i)):     12 ns/call        7.8 ns/call
int(1e20):             22 ns/call        7.0 ns/call (*)
int(infinity):         86 ns/call        6.9 ns/call (*)
frac(cos(i)):          13 ns/call        6.9 ns/call
frac(100 * cos(i)):    13 ns/call        12 ns/call
frac(1e20):            21 ns/call        4.1 ns/call (*)
frac(infinity):        169 ns/call       4.8 ns/call (*)
noise2(scale = 10):    138 ns/call       128 ns/call
noise2(scale = 1):     141 ns/call       105 ns/call
__________________
(*) Cool numbers but impractical cases so ignore ^^.
Not sure about the calling conventions though, i.e. whether the input tword always resides at [esp + 4], haha...
Additionally, make some cosmetic changes to the x64 versions. (16-bit pextrw is SSE2 and zeroes the rest of the destination.)