
Supposedly faster i386 int() and frac().

Rika requested to merge runewalsh/source:if into main

Implements int and frac for i386 by bit twiddling (zeroing the mantissa bits below the binary point) instead of changing the FPU control word to switch the rounding mode.
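
To illustrate the idea, here is a minimal sketch in C. It assumes IEEE 754 double (52 stored mantissa bits) rather than the 80-bit extended (tword) format the i386 code works on, and the name `int_bits` is hypothetical. The unbiased exponent tells how many mantissa bits lie below the binary point, and masking them off truncates toward zero without touching the FPU control word.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch on IEEE double, not the MR's extended-precision code.
   Truncate x toward zero by clearing the mantissa bits below the binary point. */
static double int_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);

    /* Unbiased exponent: -1023 for zeros/subnormals, 1024 for Inf/NaN. */
    int e = (int)((bits >> 52) & 0x7FF) - 1023;

    if (e < 0) {
        /* |x| < 1: the integer part is zero, keep only the sign. */
        bits &= UINT64_C(0x8000000000000000);
    } else if (e < 52) {
        /* The low (52 - e) mantissa bits are fractional: clear them. */
        bits &= ~((UINT64_C(1) << (52 - e)) - 1);
    }
    /* e >= 52: no fractional bits at all (this also covers Inf and NaN). */

    memcpy(&x, &bits, sizeof bits);
    return x;
}
```

For example, `int_bits(-2.75)` yields `-2.0`, and values such as `1e20` come back unchanged because they have no bits below the binary point.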

It is not as good as it sounds: my frac(x) runs at the same speed as the old one in the common case of 1 ≤ abs(x) ≤ (probably) something around High(int64). (fisttp could make it simpler and faster, but it turns out to be an SSE3 instruction. 😑)
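
The reason the speedup depends on the magnitude can be sketched as follows, again assuming IEEE double instead of tword and using hypothetical names (`exponent_of`, `frac_bits`). When abs(x) < 1 the whole value is fractional, and when the exponent is large enough there are no fractional bits. Only the middle range needs the full "truncate, then subtract" work.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch on IEEE double, not the MR's extended-precision code.
   Unbiased binary exponent: -1023 for zeros/subnormals, 1024 for Inf/NaN. */
static int exponent_of(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (int)((bits >> 52) & 0x7FF) - 1023;
}

/* frac(x) = x - int(x), keeping the sign of x (frac(-2.75) = -0.75). */
static double frac_bits(double x)
{
    int e = exponent_of(x);

    if (e < 0)
        return x;        /* |x| < 1: nothing to strip, cheap path */
    if (e >= 52)
        return x - x;    /* no fractional bits: 0 for finite x, NaN for Inf/NaN */

    /* 1 <= |x| < 2^52: clear the fractional mantissa bits to get int(x),
       then subtract. The subtraction is exact in this range. */
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~((UINT64_C(1) << (52 - e)) - 1);
    double ix;
    memcpy(&ix, &bits, sizeof ix);
    return x - ix;
}
```

The sketch returns `x - x` for infinities, which gives NaN. That is a choice made for this illustration, not a statement about what the MR's frac(infinity) returns.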

Still, int(x) seems to be faster in every case, and frac(x) with abs(x) < 1 (also faster) is not necessarily the rarer case. Imagine repeating fx += step; ix += int(fx); fx := frac(fx) with step = 0.001: fx is below 1 on roughly 999 of every 1000 frac calls.
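
The accumulator pattern can be simulated to see how often frac gets an argument below 1. This is a rough sketch with a hypothetical function name. It uses C's cast to `long`, which truncates toward zero, as a stand-in for Pascal's int, and subtracts that truncated value as a stand-in for frac.

```c
/* Hypothetical simulation of  fx += step; ix += int(fx); fx := frac(fx).
   Counts how many frac() calls see |fx| < 1, i.e. would take the fast path. */
static long count_small_frac_args(double step, long iterations, long *ix_out)
{
    double fx = 0.0;
    long ix = 0;
    long small = 0;

    for (long k = 0; k < iterations; k++) {
        fx += step;
        long whole = (long)fx;     /* int(fx): the cast truncates toward zero */
        ix += whole;
        if (fx > -1.0 && fx < 1.0)
            small++;               /* frac(fx) would take the |x| < 1 path */
        fx -= (double)whole;       /* fx := frac(fx); exact for these magnitudes */
    }

    *ix_out = ix;
    return small;
}
```

With `step = 0.001` and one million iterations, `ix` ends up at about 1000 (999 or 1000, depending on accumulated rounding). Every other call, roughly 999,000 of them, passes a value below 1 to frac.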

Benchmark, plus tests against the existing implementation on some internal edge cases: IntFracBenchmark.pas. Note that noise2(scale = 1) is another case that unintentionally hits the fast path of frac(x).

My results:

                      before (func1)   after (func2)

int(cos(i)):            12 ns/call      4.1 ns/call
int(100 * cos(i)):      12 ns/call      7.8 ns/call
int(1e20):              22 ns/call      7.0 ns/call (*)
int(infinity):          86 ns/call      6.9 ns/call (*)

frac(cos(i)):           13 ns/call      6.9 ns/call
frac(100 * cos(i)):     13 ns/call       12 ns/call
frac(1e20):             21 ns/call      4.1 ns/call (*)
frac(infinity):        169 ns/call      4.8 ns/call (*)

noise2(scale = 10):    138 ns/call      128 ns/call
noise2(scale = 1):     141 ns/call      105 ns/call

__________________
(*) Cool numbers, but impractical cases, so ignore them ^^.

I'm not sure about the calling conventions, though, i.e. whether the input tword always resides at [esp + 4] haha...


Additionally, this makes some cosmetic changes to the x64 versions. (The 16-bit pextrw is SSE2 and zeroes the rest of the destination register.)
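
The following sketch shows what the zero-extension means in practice. It assumes the x64 code uses pextrw to fetch the 16-bit word holding the sign and exponent of a double in an XMM register. The C intrinsic `_mm_extract_epi16` compiles to pextrw and returns that word already zero-extended, so no extra masking or movzx is needed. The helper names are hypothetical, and a portable fallback is included for non-SSE2 targets.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: top 16 bits of a double (sign, 11-bit exponent,
   4 mantissa bits), zero-extended to 32 bits. */
static uint32_t top_word(double x)
{
#if defined(__SSE2__)
    /* pextrw copies 16-bit lane 3 (bits 48..63 of the low double) into a
       general-purpose register and clears the upper bits of that register. */
    return (uint32_t)_mm_extract_epi16(_mm_castpd_si128(_mm_set_sd(x)), 3);
#else
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (uint32_t)(bits >> 48);
#endif
}

/* Biased exponent: drop the 4 mantissa bits, then mask off the sign. */
static int biased_exponent(double x)
{
    return (int)((top_word(x) >> 4) & 0x7FF);
}
```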

