Corner cases in SIMD sin/cos
Assigned to Nobody
This is an entry to discuss how to handle corner cases in SIMD sin/cos. Let's call x the input, there are 3 different corner cases:
(p1) x==NaN -> returns NaN
(p2) x==+/-INF -> returns NaN
(p3) x is "large" -> returns ??
There is nothing special about x==NaN as NaN is naturally propagated.
The case x==INF is also naturally handled by the default implementation but preserving this property while handling x=="large" might require special treatment. See below.
Regarding x==large, in 3.3 sin/cos might return numbers outside the [-1:1] range which is not satisfactory. Nonetheless it is important to note that we are talking about values of x such that the distance between two successive floats is larger than 2pi, so basically all you can expect is random numbers. This means there is no need to be too pedantic regarding accuracy, but we might still want to preserve some properties like:
(p4) sin/cos(huge) in [-1:1]
(p5) sin^2(huge) + cos^2(huge) == 1
Those properties are easy to preserve but so far I did not find a way to enforce all of them without a significant overhead:
Time (p2) | (p4) | (p5)
3.3: 0.39 YES NO NO
0f6f75bd (head): 0.41 YES YES NO
MSA-like: 0.46 YES YES YES
clamp to [-pi/4,pi/4]: 0.43 NO YES YES
" + handling of NaN: 0.44 YES YES YES
The last version is implemented by adding the following lines prior to the sin/cos polynomial evaluations:
x = padd(x,psub(_x,_x)); // convert +INF to NaN (borrowed from the MSA implementation)
x = pmin(x,pset1<Packet>( EIGEN_PI/4.f));
x = pmax(x,pset1<Packet>(-EIGEN_PI/4.f));
but it implies a +12% of overhead compared to no special treatment of large inputs.
The MSA-like version is implemented by adding the following lines right after taking the absolute value of the input:
x = padd(x,psub(_x,_x));
Packet small_or_nan_mask = pcmp_lt_or_nan(x, pset1<Packet>(13176795.f));
x = pand(x, small_or_nan_mask);
The overhead is even larger.
I'm not 100% sure what to do on that matter. Opinions welcome.