Add vectorization of sqrt for doubles
Submitted by panda-34
Assigned to Nobody
Link to original bugzilla bug (#642)
Version: 3.2
Platform: x86 - SSE
Description
Currently SSE module has a packet sqrt function only for Packet4f, not for Packet2d, so any double-type coefficient-wise expression containing sqrt has to be calculated using single-data commands. I suggest you add
template<> EIGEN_STRONG_INLINE Packet2d psqrt<Packet2d>(const Packet2d& a) { return _mm_sqrt_pd(a); }
so that sqrt can be vectorized. Also, it is my understanding that the algorithm used to calculate sqrt in the current implementation of psqrt<Packet4f> (via inverse sqrt) has reduced accuracy, so maybe use it only under EIGEN_FAST_MATH flag, letting the user use the full accuracy of _mm_sqrt_ps otherwise?
Blocking
Edited by Eigen Bugzilla