Vectorize .reverse() and .reverseInPlace()
@chhtz
Submitted by Christoph Hertzberg
Assigned to Nobody
Link to original bugzilla bug (#1685)
Version: 3.4 (development)
Platform: x86 - AVX
Description
While exposing a bug in LLVM, Bug #1684 (closed) also demonstrated that many reverse operations could be nicely vectorized, especially with AVX or AVX512.
E.g., a vectorized rowwise reverse of a Matrix4f would be possible using two vperm2f128 instructions with AVX, or one vpermps instruction with AVX512F.
Similar optimizations are possible with SSE for matrices with 2 rows or (slightly more complicated) 4*n+2 rows.
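A minimal sketch of the Matrix4f case with AVX, assuming Eigen's default column-major storage, so that one 256-bit register holds two adjacent columns and reversing each row amounts to reversing the column order. The helper name rowwise_reverse_4x4f is hypothetical and not part of Eigen; _mm256_permute2f128_ps is the intrinsic that maps to vperm2f128. When the file is not compiled with AVX enabled, the sketch falls back to a plain scalar loop.

```cpp
#include <algorithm>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper (not part of Eigen): computes the equivalent of
// out = in.rowwise().reverse() for a column-major 4x4 float matrix.
// Assumption: out and in do not overlap.
inline void rowwise_reverse_4x4f(float* out, const float* in) {
#if defined(__AVX__)
  // Each register holds two adjacent columns of four floats.
  __m256 c01 = _mm256_loadu_ps(in);      // [col0 | col1]
  __m256 c23 = _mm256_loadu_ps(in + 8);  // [col2 | col3]
  // Immediate 0x01 swaps the two 128-bit lanes of the source register.
  __m256 c32 = _mm256_permute2f128_ps(c23, c23, 0x01);  // [col3 | col2]
  __m256 c10 = _mm256_permute2f128_ps(c01, c01, 0x01);  // [col1 | col0]
  _mm256_storeu_ps(out, c32);
  _mm256_storeu_ps(out + 8, c10);
#else
  // Scalar fallback: copy column c of the input to column 3 - c of the output.
  for (int c = 0; c < 4; ++c)
    std::copy(in + 4 * c, in + 4 * (c + 1), out + 4 * (3 - c));
#endif
}
```

Unaligned loads and stores are used here so the sketch does not depend on the alignment of the pointers; an aligned variant would be a natural refinement inside Eigen.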
Low priority/JuniorJob.
If someone wants to try this but does not want to dig into the internals of Eigen, standalone solutions like the following would be welcome:
template<int rows, int cols>
void rowwise_reverse(float* out, float const* in);
template<int rows, int cols>
void colwise_reverseInPlace(float* in_out);
(Or with a templated Scalar instead of float)
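For checking vectorized versions, a plain scalar reference of these two functions could look like the sketch below. It assumes column-major storage (element (r, c) at index r + c * rows, as in Eigen's default layout) and non-overlapping out and in; the extra Scalar template parameter is deduced from the pointer arguments. This is only a baseline for comparison, not a proposed implementation.

```cpp
#include <algorithm>

// Scalar reference, assuming column-major storage: element (r, c) is at
// index r + c * rows. Reversing every row reverses the order of the columns.
// Assumption: out and in do not overlap.
template <int rows, int cols, typename Scalar>
void rowwise_reverse(Scalar* out, const Scalar* in) {
  for (int c = 0; c < cols; ++c)
    std::copy(in + c * rows, in + (c + 1) * rows, out + (cols - 1 - c) * rows);
}

// Reversing every column in place: each column is a contiguous run of
// `rows` elements, so it can be reversed independently of the others.
template <int rows, int cols, typename Scalar>
void colwise_reverseInPlace(Scalar* in_out) {
  for (int c = 0; c < cols; ++c)
    std::reverse(in_out + c * rows, in_out + (c + 1) * rows);
}
```

A call such as rowwise_reverse<4, 4>(out, in) then works for float and double buffers alike, and its output can be compared element by element against a vectorized candidate.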
It is also OK to provide solutions for just some cases at first (but avoid writing too much boilerplate code).