Vectorize any() / all()
DenseBase::any() / DenseBase::all() are not vectorized, even when the functors of the underlying expression are. Instead of building out specialized evaluators for any and all to support this, I modified the visitor evaluator to optionally support short-circuit evaluation. If short-circuiting is not requested, it compiles to a no-op. The any/all visitors are straightforward: they use predux at each step to check whether the appropriate condition has been met, and break out of the loop early if it has.
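A minimal, Eigen-free sketch of the short-circuit idea follows. It assumes a hypothetical 4-wide float packet, a hypothetical `ShortCircuit` flag and a hypothetical `done()` hook on the visitor; these names are illustrative only and are not Eigen's internal API. The small lane loops stand in for `predux`. Because the early-exit check sits behind `if constexpr`, a visitor that does not opt in pays nothing for it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 4-wide packet standing in for an SSE register of floats.
constexpr std::size_t kPacketSize = 4;
using Packet = std::array<float, kPacketSize>;

// Stand-in for predux over a comparison mask: are all lanes nonzero?
inline bool packet_all_nonzero(const Packet& p) {
  bool r = true;
  for (float x : p) r = r && (x != 0.0f);
  return r;
}

// Stand-in for predux over a comparison mask: is any lane nonzero?
inline bool packet_any_nonzero(const Packet& p) {
  bool r = false;
  for (float x : p) r = r || (x != 0.0f);
  return r;
}

// all(): the answer is known as soon as a zero coefficient is seen.
struct AllVisitor {
  static constexpr bool ShortCircuit = true;  // hypothetical opt-in flag
  bool result = true;
  std::size_t steps = 0;  // packet + scalar steps actually executed
  void operator()(float x) { ++steps; result = result && (x != 0.0f); }
  void packet(const Packet& p) { ++steps; result = result && packet_all_nonzero(p); }
  bool done() const { return !result; }
};

// any(): the answer is known as soon as a nonzero coefficient is seen.
struct AnyVisitor {
  static constexpr bool ShortCircuit = true;  // hypothetical opt-in flag
  bool result = false;
  std::size_t steps = 0;
  void operator()(float x) { ++steps; result = result || (x != 0.0f); }
  void packet(const Packet& p) { ++steps; result = result || packet_any_nonzero(p); }
  bool done() const { return result; }
};

// Generic traversal: full packets first, then a scalar tail. The early-exit
// check is discarded at compile time unless the visitor requests it.
template <typename Visitor>
void traverse(const std::vector<float>& data, Visitor& v) {
  const std::size_t n = data.size();
  const std::size_t vec_end = n - n % kPacketSize;
  std::size_t i = 0;
  for (; i < vec_end; i += kPacketSize) {
    Packet p{};
    for (std::size_t k = 0; k < kPacketSize; ++k) p[k] = data[i + k];
    v.packet(p);
    if constexpr (Visitor::ShortCircuit) {
      if (v.done()) return;
    }
  }
  for (; i < n; ++i) {
    v(data[i]);
    if constexpr (Visitor::ShortCircuit) {
      if (v.done()) return;
    }
  }
}
```

With a 10-element input, `traverse` performs two packet steps and two scalar steps when no early exit occurs, but stops after the first packet if that packet already decides the result.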
Benchmarks: a 20'007 x 20'007 MatrixXf filled with nonzeros except for the bottom-right coefficient, which is the worst-case scenario for all(). An odd size was used to force a mix of vector and scalar ops. Numbers are similar for any() in an analogous scenario. Times are in ms.
| Before | After (SSE) | After (AVX) |
|---|---|---|
| 233 | 98 | 82 |
| 228 | 100 | 82 |
| 230 | 103 | 82 |
The speedup is smaller than expected due to the relatively expensive predux performed on each iteration. However, the aggregate speedup for expressions that use any() / all() could be much greater, since vectorization is currently disabled for the entire expression chain. I also deleted the entire BooleanRedux.h header and vectorized count() (except for bool), hasNaN(), and allFinite().
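For count(), one natural vectorization (sketched here without Eigen, assuming a hypothetical 4-wide packet) keeps a partial count per lane and reduces once after the loop, so unlike all()/any() there is no per-iteration predux. The `has_nan` and `all_finite` helpers show identities that let those checks be phrased as any()/all() reductions (`x != x` only for NaN; `x - x` is NaN for infinities and NaN). Whether the merge request uses exactly these identities is an assumption; the scalar loops here only stand in for the vectorized any()/all() path.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

constexpr std::size_t kPacketSize = 4;  // hypothetical packet width

// count(): per-lane accumulators, reduced once after the packet loop,
// followed by a scalar tail for the leftover coefficients.
inline std::size_t count_nonzero(const std::vector<float>& data) {
  std::array<std::size_t, kPacketSize> acc{};
  const std::size_t n = data.size();
  const std::size_t vec_end = n - n % kPacketSize;
  std::size_t i = 0;
  for (; i < vec_end; i += kPacketSize)
    for (std::size_t k = 0; k < kPacketSize; ++k) acc[k] += (data[i + k] != 0.0f);
  std::size_t total = 0;
  for (std::size_t k = 0; k < kPacketSize; ++k) total += acc[k];  // single final reduction
  for (; i < n; ++i) total += (data[i] != 0.0f);
  return total;
}

// hasNaN() as any(x != x): NaN is the only value unequal to itself.
// Note: this test may be folded away under -ffast-math.
inline bool has_nan(const std::vector<float>& data) {
  for (float x : data)
    if (x != x) return true;
  return false;
}

// allFinite() as all((x - x) == 0): finite x gives 0, +/-inf and NaN give NaN.
inline bool all_finite(const std::vector<float>& data) {
  for (float x : data)
    if (!((x - x) == 0.0f)) return false;
  return true;
}
```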
Other improvements to the visitors include:
- Enabled linear access, which is at worst the same speed, but often ~5% faster in most cases I have tested. This potentially breaks custom visitors that users have created, though all that is required to restore the old behavior is `LinearAccess = false` in the functor traits.
- Tweaked vectorized loops to call a vectorized initialization function. This will break custom visitors with vectorization, but all the user has to do is add an analogous `initpacket` function, which shouldn't be too different from `packet`.
- Vectorized unrolled visitors (both linear and outer-inner traversals).
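The standalone sketch below illustrates both the linear vs. outer-inner choice and the separate `initpacket` hook: the first step of a traversal seeds the visitor state, so if that first step is a packet, a packet-level initializer is needed. It assumes a hypothetical 4-wide packet, a hypothetical column-major `Matrix` struct, and a hypothetical `visitor_traits<...>::LinearAccess` trait; these names mirror the description above but are not Eigen's actual signatures (Eigen's visitors also receive coefficient indices, which are omitted here).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kPacketSize = 4;  // hypothetical packet width
using Packet = std::array<float, kPacketSize>;

// Hypothetical column-major matrix: element (r, c) is data[c * rows + r].
struct Matrix {
  std::size_t rows, cols;
  std::vector<float> data;
};

// Hypothetical traits: visitors use linear access unless they opt out.
template <typename Visitor>
struct visitor_traits {
  static constexpr bool LinearAccess = true;
};

// Max-coefficient visitor: init/initpacket seed the state from the first
// coefficient or packet; operator()/packet update it afterwards.
struct MaxVisitor {
  float res = 0.0f;
  std::size_t calls = 0;  // number of visitor steps, to compare traversals
  void init(float x) { ++calls; res = x; }
  void operator()(float x) { ++calls; res = std::max(res, x); }
  void initpacket(const Packet& p) { ++calls; res = *std::max_element(p.begin(), p.end()); }
  void packet(const Packet& p) { ++calls; res = std::max(res, *std::max_element(p.begin(), p.end())); }
};

// A visitor that opts out of linear access via the traits.
struct ColumnMaxVisitor : MaxVisitor {};
template <>
struct visitor_traits<ColumnMaxVisitor> {
  static constexpr bool LinearAccess = false;
};

template <typename Visitor>
void traverse(const Matrix& m, Visitor& v) {
  bool first = true;
  // Visit a contiguous range: full packets, then a scalar tail.
  auto visit_range = [&](const float* ptr, std::size_t n) {
    std::size_t i = 0;
    for (; i + kPacketSize <= n; i += kPacketSize) {
      Packet p;
      std::copy(ptr + i, ptr + i + kPacketSize, p.begin());
      if (first) { v.initpacket(p); first = false; } else { v.packet(p); }
    }
    for (; i < n; ++i) {
      if (first) { v.init(ptr[i]); first = false; } else { v(ptr[i]); }
    }
  };
  if constexpr (visitor_traits<Visitor>::LinearAccess) {
    // Linear: the whole matrix is one range, so there is a single scalar tail.
    visit_range(m.data.data(), m.rows * m.cols);
  } else {
    // Outer-inner: one inner loop, and one scalar tail, per column.
    for (std::size_t c = 0; c < m.cols; ++c) visit_range(m.data.data() + c * m.rows, m.rows);
  }
}
```

On a 3x3 matrix, the linear path takes three steps (two packets and one scalar), while the outer-inner path falls back to nine scalar steps because each 3-element column is shorter than a packet; both find the same maximum.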
In general, visitors offer an alternative to the usual unary/binary/ternary expressions and allow the functor to be modified. I think this functionality should be explored in other aspects of Eigen where our usual approach may be limited.