
Vectorize any() / all()

Reference issue

DenseBase::any() / DenseBase::all() are not vectorized, even if the dependent functors are. Rather than building out the specialized evaluators for any and all to support this functionality, I modified the visitor evaluator to optionally support short-circuit evaluation. If short-circuiting is not requested, the check compiles to a no-op. The any/all visitors are straightforward: they use predux at each step to check whether the condition for breaking out of the loop early has been met.
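The short-circuit idea can be illustrated without any Eigen internals. The following is only a sketch under stated assumptions, not the code in this merge request: `kChunk`, `all_nonzero_chunked` and `any_nonzero_chunked` are hypothetical names, and the fixed-width inner loop stands in for a packet comparison followed by a predux-style horizontal reduction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical chunk width standing in for the SIMD packet size.
constexpr std::size_t kChunk = 8;

// Returns true if every element is nonzero. Each full chunk is reduced to a
// single bool (the stand-in for predux), and the loop exits as soon as one
// chunk fails. The leftover elements are handled by a scalar tail loop.
bool all_nonzero_chunked(const std::vector<float>& data) {
  const std::size_t n = data.size();
  const std::size_t vec_end = n - (n % kChunk);
  std::size_t i = 0;
  for (; i < vec_end; i += kChunk) {
    bool chunk_ok = true;
    // No early exit inside the chunk: a real packet is compared as a whole.
    for (std::size_t k = 0; k < kChunk; ++k) chunk_ok &= (data[i + k] != 0.0f);
    if (!chunk_ok) return false;  // short-circuit after the reduction
  }
  for (; i < n; ++i) {
    if (data[i] == 0.0f) return false;
  }
  return true;
}

// Returns true if at least one element is nonzero, with the same structure.
bool any_nonzero_chunked(const std::vector<float>& data) {
  const std::size_t n = data.size();
  const std::size_t vec_end = n - (n % kChunk);
  std::size_t i = 0;
  for (; i < vec_end; i += kChunk) {
    bool chunk_hit = false;
    for (std::size_t k = 0; k < kChunk; ++k) chunk_hit |= (data[i + k] != 0.0f);
    if (chunk_hit) return true;  // short-circuit after the reduction
  }
  for (; i < n; ++i) {
    if (data[i] != 0.0f) return true;
  }
  return false;
}
```

The per-chunk reduction is the cost paid on every iteration in exchange for being able to stop early, which is the trade-off this description refers to.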

Benchmarks: 20'007 x 20'007 MatrixXf filled with nonzeros except the bottom-right coefficient -- worst-case scenario for all(). Odd size was used to force a mix of vector and scalar ops. Numbers are similar for any() in an analogous scenario. Time in ms.

| Before (ms) | SSE (ms) | AVX (ms) |
|-------------|----------|----------|
| 233         | 98       | 82       |
| 228         | 100      | 82       |
| 230         | 103      | 82       |

The speedup is less than expected due to the relatively expensive predux that occurs each iteration. However, the aggregate speedup for ops that use any() / all() could be much greater, as vectorization is currently disabled for the entire expression chain. I went ahead and deleted the entire BooleanRedux.h header, and vectorized count() (except for bool), hasNaN(), and allFinite().

Other improvements to the visitors include:

  • enabled linear access, which is at worst the same speed and ~5% faster in most cases I have tested. This potentially breaks custom visitors that users have created, though all that is required is LinearAccess = false in the functor traits.
  • tweaked vectorized loops to call a vectorized initialization function. This will break custom visitors with vectorization, but all the user has to do is add an analogous initpacket function, which shouldn't be too different from packet.
  • vectorized unrolled visitors (both linear and outer-inner traversals)
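The division of labor between init, initpacket, packet and the scalar call operator can be sketched with a toy driver. Everything below is hypothetical and independent of Eigen: `Packet` is a plain array standing in for a SIMD register, `visit_linear` mimics a linear-access traversal with a single index, and the member signatures are simplified relative to Eigen's visitors.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kPacketSize = 4;          // stand-in packet width
using Packet = std::array<float, kPacketSize>;  // stand-in for a SIMD register

// Toy max-coefficient visitor using a single linear index.
struct MaxVisitor {
  float result = 0.0f;

  // Scalar initialization: the first coefficient seeds the result.
  void init(float value, std::size_t /*i*/) { result = value; }
  // Scalar update.
  void operator()(float value, std::size_t /*i*/) {
    if (value > result) result = value;
  }
  // Vectorized initialization: seeds the result from a whole packet.
  void initpacket(const Packet& p, std::size_t i) {
    result = p[0];
    packet(p, i);
  }
  // Vectorized update: merges a packet into the running result.
  void packet(const Packet& p, std::size_t /*i*/) {
    for (float v : p) {
      if (v > result) result = v;
    }
  }
};

// Copies kPacketSize consecutive values starting at index i.
Packet load(const std::vector<float>& data, std::size_t i) {
  Packet p{};
  for (std::size_t k = 0; k < kPacketSize; ++k) p[k] = data[i + k];
  return p;
}

// Linear traversal: packet initialization, packet loop, then a scalar tail.
// Inputs shorter than one packet fall back to scalar initialization.
template <typename Visitor>
void visit_linear(const std::vector<float>& data, Visitor& visitor) {
  const std::size_t n = data.size();
  if (n == 0) return;
  std::size_t i = 0;
  if (n < kPacketSize) {
    visitor.init(data[0], 0);
    i = 1;
  } else {
    visitor.initpacket(load(data, 0), 0);
    for (i = kPacketSize; i + kPacketSize <= n; i += kPacketSize) {
      visitor.packet(load(data, i), i);
    }
  }
  for (; i < n; ++i) visitor(data[i], i);
}
```

In this toy version initpacket differs from packet only in how the result is seeded, which is in line with the remark that the two should not differ much.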

In general, visitors offer an alternative to the usual unary/binary/ternary expressions and allow the functor to be modified. I think this functionality should be explored in other aspects of Eigen where our usual approach may be limited.

What does this implement/fix?

Additional information

Edited by Charles Schlosser
