Optimize scalar_unary_pow_op error handling (!1338) · Merge requests · libeigen / eigen

Reference issue

What does this implement/fix?

I only recently realized that the generic pnot is not vectorization-friendly and is not specialized on most platforms. This change addresses that and improves the error handling, applying lessons learned from other contributions.

No error handling is required for pow with a floating-point base and an integer exponent!
Fixed an issue that occurred when the base type is an unsigned integer and the exponent is a negative integer.
For integer base, integer exponent operations, the full loop is now performed even if the exponent exceeds the number of digits of the scalar type. Previously, the loop was skipped in that case, on the grounds that overflow is guaranteed unless the base is 0 or 1. However, that shortcut is wrong for unsigned base types, whose arithmetic wraps around instead of overflowing. The number of operations in repeated squaring is logarithmic in the value of the exponent, so the execution time stays reasonable even if an absurdly large exponent is used. This makes the int/int error handling routines simpler, as they now only handle negative exponents.
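
The repeated squaring loop described above can be sketched as follows. This is a minimal illustration, not the code in this merge request: the function name `ipow_u32` is hypothetical, and it covers only a 32-bit unsigned base with a non-negative exponent. It shows why the full loop is cheap and why its result is well defined for unsigned types.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not Eigen's implementation.
// Computes base^exponent modulo 2^32 by repeated squaring.
// Unsigned arithmetic wraps around, so every intermediate product is
// well defined no matter how large the exponent is.
std::uint32_t ipow_u32(std::uint32_t base, std::uint32_t exponent) {
  std::uint32_t result = 1;
  // One iteration per bit of the exponent: at most 32 iterations.
  while (exponent != 0) {
    if (exponent & 1u) {
      result *= base;  // fold in the current power of the base
    }
    base *= base;      // square for the next bit
    exponent >>= 1;
  }
  return result;
}
```

For example, `ipow_u32(3u, 1u << 30)` evaluates to 1, because every odd number raised to the power 2^30 is congruent to 1 modulo 2^32. A shortcut that treats any large exponent as an overflow would not produce this value.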

Difference in assembly as generated by x86 Clang 12 with AVX2:

handle_nonint_int_errors: eliminated
handle_nonint_nonint_errors: 40 fewer lines (branches)
handle_int_int (signed): 21 fewer lines
handle_int_int (unsigned): 19 fewer lines

Additional information

Edited May 31, 2023 by Charles Schlosser
