Optimize predux on aarch64

What does this implement/fix?

This PR optimizes predux, predux_min and predux_max for NEON on aarch64.

When NEON is used on aarch64, we can do the reduction with the across-vector v(add|min|max)v intrinsics. Chaining many pairwise vp(add|min|max) operations to get the same result hurts performance.
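To illustrate the idea, here is a minimal sketch for a packet of four floats. It is not the code from this PR: the helper names `reduce_add4`, `reduce_min4`, `reduce_max4` and `check_reductions` are hypothetical, and the non-aarch64 branches are only an assumed approximation of a pairwise-style reduction, kept so the example also builds on other targets.

```cpp
#include <algorithm>
#include <cassert>
#if defined(__aarch64__) || defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helpers for illustration only; not Eigen's predux code.

inline float reduce_add4(const float* p) {
#if defined(__aarch64__)
  // AArch64: one across-vector add.
  return vaddvq_f32(vld1q_f32(p));
#elif defined(__ARM_NEON)
  // 32-bit NEON: fold high half onto low half, then one pairwise add.
  float32x4_t v = vld1q_f32(p);
  float32x2_t s = vadd_f32(vget_low_f32(v), vget_high_f32(v));
  s = vpadd_f32(s, s);
  return vget_lane_f32(s, 0);
#else
  // Scalar fallback so the sketch compiles without NEON.
  return (p[0] + p[2]) + (p[1] + p[3]);
#endif
}

inline float reduce_min4(const float* p) {
#if defined(__aarch64__)
  return vminvq_f32(vld1q_f32(p));
#elif defined(__ARM_NEON)
  float32x4_t v = vld1q_f32(p);
  float32x2_t s = vpmin_f32(vget_low_f32(v), vget_high_f32(v));
  s = vpmin_f32(s, s);
  return vget_lane_f32(s, 0);
#else
  return std::min(std::min(p[0], p[1]), std::min(p[2], p[3]));
#endif
}

inline float reduce_max4(const float* p) {
#if defined(__aarch64__)
  return vmaxvq_f32(vld1q_f32(p));
#elif defined(__ARM_NEON)
  float32x4_t v = vld1q_f32(p);
  float32x2_t s = vpmax_f32(vget_low_f32(v), vget_high_f32(v));
  s = vpmax_f32(s, s);
  return vget_lane_f32(s, 0);
#else
  return std::max(std::max(p[0], p[1]), std::max(p[2], p[3]));
#endif
}

// Small sanity check using exactly representable values.
inline void check_reductions() {
  const float data[4] = {3.0f, 1.0f, 4.0f, 2.0f};
  assert(reduce_add4(data) == 10.0f);
  assert(reduce_min4(data) == 1.0f);
  assert(reduce_max4(data) == 4.0f);
}
```

On aarch64 each helper is a load followed by a single reduction intrinsic, while the 32-bit NEON path needs several dependent operations to reach the same scalar.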
