Clean up Intel packet reductions
Reference issue
What does this implement/fix?
While working on maxCoeff / minCoeff, I noticed that several predux ops were missing. This MR reorganizes the predux operations for Intel intrinsics into separate files and adds a few that were missing (namely, predux_min / predux_max with NaN propagation). I also switched to the AVX512 built-in floating-point reductions.
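For reviewers less familiar with the two NaN policies, here is a minimal scalar sketch of what a min reduction is expected to return under each one. The function names and the pointer-plus-length "packet" are hypothetical stand-ins for illustration only; this is not the vectorized code in this MR, and it assumes default floating-point semantics (no `-ffast-math`).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar stand-ins for a packet reduction; not Eigen's API.

// NaN-propagating min: if any lane is NaN, the result is NaN.
inline float reduce_min_propagate_nan(const float* lanes, std::size_t n) {
  assert(n > 0);
  float result = lanes[0];
  for (std::size_t i = 1; i < n; ++i) {
    if (std::isnan(result)) break;  // once NaN, always NaN
    if (std::isnan(lanes[i]) || lanes[i] < result) result = lanes[i];
  }
  return result;
}

// Number-propagating min: NaN lanes are ignored; the result is NaN
// only if every lane is NaN.
inline float reduce_min_propagate_numbers(const float* lanes, std::size_t n) {
  assert(n > 0);
  float result = lanes[0];
  for (std::size_t i = 1; i < n; ++i) {
    // Replace a NaN accumulator with whatever comes next; otherwise keep
    // the smaller value (a comparison against NaN is false, so NaN lanes
    // never replace a number).
    if (std::isnan(result) || lanes[i] < result) result = lanes[i];
  }
  return result;
}
```

With lanes `{3, 1, NaN, 2}`, the first function returns NaN and the second returns 1. The explicit NaN checks matter because a bare comparison-based min silently picks one policy or the other depending on operand order, which is why the vectorized reductions need dedicated NaN-aware variants.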
Additional information
Edited by Charles Schlosser