optimise pow(x,2) -> square(x) and pow(x,0.5) -> sqrt(x)
TODO: special-case the exponent in pow() so that pow(x,2) is evaluated as square(x) and pow(x,0.5) as sqrt(x), instead of going through the general pow() path.
For reference, see the Armadillo implementation: https://gitlab.com/conradsnicta/armadillo-code/-/blob/15.0.x/include/armadillo_bits/Mat_meat.hpp#L5186-5213
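
A minimal sketch of the dispatch this TODO describes, written against plain `std::vector<double>` rather than this project's own types. The function name `pow_elementwise` is hypothetical and used only for illustration. It is not taken from Armadillo or from this codebase, and the linked Armadillo code may be structured differently.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption for illustration): element-wise power
// with fast paths for the two exponents named in the TODO.
std::vector<double> pow_elementwise(const std::vector<double>& x, double exponent)
{
    std::vector<double> out(x.size());

    if (exponent == 2.0) {
        // pow(x, 2) -> square(x): one multiplication per element.
        for (std::size_t i = 0; i < x.size(); ++i) {
            out[i] = x[i] * x[i];
        }
    } else if (exponent == 0.5) {
        // pow(x, 0.5) -> sqrt(x)
        for (std::size_t i = 0; i < x.size(); ++i) {
            out[i] = std::sqrt(x[i]);
        }
    } else {
        // General case: fall back to std::pow.
        for (std::size_t i = 0; i < x.size(); ++i) {
            out[i] = std::pow(x[i], exponent);
        }
    }

    return out;
}
```

The exponent is checked once per call, so the per-element loops stay branch-free. One caveat to decide on before merging: under IEEE 754 rules `std::sqrt(x)` and `std::pow(x, 0.5)` differ for two inputs. For `-0.0`, `sqrt` gives `-0.0` while `pow` gives `+0.0`. For negative infinity, `sqrt` gives NaN while `pow` gives positive infinity.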