Overload Hexagon float pmadd/pmsub/pnmadd/pnmsub to avoid slow fmaf function calls.
The Hexagon compiler fails to inline the fmaf function call, so pmadd on scalar code incurs significant function call overhead. The fmaf intrinsic cannot be used either because of a Hexagon compiler bug (https://support.qualcomm.com/500dK000006M9Iy). This CL overloads pmadd to use a separate multiply and add, which improves performance. It should only affect code generation for Hexagon.
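A minimal sketch of the idea, not the actual patch: the namespace `sketch` and the generic `std::fma` fallback are assumptions made for illustration, and the Hexagon-only preprocessor guard that the real change would need is omitted. The float overloads replace the library call with a plain multiply followed by an add or subtract.

```cpp
#include <cmath>

// Hypothetical stand-in namespace; not the real Eigen source.
namespace sketch {

// Assumed generic path: fused multiply-add through the math library.
// For float this resolves to the fmaf-style call the description wants to avoid.
template <typename T>
inline T pmadd(const T& a, const T& b, const T& c) {
  return std::fma(a, b, c);
}

// Float overloads: separate multiply and add/subtract, no library call.
// A non-template overload is preferred over the template for float arguments.
inline float pmadd(const float& a, const float& b, const float& c) {
  return a * b + c;  // a*b + c
}
inline float pmsub(const float& a, const float& b, const float& c) {
  return a * b - c;  // a*b - c
}
inline float pnmadd(const float& a, const float& b, const float& c) {
  return c - a * b;  // -(a*b) + c
}
inline float pnmsub(const float& a, const float& b, const float& c) {
  return -(a * b) - c;  // -(a*b) - c
}

}  // namespace sketch
```

One trade-off to keep in mind: an unfused multiply and add rounds twice, so results can differ from fmaf in the last bit unless the compiler contracts the expression back into a fused operation.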