Use fma<float> to implement fma<half> and fma<bfloat16> when native fma for those types is not available on the platform.

Thanks to @sandwichmaker for pointing out this corner case: if a*b overflows but a*b+c is finite, computing a*b+c with a separate float32 multiply and add overflows to infinity, while fma(a,b,c) does not, because it rounds only once, after the exact sum is formed.
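The corner case can be reproduced in plain float32 with a minimal sketch like the one below. The helper names `unfused_mul_add` and `fused_mul_add` are hypothetical and are not part of this change; the sketch only uses `std::fma` and `std::ldexp` from `<cmath>`. The `volatile` store is there so the compiler cannot contract the multiply and add into a single fused instruction.

```cpp
#include <cmath>

// Hypothetical helper: multiply and add as two separately rounded float32
// operations. The volatile store forces the product to be rounded to float32
// and prevents the compiler from contracting the expression into an FMA.
float unfused_mul_add(float a, float b, float c) {
  volatile float product = a * b;  // may overflow to +inf here
  return product + c;
}

// Hypothetical helper: fused multiply-add. std::fma computes a*b+c as if to
// infinite precision and rounds once, so an intermediate overflow of a*b
// does not matter as long as the final result is representable.
float fused_mul_add(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

With a = b = 2^64 and c = -2^127 (all representable in bfloat16 and float32), the exact product 2^128 exceeds the largest finite float32, so `unfused_mul_add` returns +inf, while `fused_mul_add` returns 2^127, which is finite.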

Edited by Rasmus Munk Larsen
