
Removed unnecessary checks for FP16C

What does this implement/fix?

The AVX512 packetmath code unnecessarily checks for the presence of FP16C before using the intrinsics _mm512_cvtps_ph and _mm512_cvtph_ps. These are AVX512F intrinsics and do not require FP16C to be enabled. Currently, if -mfp16c is not set, float2half and half2float fall back to a scalar typecast for each element.
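A minimal C++ sketch of the guard this change argues for, assuming a plain 16-element buffer interface rather than Eigen's packet types. The names convert16_float_to_half, convert16_half_to_float, float_to_half_bits and half_bits_to_float are hypothetical and are not Eigen's float2half/half2float. The vector path is compiled whenever __AVX512F__ is defined, with no __F16C__ check. Otherwise a per-element scalar conversion (round-to-nearest-even) stands in for the slow typecast fallback.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical scalar binary32 -> binary16 conversion, round-to-nearest-even.
// Stands in for the per-element fallback; not Eigen's implementation.
inline std::uint16_t float_to_half_bits(float f) {
  std::uint32_t x;
  std::memcpy(&x, &f, sizeof x);
  const std::uint16_t sign = static_cast<std::uint16_t>((x >> 16) & 0x8000u);
  const std::uint32_t exp = (x >> 23) & 0xFFu;
  std::uint32_t mant = x & 0x7FFFFFu;
  if (exp == 0xFFu)  // Inf stays Inf; any NaN becomes a canonical quiet NaN.
    return static_cast<std::uint16_t>(sign | 0x7C00u | (mant ? 0x200u : 0u));
  const int e = static_cast<int>(exp) - 127 + 15;  // rebias the exponent
  if (e >= 0x1F) return static_cast<std::uint16_t>(sign | 0x7C00u);  // overflow to Inf
  if (e <= 0) {                  // result is a half subnormal or zero
    if (e < -10) return sign;    // below half of the smallest subnormal
    mant |= 0x800000u;           // make the implicit leading bit explicit
    const int shift = 14 - e;    // 14..24
    std::uint32_t h = mant >> shift;
    const std::uint32_t rem = mant & ((1u << shift) - 1u);
    const std::uint32_t halfway = 1u << (shift - 1);
    if (rem > halfway || (rem == halfway && (h & 1u))) ++h;
    return static_cast<std::uint16_t>(sign | h);
  }
  std::uint32_t h = (static_cast<std::uint32_t>(e) << 10) | (mant >> 13);
  const std::uint32_t rem = mant & 0x1FFFu;
  if (rem > 0x1000u || (rem == 0x1000u && (h & 1u))) ++h;  // carry may reach Inf
  return static_cast<std::uint16_t>(sign | h);
}

// Hypothetical scalar binary16 -> binary32 conversion (always exact).
inline float half_bits_to_float(std::uint16_t h) {
  const std::uint32_t exp = (h >> 10) & 0x1Fu;
  const std::uint32_t mant = h & 0x3FFu;
  float magnitude;
  if (exp == 0) {  // zero or subnormal: mant * 2^-24
    magnitude = std::ldexp(static_cast<float>(mant), -24);
  } else if (exp == 0x1F) {
    magnitude = mant ? std::numeric_limits<float>::quiet_NaN()
                     : std::numeric_limits<float>::infinity();
  } else {  // normal: (1024 + mant) * 2^(exp - 25)
    magnitude = std::ldexp(static_cast<float>(mant | 0x400u), static_cast<int>(exp) - 25);
  }
  return (h & 0x8000u) ? -magnitude : magnitude;
}

// Hypothetical 16-lane helpers: the only guard is __AVX512F__, no __F16C__.
inline void convert16_float_to_half(const float* in, std::uint16_t* out) {
#if defined(__AVX512F__)
  const __m512 v = _mm512_loadu_ps(in);
  const __m256i h = _mm512_cvtps_ph(v, _MM_FROUND_TO_NEAREST_INT);
  _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), h);
#else
  for (int i = 0; i < 16; ++i) out[i] = float_to_half_bits(in[i]);
#endif
}

inline void convert16_half_to_float(const std::uint16_t* in, float* out) {
#if defined(__AVX512F__)
  const __m256i h = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in));
  _mm512_storeu_ps(out, _mm512_cvtph_ps(h));
#else
  for (int i = 0; i < 16; ++i) out[i] = half_bits_to_float(in[i]);
#endif
}
```

Built with only -mavx512f (no -mfp16c), GCC and Clang should select the intrinsic path; without AVX512F the scalar loop is compiled instead, so the sketch still runs on machines without AVX-512.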

Additional information

I checked various versions of GCC, Clang and MSVC, and all of them seem to compile these intrinsics with only AVX512F enabled. The change makes a large performance difference when -mavx512f is set but -mfp16c is not, because it avoids the very slow scalar typecasts.
