Add AVX vector path to float2half/half2float (!702) · Merge requests · libeigen / eigen

Makes e.g. matrix multiplication 3x faster:

name        old cpu/op  new cpu/op  delta
BM_convers  181ms ± 1%  62ms ± 9%   -65.82%  (p=0.016 n=4+5)

This is a direct translation of the scalar code in half_to_float and float_to_half_rtne (Eigen/src/Core/arch/Default/Half.h). Tested on all possible input values; the tests are not included here because they take a long time to run, especially in a debug build.
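For readers who want to see what such an exhaustive check can look like, here is a minimal scalar sketch. It is not the code from this merge request: the names half_bits_to_float, float_to_half_bits_rtne and count_roundtrip_mismatches are hypothetical, the conversions follow the widely used bit-manipulation approach rather than quoting Half.h, and the check is a round-trip property over all 65536 half bit patterns, which is only an assumption about how one might test this (the actual tests presumably compare the vector path against the scalar one).

```cpp
#include <cassert>  // only needed for the usage example below
#include <cstdint>
#include <cstring>

// Bit-cast helpers (memcpy avoids undefined behaviour from type punning).
static inline std::uint32_t float_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}
static inline float bits_float(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Hypothetical scalar half -> float conversion on raw binary16 bits.
inline float half_bits_to_float(std::uint16_t h) {
  const std::uint32_t shifted_exp = 0x7c00u << 13;  // half exponent mask after the shift
  std::uint32_t u = (static_cast<std::uint32_t>(h) & 0x7fffu) << 13;  // exponent/mantissa
  const std::uint32_t exp = u & shifted_exp;
  u += (127u - 15u) << 23;  // rebias the exponent
  if (exp == shifted_exp) {  // Inf/NaN: push the exponent to all ones
    u += (128u - 16u) << 23;
  } else if (exp == 0) {  // zero/subnormal: renormalize with a float subtract
    u += 1u << 23;
    u = float_bits(bits_float(u) - bits_float(113u << 23));
  }
  u |= (static_cast<std::uint32_t>(h) & 0x8000u) << 16;  // sign bit
  return bits_float(u);
}

// Hypothetical scalar float -> half conversion, round to nearest even.
inline std::uint16_t float_to_half_bits_rtne(float ff) {
  const std::uint32_t f32_inf = 255u << 23;
  const std::uint32_t f16_max = (127u + 16u) << 23;
  const std::uint32_t denorm_magic = ((127u - 15u) + (23u - 10u) + 1u) << 23;  // 0.5f
  std::uint32_t u = float_bits(ff);
  const std::uint32_t sign = u & 0x80000000u;
  u ^= sign;  // work on the absolute value
  std::uint16_t out;
  if (u >= f16_max) {  // overflow, Inf or NaN
    out = (u > f32_inf) ? 0x7e00 : 0x7c00;  // NaN -> quiet NaN, otherwise Inf
  } else if (u < (113u << 23)) {  // result is a half subnormal or zero
    // Adding 0.5f aligns the 10 result bits at the bottom of the float;
    // the FP add itself performs the round-to-nearest-even.
    u = float_bits(bits_float(u) + bits_float(denorm_magic));
    out = static_cast<std::uint16_t>(u - denorm_magic);
  } else {  // normal range
    const std::uint32_t mant_odd = (u >> 13) & 1u;  // is the kept mantissa odd?
    u += 0xc8000fffu;  // exponent rebias plus rounding bias, modulo 2^32
    u += mant_odd;     // ties go to the even mantissa
    out = static_cast<std::uint16_t>(u >> 13);
  }
  return static_cast<std::uint16_t>(out | (sign >> 16));
}

// Round-trips every half bit pattern; returns the number of failures.
inline int count_roundtrip_mismatches() {
  int mismatches = 0;
  for (std::uint32_t i = 0; i <= 0xffffu; ++i) {
    const std::uint16_t h = static_cast<std::uint16_t>(i);
    const std::uint16_t r = float_to_half_bits_rtne(half_bits_to_float(h));
    const bool is_nan = (h & 0x7fffu) > 0x7c00u;
    // NaN payloads are not preserved, so only require that NaN stays NaN.
    const bool ok = is_nan ? ((r & 0x7fffu) > 0x7c00u) : (r == h);
    if (!ok) ++mismatches;
  }
  return mismatches;
}
```

A caller could run `assert(count_roundtrip_mismatches() == 0);` once at startup or in a unit test. Sweeping all 2^32 float inputs for the float-to-half direction follows the same pattern but takes far longer, which is consistent with the note above about test run time.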
