remove denormal flushing in fp32tobf16 for avx & avx512 (!580) · Merge requests · libeigen / eigen

The source project of this merge request has been removed.

Reference issue

What does this implement/fix?

Flushing denormals inside FP32ToBF16 is expensive and makes BF16 Eigen ops much slower than FP32 on AVX512 and AVX. The flush is not required here: denormal handling should be set globally by the code that uses the Eigen library. For example, TensorFlow sets it correctly when creating a new Eigen thread: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/threadpool.cc#L56
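To illustrate what the per-conversion flush costs, here is a minimal scalar sketch of an fp32 to bf16 conversion with and without the extra denormal check. It is not Eigen's packet code: the helper names `bits_to_float`, `float_to_bf16_bits`, and `float_to_bf16_bits_flush` are hypothetical, and round-to-nearest-even is assumed as the rounding mode.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as an fp32 value.
inline float bits_to_float(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// Hypothetical helper: fp32 -> bf16 bit pattern, round-to-nearest-even.
// No denormal flushing is done here; denormal inputs are rounded like
// any other value.
inline std::uint16_t float_to_bf16_bits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  // NaN: exponent all ones and a non-zero mantissa. Keep the sign and
  // force a quiet-NaN mantissa bit so truncation cannot produce infinity.
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }
  // Add a rounding bias that depends on the lowest bit that survives,
  // so that ties round to even.
  const std::uint32_t lsb = (bits >> 16) & 1u;
  bits += 0x7FFFu + lsb;
  return static_cast<std::uint16_t>(bits >> 16);
}

// Same conversion with an explicit per-value flush: the kind of extra
// compare-and-select work this sketch assumes the merge request removes.
inline std::uint16_t float_to_bf16_bits_flush(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  // Exponent field of zero means the input is zero or a denormal:
  // return a zero that keeps the sign.
  if ((bits & 0x7F800000u) == 0u) {
    return static_cast<std::uint16_t>((bits >> 16) & 0x8000u);
  }
  return float_to_bf16_bits(f);
}
```

In a vectorized version the flush becomes an additional compare and blend on every packet. When the calling application already enables flushing globally (on x86, typically through the flush-to-zero and denormals-are-zero flags in MXCSR), that work is redundant.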

Additional information

With this change, we have seen a significant performance increase for models run in BF16.

Edited Aug 03, 2021 by Gauri Deshpande
