The source project of this merge request has been removed.
remove denormal flushing in fp32tobf16 for avx & avx512
Reference issue
What does this implement/fix?
Flushing denormals inside FP32ToBF16 is too costly and makes BF16 Eigen ops much slower than FP32 on AVX512 and AVX. It is not required here, because denormal handling should be configured globally by the code that uses the Eigen library. For example, TensorFlow sets it correctly when creating a new Eigen thread: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/threadpool.cc#L56
Additional information
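As a rough illustration of what "handled at global level" can look like on x86, the sketch below sets the flush-to-zero and denormals-are-zero bits of the MXCSR register with the standard SSE intrinsics. It is not the code from Eigen or TensorFlow; the helper names `EnableDenormalFlushing` and `DenormalsAreFlushed` are hypothetical, and the sketch assumes an x86-64 build where float arithmetic uses SSE.

```cpp
#include <limits>
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE

// Hypothetical helper (not part of Eigen). MXCSR is per-thread state, so
// the calling code would run this once at the start of every worker thread.
inline void EnableDenormalFlushing() {
  // Flush denormal results to zero.
  _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
  // Treat denormal inputs as zero.
  _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}

// Hypothetical check: halving the smallest normal float would give a
// denormal, so the product is zero only when flushing is active.
// volatile keeps the compiler from folding the multiply at compile time.
inline bool DenormalsAreFlushed() {
  volatile float smallest_normal = std::numeric_limits<float>::min();
  volatile float half = 0.5f;
  volatile float product = smallest_normal * half;
  return product == 0.0f;
}
```

With the mode set once per thread like this, the hardware handles denormals for every operation on that thread, which is why a separate flush inside the conversion routine would be redundant.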
With this change, we have seen a significant performance increase for models run in BF16.
Edited by Gauri Deshpande