Fix alias violation in BFloat16
What does this implement/fix?
Accessing the bits of a float value through a reinterpret_cast to an integer type violates the strict aliasing rule and is undefined behavior. With GCC 10 on PPC platforms we have seen actual failures (wrong values) caused by this, which are fixed by the equivalent of this change. See https://github.com/easybuilders/easybuild-easyconfigs/pull/14025
A simple test case for this with TF 2.2.3 is:
import numpy as np
from tensorflow.python import _pywrap_bfloat16
bfloat16 = _pywrap_bfloat16.TF_bfloat16_type()
print(np.arange(-10.5, 7.8, 0.5, dtype=bfloat16))
This prints [bfloat16(-10.5) bfloat16(-10) bfloat16(-20) bfloat16(-30) bfloat16(-40)... instead of values that increase in steps of 0.5.
printf-debugging the TF bfloat16 code shows that the step value is computed incorrectly during the bfloat16-to-float conversion.
Additional information
Not only is the proposed solution correct, it is even (potentially) faster. See the generated ASM: https://godbolt.org/z/4dT4a9d1b and https://github.com/tensorflow/tensorflow/commit/6b853c8f2020a446d7c04e75deff7866a35a7658#diff-17ca5d26579d2089aa9c41eacf8570b066e5c83dc957dc9bf1647a266de990f1 (see commit message)
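For readers unfamiliar with the pattern, here is a minimal sketch of the well-defined alternative to the reinterpret_cast approach. It assumes the bits are copied with std::memcpy and that float is a 32-bit IEEE-754 type. The helper names are hypothetical and are not the actual functions touched by this change.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper names for illustration only.

// Copy the object representation of a float into a 32-bit integer.
// Unlike *reinterpret_cast<std::uint32_t*>(&f), this does not violate
// the strict aliasing rule.
inline std::uint32_t float_to_bits(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "this sketch assumes a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

// Inverse of float_to_bits.
inline float bits_to_float(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

// bfloat16 keeps the upper 16 bits of a binary32 value, so widening is
// a left shift followed by a bit copy.
inline float bfloat16_bits_to_float(std::uint16_t raw) {
  return bits_to_float(static_cast<std::uint32_t>(raw) << 16);
}

// Narrowing by truncation (no rounding), kept simple for illustration.
inline std::uint16_t float_to_bfloat16_bits_truncate(float f) {
  return static_cast<std::uint16_t>(float_to_bits(f) >> 16);
}
```

Optimizing compilers typically turn a fixed-size std::memcpy like this into a plain register move, so the well-defined version should not cost more than the cast.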