Use reinterpret_cast on GPU for bit_cast.

This seems to be the recommended approach for doing type punning in CUDA. See for example

(the latter puns a double to an int2). The issue is that in CUDA the memcpy is not elided, so it ends up being an expensive operation. We already have similar reinterpret_casts across the Eigen codebase for GPU (as does TensorFlow).
