Use reinterpret_cast on GPU for bit_cast.
This seems to be the recommended approach for doing type punning in CUDA. See for example:
- https://stackoverflow.com/questions/47037104/cuda-type-punning-memcpy-vs-ub-union
- https://developer.nvidia.com/blog/faster-parallel-reductions-kepler/
(the latter puns a double to an int2). The issue is that under CUDA the memcpy is not elided and ends up
being an expensive operation. We already have similar reinterpret_casts across
the Eigen codebase for GPU (as does TensorFlow).
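
For context, a minimal sketch of what such a helper can look like (the name `bit_cast_sketch` is illustrative only, not the actual Eigen API; the device branch mirrors the double-to-int2 punning from the NVIDIA reduction post):

```cpp
#include <cstring>

// Sketch of a bit_cast-style helper: reinterpret_cast on device,
// memcpy on host where the compiler reliably elides it.
template <typename To, typename From>
__host__ __device__ To bit_cast_sketch(const From& src) {
  static_assert(sizeof(To) == sizeof(From), "To and From must have the same size");
#ifdef __CUDA_ARCH__
  // Device code path: nvcc may not elide the memcpy, so reinterpret the bits directly.
  return *reinterpret_cast<const To*>(&src);
#else
  // Host code path: memcpy is the well-defined way to pun and is optimized away.
  To dst;
  std::memcpy(&dst, &src, sizeof(To));
  return dst;
#endif
}

// Example usage (as in the reduction post): view a double's bits as an int2.
// __device__ int2 as_int2(double d) { return bit_cast_sketch<int2>(d); }
```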