Fix calls to device functions from host code
Removes undefined behavior when building with nvcc, caused by calls to host-only functions from device code. Each case is fixed either by restricting the calling function to the host or by adding a device implementation where appropriate.
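A minimal sketch of the two fix strategies, for illustration only. All names here (`SKETCH_DEVICE_FUNC`, `host_only_scale`, `scale_on_host`, `scale_anywhere`) are hypothetical and do not come from this merge request; the macro simply expands to the CUDA qualifiers when compiled with nvcc and to nothing otherwise.

```cpp
// Hypothetical macro for this sketch: adds CUDA qualifiers only under nvcc,
// so the same source also compiles with a plain host compiler.
#if defined(__CUDACC__)
#define SKETCH_DEVICE_FUNC __host__ __device__
#else
#define SKETCH_DEVICE_FUNC
#endif

// A host-only helper: it carries no device qualifier.
inline float host_only_scale(float x) { return 2.0f * x; }

// Problem pattern (left commented out): a host+device function that calls
// the host-only helper above, which is invalid on the device side.
// SKETCH_DEVICE_FUNC inline float bad_scale(float x) { return host_only_scale(x); }

// Fix 1: restrict the caller to the host by dropping the device qualifier.
inline float scale_on_host(float x) { return host_only_scale(x); }

// Fix 2: provide an implementation that is itself callable from device code,
// so the caller no longer depends on the host-only helper.
SKETCH_DEVICE_FUNC inline float scale_anywhere(float x) { return 2.0f * x; }
```

The first option fits callers that are never actually needed on the device; the second fits callers that must stay usable from kernels.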
What does this implement/fix?
Fixes TensorFlow builds, which can otherwise produce incorrect code when compiled with CUDA 11.3.