Disable CUDA Eigen::half vectorization on host.

All CUDA __half functions are device-only in CUDA 9, including
conversions. Host-side conversions were added in CUDA 10, so the
existing code doesn't build with CUDA versions prior to 10.0.
Arithmetic functions are always device-only, so there is no reason
to use vectorization on the host at all.
Modified the code to disable vectorization for __half on host. This
also required updating the TensorReductionGpu implementation, which
previously made assumptions about available packets.
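
A rough sketch of the idea, not the actual Eigen change: the packet
traits for the half type report themselves as vectorizable only when
compiling device code, and generic callers query the traits instead of
assuming a packet exists. All names here (SKETCH_DEVICE_HALF_PACKETS,
half_t, sketch_packet_traits, elements_per_step) are hypothetical
stand-ins; only the __CUDA_ARCH__ macro is a real CUDA compiler define.

```cpp
// Hypothetical switch: 1 only while compiling device code.
// __CUDA_ARCH__ is defined by nvcc during device compilation passes.
#if defined(__CUDA_ARCH__)
#define SKETCH_DEVICE_HALF_PACKETS 1
#else
#define SKETCH_DEVICE_HALF_PACKETS 0
#endif

// Stand-in for Eigen::half (assumption: storage only, no arithmetic).
struct half_t { unsigned short bits; };

// Default traits: scalar path, one element per step.
template <typename T>
struct sketch_packet_traits {
  enum { Vectorizable = 0, size = 1 };
};

// Half traits: a 2-wide packet is advertised on device only.
template <>
struct sketch_packet_traits<half_t> {
  enum {
    Vectorizable = SKETCH_DEVICE_HALF_PACKETS,
    size = SKETCH_DEVICE_HALF_PACKETS ? 2 : 1
  };
};

// A caller (e.g. a reduction) asks the traits how many elements it can
// process per step rather than assuming a half packet is available.
template <typename T>
int elements_per_step() {
  return sketch_packet_traits<T>::Vectorizable
             ? static_cast<int>(sketch_packet_traits<T>::size)
             : 1;
}
```

When built by a host compiler, elements_per_step<half_t>() falls back
to the scalar path, so no device-only __half function is referenced.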