f64 support

From !70 (merged)

  • Most CUDA functions, e.g. cuMemcpy, do not support 64-bit types such as f64
  • We have to implement this functionality in this project to support f64
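
As a rough illustration of what implementing this on the project side could look like, the sketch below emulates a 64-bit fill on top of 32-bit words. It runs on the host only and makes no CUDA calls. The helper names (split_f64, fill_f64_as_u32, read_f64) are hypothetical and are not taken from !70; the little-endian word order (low word first) is also an assumption.

```rust
/// Hypothetical helper: split an f64 into its low and high 32-bit words.
fn split_f64(value: f64) -> (u32, u32) {
    let bits = value.to_bits();
    // Truncating cast keeps the low 32 bits; the shift exposes the high 32 bits.
    (bits as u32, (bits >> 32) as u32)
}

/// Hypothetical helper: fill a buffer of 32-bit words so that every
/// consecutive pair of words encodes `value` (low word first, an assumption).
/// A trailing unpaired word, if any, is left untouched.
fn fill_f64_as_u32(words: &mut [u32], value: f64) {
    let (lo, hi) = split_f64(value);
    for pair in words.chunks_exact_mut(2) {
        pair[0] = lo;
        pair[1] = hi;
    }
}

/// Hypothetical helper: read back the `index`-th f64 from the word buffer.
fn read_f64(words: &[u32], index: usize) -> f64 {
    let lo = words[2 * index] as u64;
    let hi = words[2 * index + 1] as u64;
    f64::from_bits((hi << 32) | lo)
}
```

A buffer of 2n words filled this way can be read back as n copies of the original f64. On the device, the same idea would need either a custom kernel or a strided 32-bit fill, and which of these the project actually uses is not stated here.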