Draft: Single precision cuFFT for vloc_psi_gpu

Victor Yu requested to merge vwzyu/q-e:cufft_sp into develop

This merge request adds single-precision (SP) cuFFT support to FFTXlib. As a first application, the FFTs in vloc_psi_gpu can now be done in SP.

With SP FFT, 214 of the 221 pw test cases in the test-suite pass. Six of the seven failing cases set conv_thr = 1.d-15 in the input. They fail because the SCF cannot converge to that threshold, which is more or less expected, and they converge fine with the default threshold. The remaining failure is vdW-DF3-opt1, which also fails with double-precision FFT on my machine.

The speedup is system-dependent and comes without losing much accuracy in total energies and forces. I found that this was already reported in this paper from 2011.
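To give a rough feel for why a threshold as tight as conv_thr = 1.d-15 is hard to reach once the FFTs run in SP, here is a minimal, self-contained sketch in plain Python. It is not the FFTXlib or cuFFT code path: it uses a toy radix-2 FFT and only emulates single precision by rounding every intermediate result to a 32-bit float, so the helper names (`to_sp`, `fft`, `roundtrip_error`) and the problem size are illustrative assumptions. It compares the forward-plus-inverse round-trip error with and without that rounding.

```python
import cmath
import random
import struct


def to_sp(z):
    """Round both parts of a complex number to IEEE single precision."""
    re = struct.unpack("f", struct.pack("f", z.real))[0]
    im = struct.unpack("f", struct.pack("f", z.imag))[0]
    return complex(re, im)


def to_dp(z):
    """Identity: Python complex numbers are already double precision."""
    return z


def fft(x, sign, rnd):
    """Recursive radix-2 FFT with `rnd` applied after each arithmetic step.

    sign=-1 is the forward transform, sign=+1 the unnormalized inverse.
    len(x) must be a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], sign, rnd)
    odd = fft(x[1::2], sign, rnd)
    out = [0j] * n
    for k in range(n // 2):
        w = rnd(cmath.exp(sign * 2j * cmath.pi * k / n))  # twiddle factor
        t = rnd(w * odd[k])
        out[k] = rnd(even[k] + t)
        out[k + n // 2] = rnd(even[k] - t)
    return out


def roundtrip_error(x, rnd):
    """Max |ifft(fft(x)) - x|, with inputs and intermediates rounded by `rnd`."""
    n = len(x)
    xs = [rnd(v) for v in x]
    back = fft(fft(xs, -1, rnd), +1, rnd)
    return max(abs(rnd(v / n) - u) for u, v in zip(x, back))


random.seed(0)
n = 256  # illustrative size, not taken from any test case
data = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

err_dp = roundtrip_error(data, to_dp)
err_sp = roundtrip_error(data, to_sp)
print(f"double-precision round-trip error:   {err_dp:.3e}")
print(f"emulated single-precision error:     {err_sp:.3e}")
```

In this toy setting the emulated SP error sits many orders of magnitude above the DP error and well above 1e-15. How that noise propagates into the SCF energy estimate depends on the system and the solver, so this should be read only as an order-of-magnitude illustration.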

Questions:

  • Support for SP FFT has long been available in the mix-precision branch for a number of CPU backends. Is there any plan to keep working on that branch?
  • If this merge request can be merged and the idea of using single precision is welcome, we will work on a follow-up merge request that adds support for computing EXX in SP (see #261 and this paper).
  • A command-line switch was added to turn SP FFT on or off. It could also be controlled by a keyword.
  • I don't know where the documentation for this should go.

Any feedback appreciated!

Edited by Victor Yu
