Add VkFFT to hipSYCL HIP backend
This MR enables using VkFFT as a replacement for the rocFFT backend. On MI210, VkFFT substantially outperforms rocFFT in all tested cases. For celluloze, it executes the FFT operations in under 1/3 of the time measured with rocFFT. On MI100, the performance of rocFFT and VkFFT is on par in most cases; for celluloze, however, the same speedup is observed as on MI210. The most likely reason is that VkFFT compiles its FFT kernels at runtime, which allows it to generate optimized code for each case.
MI210 | SYCL rocFFT | SYCL VkFFT | vs SYCL rocFFT |
---|---|---|---|
adh_dodec | 5,544,096 | 4,212,864 | 75.99% |
stmv | 761,327 | 627,645 | 82.44% |
celluloze | 5,384,606 | 1,657,404 | 30.78% |
eag | 511,124 | 376,766 | 73.71% |
aqp | 890,053 | 652,228 | 73.28% |
MI100 | SYCL rocFFT | SYCL VkFFT | vs SYCL rocFFT |
---|---|---|---|
adh_dodec | 3,630,641 | 3,782,660 | 104.19% |
stmv | 688,620 | 591,319 | 85.87% |
celluloze | 5,304,955 | 1,533,474 | 28.91% |
eag | 337,182 | 346,342 | 102.72% |
aqp | 540,204 | 525,845 | 97.34% |
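
The "vs SYCL rocFFT" column appears to be the VkFFT value divided by the rocFFT value, expressed as a percentage. The following is a minimal sketch of that arithmetic using the MI210 numbers, assuming the two data columns are FFT timings where lower is better; the names `mi210` and `relative_time` are illustrative and not part of the MR.

```python
# Values copied from the MI210 table: (SYCL rocFFT, SYCL VkFFT).
# Assumption: both are timings in the same unit, lower is better.
mi210 = {
    "adh_dodec": (5_544_096, 4_212_864),
    "stmv": (761_327, 627_645),
    "celluloze": (5_384_606, 1_657_404),
    "eag": (511_124, 376_766),
    "aqp": (890_053, 652_228),
}


def relative_time(rocfft: int, vkfft: int) -> float:
    """Return the VkFFT value as a percentage of the rocFFT value."""
    return 100.0 * vkfft / rocfft


for name, (rocfft, vkfft) in mi210.items():
    print(f"{name}: {relative_time(rocfft, vkfft):.2f}%")
```

Under this reading, a value below 100% means VkFFT was faster than rocFFT for that benchmark, and a value above 100% means it was slower.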
The benchmarks listed were taken from this repository: https://github.com/jychang48/benchmark-gromacs
Refs #4502 (closed)
Edited by Szilárd Páll