SYCL listed forces: optimize parameter passing
Speed-up of the "Bonded" kernels relative to 2023.0, measured with the ADH cubic benchmark:
Configuration | F-only | FV |
---|---|---|
hipSYCL 0.9.4, ROCm 5.2.5, bundled Clang, gfx90a | 2.67 | 2.33 |
hipSYCL 0.9.4, ROCm 5.4.1, Clang 15.0.7, gfx1034 | 1.94 | 1.56 |
hipSYCL 0.9.4, CUDA 11.8, Clang 15.0.7, sm_86 | 1.00 | 1.00 |
IntelLLVM nightly 2023-02-06, CUDA 11.8, sm_86 | 4.09 | 1.55 |
oneAPI 2023.0, Intel Arc 770 | 1.29 | 1.01 |
oneAPI 2023.0, Ponte Vecchio | 1.10 | ~1 |
Based on AMD/StreamHPC optimization.
Refs #3928, #4593
Edited by Andrey Alekseenko