Changes to the SYCL implementation of the Nbnxm kernel to improve performance when compiled for Nvidia GPUs with the ICPX compiler. There are two changes:

* Additional loop unrolling
* Using `*` instead of `&&` in an if statement

The latest version of the ICPX compiler (2024.1) is required to obtain the best performance. With this compiler, a build using the oneMKL interface library DFT backend on an A100 saw a 4% speedup for a 100k-atom system. When these changes were enabled for an AMD GPU (MI210) with a similar build configuration, no change in performance was observed. On an Intel Max 1100 system, there was a 1% performance regression. Consequently, these changes are enabled for Nvidia GPUs only.
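To illustrate the second change, here is a minimal sketch in plain C++ rather than SYCL. It is not the Nbnxm kernel code: the names `c_clusterSize`, `sumWithLogicalAnd`, and `sumWithProduct`, and the cutoff-plus-mask condition itself, are assumptions made for illustration. The idea shown is that `&&` short-circuits and may be compiled into two dependent branches, whereas multiplying the two conditions evaluates both unconditionally and leaves a single test. Both operands must be free of side effects for the two forms to be equivalent.

```cpp
#include <array>
#include <cstddef>

// Hypothetical cluster size, chosen only for this example.
constexpr std::size_t c_clusterSize = 8;

// Short-circuit form: the second condition is only evaluated when the
// first one holds, which a compiler may lower to two branches.
inline float sumWithLogicalAnd(const std::array<float, c_clusterSize>& r2,
                               const std::array<int, c_clusterSize>&   mask,
                               float                                   cutoff2)
{
    float sum = 0.0F;
    for (std::size_t i = 0; i < c_clusterSize; ++i)
    {
        const bool withinCutoff = r2[i] < cutoff2;
        const bool notExcluded  = mask[i] != 0;
        if (withinCutoff && notExcluded)
        {
            sum += r2[i];
        }
    }
    return sum;
}

// Arithmetic form: both conditions are always evaluated and combined
// with `*`, so only one test remains in the loop body.
inline float sumWithProduct(const std::array<float, c_clusterSize>& r2,
                            const std::array<int, c_clusterSize>&   mask,
                            float                                   cutoff2)
{
    float sum = 0.0F;
    // The trip count is a compile-time constant, so the loop is a natural
    // candidate for unrolling. A compiler-specific hint (for example
    // `#pragma unroll` with Clang-based compilers) could be placed here;
    // it is left out to keep the sketch portable.
    for (std::size_t i = 0; i < c_clusterSize; ++i)
    {
        const int withinCutoff = static_cast<int>(r2[i] < cutoff2);
        const int notExcluded  = static_cast<int>(mask[i] != 0);
        if (withinCutoff * notExcluded != 0)
        {
            sum += r2[i];
        }
    }
    return sum;
}
```

Both functions return the same value for any input. Whether the arithmetic form is faster depends on the compiler and the target device, which is consistent with the mixed results reported above for the different GPUs.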
ebeccd28