    SYCL: Optimizations to nbnxm kernel for Nvidia - unroll and &&->* · ebeccd28
    HJA Bird authored and Magnus Lundborg committed
    Changes to the SYCL implementation of the Nbnxm kernel to improve performance when compiled targeting Nvidia GPUs using the ICPX compiler.
    
    There are two changes:
    * Additional loop unrolling 
    * Using `*` instead of `&&` in an if statement
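
The two changes listed above can be illustrated with a minimal, host-side C++ sketch. It is not the Nbnxm kernel itself: the names `c_clusterSize`, `sumWithLogicalAnd` and `sumWithProduct` are hypothetical, and the sketch assumes each mask entry is exactly 0 or 1. Under that assumption the product is non-zero exactly when both conditions hold, and unlike `&&` it does not short-circuit, so both operands are always evaluated. The unroll pragma is guarded for Clang-based compilers, which is assumed here to cover ICPX.

```cpp
#include <array>

// Hypothetical loop length, a stand-in for a small fixed-size inner loop.
constexpr int c_clusterSize = 4;

using FloatCluster = std::array<float, c_clusterSize>;
using MaskCluster  = std::array<int, c_clusterSize>;

// Baseline: `&&` short-circuits, so the distance check is only evaluated
// when the mask entry is non-zero.
inline float sumWithLogicalAnd(const FloatCluster& r2, const MaskCluster& mask, float cutoff2)
{
    float sum = 0.0F;
    for (int i = 0; i < c_clusterSize; i++)
    {
        if (mask[i] && (r2[i] < cutoff2))
        {
            sum += r2[i];
        }
    }
    return sum;
}

// Variant: the condition is an integer product and the loop carries an
// unroll hint. Assumes mask[i] is 0 or 1, so the product is 1 only when
// both conditions hold; both operands are always evaluated.
inline float sumWithProduct(const FloatCluster& r2, const MaskCluster& mask, float cutoff2)
{
    float sum = 0.0F;
#if defined(__clang__)
#    pragma unroll
#endif
    for (int i = 0; i < c_clusterSize; i++)
    {
        if (mask[i] * static_cast<int>(r2[i] < cutoff2))
        {
            sum += r2[i];
        }
    }
    return sum;
}
```

Both functions return the same result for any 0/1 mask. Whether the second form is faster depends on the compiler and target, so it is worth measuring on each one.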
    
    The latest version of the ICPX compiler (2024.1) is required to obtain the best performance. With this compiler, a build using the oneMKL interface library DFT backend on an A100 showed a 4% speedup for a 100k-atom system. When these changes were enabled for an AMD GPU (MI210) with a similar build configuration, no change in performance was observed. On an Intel Max 1100 system, there was a 1% performance regression.
    
    Consequently, these changes are enabled for Nvidia GPUs only.