Unroll middle jm loop in the nbnxm kernels on Ampere
The unrolling improves performance of the non-bonded kernels by up to 12%.
Note: cherry-picked backport, skip when merging.
Refs #3873 (closed)