Tweak default nonbonded CUDA kernel selection
It has been observed that the default choice of kernel flavors is not always optimal:
- the tabulated Ewald correction is faster in the energy+force kernels (on Turing, Volta, Ampere, and in most cases on Pascal too) and in the force-only kernels (on Volta and Ampere)
- not using textures can be faster (on Volta and Ampere)
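
The observations above can be read as a small selection policy. The following is only a sketch of that policy, not the actual change: every name in it (`GpuArch`, `KernelKind`, `KernelFlavor`, `archFromComputeCapability`, `pickFlavor`) is hypothetical and not a GROMACS identifier. It also makes three simplifying assumptions: Pascal's "most cases" is treated as "always", "can be faster" for textures is treated as "always disable", and architectures not listed above fall back to the analytical correction with textures enabled.

```cpp
#include <cassert>

// Hypothetical types for illustration only; not GROMACS identifiers.
enum class GpuArch { Pascal, Volta, Turing, Ampere, Other };
enum class KernelKind { EnergyAndForce, ForceOnly };

struct KernelFlavor
{
    bool tabulatedEwald; // true: tabulated Ewald correction, false: analytical
    bool useTextures;    // true: fetch table data through textures
};

// Map a CUDA compute capability to the architecture families named above.
inline GpuArch archFromComputeCapability(int major, int minor)
{
    if (major == 6) { return GpuArch::Pascal; }
    if (major == 7) { return (minor >= 5) ? GpuArch::Turing : GpuArch::Volta; }
    if (major == 8 && minor <= 7) { return GpuArch::Ampere; }
    return GpuArch::Other; // assumption: anything else is left untuned
}

// Encode the two observations as a default flavor choice.
inline KernelFlavor pickFlavor(GpuArch arch, KernelKind kind)
{
    const bool voltaOrAmpere = (arch == GpuArch::Volta || arch == GpuArch::Ampere);

    bool tabulated = false; // assumption: analytical Ewald unless listed above
    if (kind == KernelKind::EnergyAndForce)
    {
        // Turing, Volta, Ampere; Pascal simplified from "most cases" to "always".
        tabulated = voltaOrAmpere || arch == GpuArch::Turing || arch == GpuArch::Pascal;
    }
    else
    {
        // Force-only kernels: only Volta and Ampere are listed.
        tabulated = voltaOrAmpere;
    }

    // Skipping textures is reported to help on Volta and Ampere.
    const bool textures = !voltaOrAmpere;

    return KernelFlavor{ tabulated, textures };
}

// Small self-check that the policy matches the observations listed above.
inline void checkPolicy()
{
    assert(pickFlavor(GpuArch::Ampere, KernelKind::ForceOnly).tabulatedEwald);
    assert(!pickFlavor(GpuArch::Turing, KernelKind::ForceOnly).tabulatedEwald);
    assert(!pickFlavor(GpuArch::Volta, KernelKind::EnergyAndForce).useTextures);
}
```

In this sketch a Turing device (compute capability 7.5) would get the tabulated correction only in the energy+force kernels and would keep using textures, while a Volta or Ampere device would get the tabulated correction in both kernel kinds with textures disabled.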
Edited by Mark Abraham