Tweak default nonbonded CUDA kernel selection

It has been observed that the default choice of kernel flavors is not always optimal:

  • the tabulated Ewald correction is faster in the energy+force kernels (on Turing, Volta, Ampere, and in most cases on Pascal too) and in the force-only kernels (Volta, Ampere)
  • not using textures can be faster (Volta and Ampere)
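
The observations above could be encoded as a per-architecture default, roughly as sketched below. This is only an illustration and not the code in this change: every name (GpuArch, EwaldFlavor, KernelChoice, archFromComputeCapability, chooseKernelFlavors) is hypothetical, the baseline of analytical Ewald with textures enabled is an assumption, and Pascal's "most cases" is simplified to "always" for the energy+force kernels.

```cpp
// Hypothetical sketch: none of these names come from the actual code base.
enum class GpuArch { Pascal, Volta, Turing, Ampere, Other };
enum class EwaldFlavor { Analytical, Tabulated };

struct KernelChoice {
    EwaldFlavor forceOnly;      // flavor used by the force-only kernels
    EwaldFlavor energyAndForce; // flavor used by the energy+force kernels
    bool        useTextures;    // whether table lookups go through textures
};

// Map a CUDA compute capability to the architectures named in the list above.
GpuArch archFromComputeCapability(int major, int minor) {
    if (major == 6) { return GpuArch::Pascal; }
    if (major == 7 && minor < 5) { return GpuArch::Volta; }
    if (major == 7 && minor == 5) { return GpuArch::Turing; }
    if (major == 8 && minor <= 7) { return GpuArch::Ampere; } // 8.9 is Ada, not Ampere
    return GpuArch::Other;
}

KernelChoice chooseKernelFlavors(GpuArch arch) {
    // Assumed baseline (not stated in the text above): analytical Ewald, textures on.
    KernelChoice choice{EwaldFlavor::Analytical, EwaldFlavor::Analytical, true};

    const bool voltaOrAmpere = (arch == GpuArch::Volta || arch == GpuArch::Ampere);

    // Tabulated Ewald was faster in energy+force kernels on Turing, Volta, Ampere,
    // and in most cases on Pascal (simplified here to "always" on Pascal).
    if (voltaOrAmpere || arch == GpuArch::Turing || arch == GpuArch::Pascal) {
        choice.energyAndForce = EwaldFlavor::Tabulated;
    }
    // In force-only kernels it was faster only on Volta and Ampere.
    if (voltaOrAmpere) {
        choice.forceOnly = EwaldFlavor::Tabulated;
    }
    // Skipping textures can be faster on Volta and Ampere.
    if (voltaOrAmpere) {
        choice.useTextures = false;
    }
    return choice;
}
```

With this sketch, chooseKernelFlavors(archFromComputeCapability(7, 5)) selects the tabulated flavor only for the energy+force kernels and keeps textures enabled, while compute capability 8.0 selects the tabulated flavor for both kernel types and disables textures.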
Edited by Mark Abraham