Replace direct calls to CUDA memory management API with wrappers in NBNXM
Replace direct calls to cudaMalloc(...) and cudaMemset(...) with the respective wrappers in the CUDA implementation of NBNXM.
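For illustration only, here is a minimal sketch of the kind of replacement described above. The helper names (checkCuda, allocateBuffer, clearBuffer, freeBuffer, clearedBufferIsAllZero) are hypothetical and are not the wrappers used in this change; only the CUDA runtime calls are real API. The sketch assumes the wrappers are typed, take element counts instead of byte counts, and check the returned status.

```cuda
#include <cuda_runtime.h>

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical helper: abort with a message if a CUDA runtime call failed.
static void checkCuda(cudaError_t status, const char* what)
{
    if (status != cudaSuccess)
    {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(status));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical wrapper around cudaMalloc: the size is given in elements, not bytes.
template<typename T>
void allocateBuffer(T** buffer, size_t numValues)
{
    checkCuda(cudaMalloc(reinterpret_cast<void**>(buffer), numValues * sizeof(T)), "cudaMalloc");
}

// Hypothetical wrapper around cudaMemset: zeroes numValues elements starting at startingOffset.
template<typename T>
void clearBuffer(T* buffer, size_t startingOffset, size_t numValues)
{
    checkCuda(cudaMemset(buffer + startingOffset, 0, numValues * sizeof(T)), "cudaMemset");
}

// Hypothetical wrapper around cudaFree: resets the pointer after freeing.
template<typename T>
void freeBuffer(T** buffer)
{
    if (*buffer != nullptr)
    {
        checkCuda(cudaFree(*buffer), "cudaFree");
        *buffer = nullptr;
    }
}

// Usage: allocate and clear a device buffer, then copy it back to verify it is zeroed.
// Before the replacement, the call sites would have looked like:
//     cudaMalloc((void**)&d_values, numValues * sizeof(float));
//     cudaMemset(d_values, 0, numValues * sizeof(float));
bool clearedBufferIsAllZero(size_t numValues)
{
    float* d_values = nullptr;
    allocateBuffer(&d_values, numValues);
    clearBuffer(d_values, 0, numValues);

    // Fill the host copy with a non-zero value so the check is meaningful.
    std::vector<float> h_values(numValues, 1.0f);
    checkCuda(cudaMemcpy(h_values.data(), d_values, numValues * sizeof(float), cudaMemcpyDeviceToHost),
              "cudaMemcpy");
    freeBuffer(&d_values);

    for (float value : h_values)
    {
        if (value != 0.0f)
        {
            return false;
        }
    }
    return true;
}
```

With wrappers of this shape, the byte-size arithmetic and error checking live in one place instead of being repeated at every call site.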
Refs #3318 (closed), #3311 (closed)
From: https://gerrit.gromacs.org/#/c/gromacs/+/16541/
Compare with the parent: https://gitlab.com/gromacs/gromacs/-/compare/devicebuffer_RemoveDuplicatingCopyWrappersInCUDA...devicebuffer_ReplaceDirectCUDACallsNBNXM
Edited by Artem Zhmurov