Decide when and whether accelerator device release / context destruction should happen

In relation to #2915, and specifically the CUDA-aware MPI support proposed in !37 (closed), the issue of context teardown happening too early has come up. When CUDA-aware MPI is used, we cannot destroy the context before MPI_Finalize(), so we currently omit the call to releaseDevice(deviceInfo) in this case.
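To make the ordering constraint concrete, here is a minimal sketch of the current behaviour as described above. It only models the decision and does not call MPI or CUDA. The names RunConfig, TeardownStep and planTeardown are hypothetical and are not part of GROMACS.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration; not GROMACS, MPI, or CUDA APIs.
enum class TeardownStep
{
    ReleaseDevice, // stands for releaseDevice(deviceInfo)
    FinalizeMpi    // stands for MPI_Finalize()
};

struct RunConfig
{
    bool usesGpu;      // a device context was set up for this run
    bool cudaAwareMpi; // the MPI library may still need that context
};

// Returns the cleanup steps in the order they would be executed.
std::vector<TeardownStep> planTeardown(const RunConfig& config)
{
    std::vector<TeardownStep> steps;
    // With CUDA-aware MPI the context must outlive MPI_Finalize(),
    // so the device release is skipped rather than reordered.
    if (config.usesGpu && !config.cudaAwareMpi)
    {
        steps.push_back(TeardownStep::ReleaseDevice);
    }
    steps.push_back(TeardownStep::FinalizeMpi);
    // Invariant of this sketch: nothing touches the device after MPI is finalized.
    assert(steps.back() == TeardownStep::FinalizeMpi);
    return steps;
}
```

In this model a GPU run with CUDA-aware MPI never releases the device, which is the gap this issue is about.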

Additionally, the sanity checking done during device detection destroys the context on the assumption that it created it. This can also destroy a context that already existed, e.g. one created by the MPI runtime during MPI_Init() or by library users running other CUDA code.
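One possible direction, sketched here under assumptions, is to have the sanity check record whether a context was already active and to tear down only what it created itself. The code below is a pure C++ model with hypothetical names (DeviceState, ScopedSanityCheckContext, contextSurvivesSanityCheck); it does not call CUDA. In a real implementation the "already active" information might come from a query such as cuDevicePrimaryCtxGetState() in the CUDA driver API, but whether that fits the detection code is an open assumption.

```cpp
#include <cassert>

// Hypothetical model of per-device context state; not a CUDA or GROMACS type.
struct DeviceState
{
    bool contextActive;
};

// Scoped helper: activates a context for the duration of the sanity check
// and deactivates it afterwards only if it was not active beforehand.
class ScopedSanityCheckContext
{
public:
    explicit ScopedSanityCheckContext(DeviceState& device) :
        device_(device), createdHere_(!device.contextActive)
    {
        device_.contextActive = true;
    }
    ~ScopedSanityCheckContext()
    {
        if (createdHere_)
        {
            device_.contextActive = false;
        }
    }
    ScopedSanityCheckContext(const ScopedSanityCheckContext&) = delete;
    ScopedSanityCheckContext& operator=(const ScopedSanityCheckContext&) = delete;

private:
    DeviceState& device_;
    bool         createdHere_;
};

// Runs a placeholder sanity check and reports whether a context is still
// active afterwards.
bool contextSurvivesSanityCheck(bool contextExistedBefore)
{
    DeviceState device;
    device.contextActive = contextExistedBefore;
    {
        ScopedSanityCheckContext scoped(device);
        assert(device.contextActive); // the check always runs with a context
    }
    return device.contextActive;
}
```

With this ownership rule, a context set up earlier by the MPI runtime or by a library user is left alone, while a context created only for the check is still cleaned up.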

Given the library use of both libgromacs and nblib, the broader question is:

under which circumstances and
at what time during the cleanup after the run has finished should we destroy the GPU context?

Edited Jan 28, 2022 by Szilárd Páll