Manage lifetime of hardware detection better
Currently mdrunner() detects hardware once, managing the lifetime with a file static pointer associated with a mutex so that other thread-MPI ranks skip the detection and just use the result. That also works fine for real MPI because nothing else in mdrun tries to detect hardware. That approach is a legacy of designing only for a command-line tool, many moons ago. It's serendipitously efficient for test binaries that perhaps want to loop over the GPUs and then call mdrun many times, because the file static pointer remains valid until the binary exits; the multiple calls to the detection routine each time mdrun runs will do very little work.
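For reference, the current scheme looks roughly like the sketch below. This is an illustration only, not the actual mdrun code: HardwareInfo, detectHardwareOnce and detectionCount are hypothetical names, and the "detection" is a placeholder.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the real hardware detection result.
struct HardwareInfo
{
    int numGpus = 0;
};

namespace
{
// File-static state: it stays alive until process exit, which is when
// the unique_ptr destructor runs.
std::unique_ptr<HardwareInfo> g_hardwareInfo;
std::mutex                    g_hardwareInfoMutex;
int                           g_detectionCount = 0; // instrumentation for this sketch only
} // namespace

// Returns the cached result. Only the first caller does the (pretend)
// detection; later callers, e.g. other thread-MPI ranks or repeated
// mdrun invocations in a test binary, just reuse it.
const HardwareInfo* detectHardwareOnce()
{
    std::lock_guard<std::mutex> lock(g_hardwareInfoMutex);
    if (!g_hardwareInfo)
    {
        g_hardwareInfo          = std::make_unique<HardwareInfo>();
        g_hardwareInfo->numGpus = 2; // placeholder value, not real detection
        ++g_detectionCount;
    }
    return g_hardwareInfo.get();
}

// Reports how many times the pretend detection actually ran.
int detectionCount()
{
    std::lock_guard<std::mutex> lock(g_hardwareInfoMutex);
    return g_detectionCount;
}
```

Nothing outside this translation unit can decide when the result is destroyed, which is the root of both problems listed next.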
However
- the destructor of that file static pointer, called at process exit, does bad things to the current beta09/10 oneAPI, and
- API-driven simulations would also like to direct the use of a particular communicator, so hardware detection should work within that communicator, and its result should be retainable for re-use across successive simulation sessions.
That means we want better lifetime management for the hardware detection result.
Andrey's attempt to fix the SYCL issue at !781 (closed) created further issues to resolve. I'd already started !739 (closed) to address the API case, so perhaps it is better to extract the core of !739 (closed) to resolve the SYCL issue for release-2021, while preparing to have better control for API-driven simulations.
That will likely
- make MdrunnerBuilder a bit more ugly, but perhaps minimally if done by an addHardwareDetectionResult method that simply takes a const pointer for now, and can be improved later to take a deep copy (Edit: was done that way)
- pessimize quite a few test binaries that will now call hardware detection each time they invoke mdrun, unless we can find a way to pass such a handle up through both the call stacks from the gmx entry point and the test-binary setup code. (Edit: a way was found to pass the handle from test-binary setup through gmx_mdrun. #3774 (closed) will resolve the issue properly)
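A minimal sketch of the shape this could take, under stated assumptions: the caller owns the detection result and lends it to the builder as a const pointer. Runner, RunnerBuilder, HardwareInfo, detectHardware and runManySimulations are hypothetical stand-ins; only the method name addHardwareDetectionResult comes from the proposal above, and its signature here is a guess.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the real hardware detection result.
struct HardwareInfo
{
    int numGpus = 0;
};

int g_detectionCalls = 0; // instrumentation for this sketch only

// Pretend detection; the caller owns the returned result.
std::unique_ptr<HardwareInfo> detectHardware()
{
    ++g_detectionCalls;
    auto info     = std::make_unique<HardwareInfo>();
    info->numGpus = 1; // placeholder value, not real detection
    return info;
}

// Hypothetical runner that only borrows the detection result.
class Runner
{
public:
    explicit Runner(const HardwareInfo* hwinfo) : hwinfo_(hwinfo) {}
    int run() const { return hwinfo_->numGpus; }

private:
    const HardwareInfo* hwinfo_; // non-owning; must outlive the Runner
};

// Hypothetical builder with the proposed method.
class RunnerBuilder
{
public:
    RunnerBuilder& addHardwareDetectionResult(const HardwareInfo* hwinfo)
    {
        hwinfo_ = hwinfo;
        return *this;
    }
    Runner build() const
    {
        assert(hwinfo_ != nullptr && "hardware detection result is required");
        return Runner(hwinfo_);
    }

private:
    const HardwareInfo* hwinfo_ = nullptr;
};

// A test binary or API client detects once and reuses the result
// across several runs, then releases it at a time of its choosing.
int runManySimulations(int numRuns)
{
    const std::unique_ptr<HardwareInfo> hwinfo = detectHardware();
    int                                 total  = 0;
    for (int i = 0; i < numRuns; ++i)
    {
        total += RunnerBuilder().addHardwareDetectionResult(hwinfo.get()).build().run();
    }
    return total; // hwinfo is destroyed here, not at process exit
}
```

With ownership held by the caller, the result is destroyed at a well-defined point before process exit, and repeated runs do not need to repeat the detection.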