OpenCL clearDeviceBufferAsync is leaking cl_event objects
Summary
In OpenCL, clearDeviceBufferAsync requests an event from clEnqueueFillBuffer, but never calls clReleaseEvent on it or uses it in any way. This leaks resources.
Introduced far back in dc6c9960.
Exact steps to reproduce
Running gmx mdrun with OpenCL under valgrind produces, among other warnings, multiple instances of the following:
==3859774== 1,296 (648 direct, 648 indirect) bytes in 1 blocks are definitely lost in loss record 2,242 of 2,365
==3859774== at 0x483CFD3: operator new(unsigned long) (vg_replace_malloc.c:472)
==3859774== by 0x72B44E4: clEnqueueFillBuffer (in /opt/tcbsys/rocm/5.4.1/20.04/lib/libamdocl64.so)
==3859774== by 0x55B5113: void clearDeviceBufferAsync<float>(TypedClMemory<float>*, unsigned long, unsigned long, DeviceStream const&) [clone .part.0] (devicebuffer_ocl.h:272)
==3859774== by 0x55B877A: pme_gpu_reinit_atoms(PmeGpu*, int, float const*, float const*) (pme_gpu_internal.cpp:1445)
==3859774== by 0x4F2EBB0: gmx::mdAlgorithmsSetupAtomData(t_commrec const*, t_inputrec const&, gmx_mtop_t const&, gmx_localtop_t*, t_forcerec*, gmx::ForceBuffers*, gmx::MDAtoms*, gmx::Constraints*, gmx::VirtualSitesHandler*, gmx_shellfc_t*) (mdsetup.cpp:147)
==3859774== by 0x56A9F66: gmx::LegacySimulator::do_md() (md.cpp:390)
==3859774== by 0x56A3551: gmx::LegacySimulator::run() (legacysimulator.cpp:72)
==3859774== by 0x56D74A3: gmx::Mdrunner::mdrunner() (runner.cpp:2220)
==3859774== by 0x113607: gmx::gmx_mdrun(tmpi_comm_*, gmx_hw_info_t const&, int, char**) (mdrun.cpp:280)
==3859774== by 0x113799: gmx::gmx_mdrun(int, char**) (mdrun.cpp:82)
==3859774== by 0x4EE7A54: gmx::CommandLineModuleManager::run(int, char**) (cmdlinemodulemanager.cpp:569)
==3859774== by 0x10FDBF: main (gmx.cpp:58)
For developers: Why is this important?
OpenCL is deprecated, but continuously leaking resources while we're running is not nice.
Potentially the reason for https://gromacs.bioexcel.eu/t/increasing-and-excesive-use-of-memory-using-opencl-and-amd-gpu/6604.
If this is a bug, (1) what happens, and (2) what did you expect to happen?
- GROMACS leaks resources on each call to clearDeviceBufferAsync, and the memory usage of the gmx process grows over time.
- GROMACS properly manages OpenCL resources, and the memory usage of the long-running process stays approximately constant.
Relevant input files, logs and/or screenshots
Possible fixes
Since the event is not used, pass NULL instead.
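
To illustrate the ownership rule behind this, here is a minimal, self-contained sketch. It does not link against OpenCL: FakeEvent, fakeEnqueueFill, fakeReleaseEvent and the three clear* helpers are hypothetical stand-ins invented for this example, not GROMACS or OpenCL code. The assumption it models is that an enqueue call hands back an event only when the caller passes a non-null output pointer, and that the caller then owns one reference that must be released (with clReleaseEvent in the real API).

```cpp
#include <cassert>

// Hypothetical stand-in for a cl_event handle; NOT the real OpenCL type.
using FakeEvent = int;

// Number of event references handed out and not yet released.
static int g_outstandingEventRefs = 0;
static int g_nextEventId          = 0;

// Stand-in for clEnqueueFillBuffer: an event is created only when the
// caller asks for one, and the caller then owns one reference to it.
void fakeEnqueueFill(FakeEvent* eventOut)
{
    if (eventOut != nullptr)
    {
        *eventOut = ++g_nextEventId;
        ++g_outstandingEventRefs;
    }
}

// Stand-in for clReleaseEvent: gives back the caller's reference.
void fakeReleaseEvent(FakeEvent /*event*/)
{
    assert(g_outstandingEventRefs > 0);
    --g_outstandingEventRefs;
}

// Pattern described in this report: the event is requested but never released.
void clearLeaky()
{
    FakeEvent event = 0;
    fakeEnqueueFill(&event);
    // 'event' goes out of scope here while its reference is still outstanding.
}

// Fix proposed above: do not request an event at all.
void clearWithoutEvent()
{
    fakeEnqueueFill(nullptr);
}

// Alternative fix: keep requesting the event, but release it.
void clearAndRelease()
{
    FakeEvent event = 0;
    fakeEnqueueFill(&event);
    fakeReleaseEvent(event);
}

// Accessor used to check for leaked references.
int outstandingEventRefs()
{
    return g_outstandingEventRefs;
}
```

In this model, every call to clearLeaky raises the outstanding count by one, while both fixed variants leave it unchanged. Passing NULL is the simpler option as long as nothing waits on the event.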