Capture terminal output for gmxapi.mdrun
Summary
Try to capture stdout and stderr to files in the working directories of gmxapi.mdrun tasks.
This issue is intentionally orthogonal to library-level updates to output environment management. Before libgromacs first uses the stdout and stderr filehandles, we will attempt to replace them (from the Python interpreter) with alternative filehandles. The goals are to:
- facilitate debugging of ensemble simulation jobs;
- provide a normalized way to programmatically access more output (at least through the filesystem).
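The sketch below shows the descriptor swap described above and assumes a POSIX-like platform. It saves a copy of a file descriptor with os.dup, points the descriptor at a file with os.dup2, and restores the original afterwards. The swap happens at the descriptor level rather than on Python's sys.stdout object, so output that compiled code writes through fd 1 or fd 2 is redirected as well. The helper name redirect_fd is hypothetical and is not part of gmxapi.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def redirect_fd(fd, path):
    """Hypothetical helper: temporarily point descriptor `fd` at `path`."""
    saved = os.dup(fd)  # keep a copy so the original stream can be restored
    target = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.dup2(target, fd)  # from here on, `fd` refers to the file
        yield path
    finally:
        os.dup2(saved, fd)  # put the original stream back
        os.close(target)
        os.close(saved)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "stdout.txt")
    with redirect_fd(1, path):
        # A raw descriptor write, like the ones compiled library code performs.
        os.write(1, b"written at the descriptor level\n")
    with open(path) as f:
        print("captured:", f.read().strip())
```

This sketch does not flush Python's own sys.stdout buffer. Text printed from Python inside the `with` block may therefore still reach the original stream after the descriptor has been restored.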
Use cases
- Capture output from multiple simulations more robustly in ensemble contexts.
- Capture terminal output as stdout.txt and stderr.txt until better library-level output handling can be devised.
Impact
This is important to gmxapi users who use gmxapi.mdrun for ensemble simulations, especially when they are trying to debug failures on non-root ensemble members.
This workaround takes some pressure off developers to make short-term progress on remediating direct C stdlib writes to stdout/stderr or on extending the output_environment library facility.
@eirrgang proposes to provide a pure-Python patch for gmxapi 0.4.
Detailed description
libgromacs currently has many cases of output being written directly to stdout or stderr.
Previously, when libgromacs wrote to stdout or stderr in a gmxapi script, the output was not seen by the Python interpreter. In an ensemble simulation (Python interpreter launched with mpiexec, simulations run with thread-MPI), output from non-root ranks was at the mercy of the MPI framework and was, at best, only available at the job level.
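The following sketch uses only the Python standard library to show why this output was invisible to the interpreter. contextlib.redirect_stdout rebinds the sys.stdout object, but a write made directly to file descriptor 1 bypasses it. That direct write stands in here for a C-level write from libgromacs. The child program and the demo function are illustrative only.

```python
import subprocess
import sys

# Child program: rebind sys.stdout to a StringIO, then write once through
# Python's print() and once straight to file descriptor 1, the way compiled
# library code writes.
CHILD = r"""
import contextlib, io, os, sys
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print("python-level")
    os.write(1, b"fd-level\n")
sys.stderr.write(buf.getvalue())  # report what the StringIO captured
"""


def demo():
    """Return (text that reached the real stdout, text the StringIO caught)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout, result.stderr


if __name__ == "__main__":
    out, caught = demo()
    print("reached the real stdout:", repr(out))      # 'fd-level\n'
    print("caught by redirect_stdout:", repr(caught))  # 'python-level\n'
```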
We can manipulate the process filehandles from the Python interpreter before and between calls into libgromacs. This should allow us to record a stdout.txt and stderr.txt for each task executed through the libgmxapi Python bindings.
In many cases, libgromacs forces the process to crash hard during API calls that fail, so the Python interpreter has no opportunity to perform any sort of clean shutdown when an error occurs during a simulation. This may leave some unflushed output missing from stdout.txt or stderr.txt. That output may be the most useful output of the job for debugging purposes. We should therefore probably avoid fancy IO streams or pipelines and simply close/open/close "w" filehandles with little to no buffering.
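A small experiment, sketched here with the standard library only, illustrates the buffering concern. A child process redirects fd 1 to a file, writes one line directly to the descriptor and one line into a user-space buffer, and then exits abruptly with os._exit to mimic a hard crash. Only the unbuffered line should reach the file. C stdio behaves similarly: per the C standard, stdout is fully buffered when it cannot be determined to refer to an interactive device, which is the case once it points at stdout.txt.

```python
import os
import subprocess
import sys
import tempfile

# Child program: point fd 1 at a file, write one line straight to the
# descriptor and one line into a user-space buffer, then exit without any
# clean-up, as a hard crash would.
CHILD = r"""
import os, sys
fd = os.open(sys.argv[1], os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
os.dup2(fd, 1)
os.write(1, b"unbuffered: survives\n")
buffered = open(1, "wb", buffering=8192, closefd=False)
buffered.write(b"buffered: lost\n")  # stays in the 8 KiB buffer
os._exit(1)  # skips flushing, atexit handlers and finalizers
"""


def run_crashing_child():
    """Run the child and return (exit status, contents of its stdout.txt)."""
    path = os.path.join(tempfile.mkdtemp(), "stdout.txt")
    proc = subprocess.run([sys.executable, "-c", CHILD, path])
    with open(path) as f:
        return proc.returncode, f.read()


if __name__ == "__main__":
    status, text = run_crashing_child()
    print("exit status:", status)
    print(text, end="")
```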
We can refer to the current library implementation as a baseline. If the current implementation relies on the parent process to flush the file descriptors, and it is inappropriate to do otherwise (in order to avoid possible deadlocks while exiting, for instance), we may have to do one or more of the following:
- use less stdio buffering for gmxapi.mdrun
- register additional handlers (or traps), e.g. MPI_Comm_set_errhandler, and use an extra thread or parent process to try to perform clean-up without deadlocks or unnecessarily long blocking I/O
- wait for GROMACS library development that defers more error handling to the API caller
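On the Python side, the standard-library faulthandler module provides one readily available trap. It writes a traceback to a given file descriptor when the process receives a fatal signal such as SIGABRT or SIGSEGV. The sketch below assumes a POSIX platform. It points fd 2 at a stderr.txt file, enables faulthandler on that descriptor, and then aborts the child process. This complements, rather than replaces, the MPI_Comm_set_errhandler approach listed above: it records where the Python side was when the crash happened, but it does not flush buffers held by libgromacs.

```python
import os
import subprocess
import sys
import tempfile

# Child program: point fd 2 at a file, enable faulthandler on it, then abort
# the process the way a failing C library call might.
CHILD = r"""
import faulthandler, os, sys
fd = os.open(sys.argv[1], os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
os.dup2(fd, 2)
faulthandler.enable(file=sys.stderr, all_threads=True)
os.abort()  # raises SIGABRT; faulthandler writes before the process dies
"""


def run_aborting_child():
    """Run the child and return (exit status, contents of its stderr.txt)."""
    path = os.path.join(tempfile.mkdtemp(), "stderr.txt")
    proc = subprocess.run([sys.executable, "-c", CHILD, path])
    with open(path) as f:
        return proc.returncode, f.read()


if __name__ == "__main__":
    status, text = run_aborting_child()
    print("exit status:", status)  # negative (killed by a signal) on POSIX
    print(text)
```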