FPE when reading xtc trajectory
Summary
For at least some trajectory files, reading can trigger a division by zero at https://gitlab.com/gromacs/gromacs/blob/2e7c2dfb0b17b53d03d93cf72eea592050f22147/src/gromacs/fileio/libxdrf.cpp#L371.
This could be a broken XTC file, but even so we should handle it better than crashing.
Originally reported in https://gromacs.bioexcel.eu/t/floating-point-exception-at-h-bond-analysis/8477.
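To illustrate the kind of guard that would avoid the crash, here is a minimal, self-contained sketch modeled on the mixed-radix unpacking loop in `receiveints`. It is not the GROMACS code: the name `unpackThree`, its signature and the 4-byte buffer are assumptions made for illustration. It assumes the three values were packed as `(n0 * sizes[1] + n1) * sizes[2] + n2` into little-endian bytes, and it returns `std::nullopt` instead of dividing by a zero size.

```cpp
#include <array>
#include <optional>

// Hypothetical sketch, not the GROMACS implementation.
// Unpacks three integers that were packed as one mixed-radix number,
//   packed = (n0 * sizes[1] + n1) * sizes[2] + n2,
// stored little-endian in the first numBytes entries of `bytes`
// (entries beyond numBytes must be zero). Assumes every size is below 2^24
// so that (rem << 8) cannot overflow.
// Returns std::nullopt when the input cannot be valid, instead of dividing
// by zero (undefined behaviour in C++; on x86 Linux it raises SIGFPE).
std::optional<std::array<unsigned int, 3>> unpackThree(std::array<unsigned int, 4> bytes,
                                                       const std::array<unsigned int, 3>& sizes,
                                                       int numBytes)
{
    if (numBytes < 0 || numBytes > static_cast<int>(bytes.size()))
    {
        return std::nullopt; // more bytes than this sketch's buffer holds
    }
    // sizes[0] is never used as a divisor, but a zero there is just as invalid.
    for (unsigned int s : sizes)
    {
        if (s == 0)
        {
            return std::nullopt; // corrupt header: refuse to divide by zero
        }
    }

    std::array<unsigned int, 3> nums{};
    // Peel off n2, then n1, by long division of the byte string by sizes[i].
    for (int i = 2; i > 0; i--)
    {
        unsigned int rem = 0;
        for (int j = numBytes - 1; j >= 0; j--)
        {
            rem                  = (rem << 8) | bytes[j];
            const unsigned int q = rem / sizes[i];
            bytes[j]             = q; // quotient byte feeds the next round
            rem -= q * sizes[i];
        }
        nums[i] = rem;
    }
    // Whatever remains in the quotient bytes is n0.
    nums[0] = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    return nums;
}
```

For example, with sizes `{10, 10, 10}` the values `{3, 4, 5}` pack to (3 * 10 + 4) * 10 + 5 = 345 = 0x159, so bytes `{0x59, 0x01, 0, 0}` with `numBytes = 2` unpack to `{3, 4, 5}`; with sizes `{0, 0, 0}`, as seen for `sizesmall` in the gdb session, the sketch returns `std::nullopt` rather than crashing.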
Exact steps to reproduce
With files from https://gromacs.bioexcel.eu/t/floating-point-exception-at-h-bond-analysis/8477/5:
$ echo -e '1\n13' | OMP_NUM_THREADS=16 /tmp/gromacs/build/tmp/bin/gmx hbond -f mdnojump.xtc -s md100ns.tpr -num hbnum.xvg -tu ns
:-) GROMACS - gmx hbond, 2023.1-dev-20230421-9044e0fa49 (-:
Executable: /tmp/gromacs/build/tmp/bin/gmx
Data prefix: /tmp/gromacs (source tree)
Working dir: /home/aland/issue-f8477
Command line:
gmx hbond -f mdnojump.xtc -s md100ns.tpr -num hbnum.xvg -tu ns
Reading file md100ns.tpr, VERSION 2021.3 (single precision)
Note: file tpx version 122, software tpx version 129
Specify 2 groups to analyze:
[.....]
Select a group: Selected 1: 'Protein'
Select a group: Selected 13: 'LIG'
Checking for overlap in atoms between Protein and LIG
Calculating hydrogen bonds between Protein (4447 atoms) and LIG (51 atoms)
Found 400 donors and 793 acceptors
Reading frame 0 time 0.000
Will do grid-search on 20x20x20 grid, rcut=0.34999999
Frame loop parallelized with OpenMP using 16 threads.
Reading frame 9000 time 90.000 Floating point exception (core dumped)
With gdb:
Thread 14 "gmx" received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7fffee1f3640 (LWP 2188459)]
0x00007ffff7359981 in receiveints (buf=0x7fff981f2740, num_of_ints=3, num_of_bits=8, sizes=0x7fffee1f277c, nums=0x7fff981a8d6c) at /tmp/gromacs/src/gromacs/fileio/libxdrf.cpp:367
367 p = num / sizes[i];
(gdb) bt
#0 0x00007ffff7359981 in receiveints (buf=0x7fff981f2740, num_of_ints=3, num_of_bits=8, sizes=0x7fffee1f277c, nums=0x7fff981a8d6c) at /tmp/gromacs/src/gromacs/fileio/libxdrf.cpp:367
#1 0x00007ffff735b69a in xdr3dfcoord (xdrs=0x55555564e0e0, fp=0x5555558a5d80, size=0x7fffee1f2a84, precision=0x5555557387fc) at /tmp/gromacs/src/gromacs/fileio/libxdrf.cpp:951
#2 0x00007ffff739835a in xtc_coord (xd=0x55555564e0e0, natoms=0x7fffee1f2a84, box=0x555555738834, x=0x5555558a5d80, prec=0x5555557387fc, bRead=true) at /tmp/gromacs/src/gromacs/fileio/xtcio.cpp:192
#3 0x00007ffff73986e5 in read_next_xtc (fio=0x5555555cedc0, natoms=69491, step=0x5555557387d0, time=0x5555557387dc, box=0x555555738834, x=0x5555558a5d80, prec=0x5555557387fc, bOK=0x7fffee1f2b20) at /tmp/gromacs/src/gromacs/fileio/xtcio.cpp:281
#4 0x00007ffff73938f8 in read_next_frame (oenv=0x5555555cfe40, status=0x5555556963c0, fr=0x5555557387c0) at /tmp/gromacs/src/gromacs/fileio/trxio.cpp:870
#5 0x00007ffff7394827 in read_next_x (oenv=0x5555555cfe40, status=0x5555556963c0, t=0x7fffffffbbb0, x=0x5555558a5d80, box=0x7fffffffc0e0) at /tmp/gromacs/src/gromacs/fileio/trxio.cpp:1132
#6 0x00007ffff6b66dbd in _Z9gmx_hbondiPPc._omp_fn.0(void) () at /tmp/gromacs/src/gromacs/gmxana/gmx_hbond.cpp:3061
#7 0x00007ffff4ca6c0e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
#8 0x00007ffff4694ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#9 0x00007ffff4726850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) p sizes[i]
$2 = 0
(gdb) up
#1 0x00007ffff735b69a in xdr3dfcoord (xdrs=0x55555564e0e0, fp=0x5555558a5d80, size=0x7fffee1f2a84, precision=0x5555557387fc) at /tmp/gromacs/src/gromacs/fileio/libxdrf.cpp:951
951 receiveints(buf, 3, smallidx, sizesmall, thiscoord);
(gdb) p sizesmall
$3 = {0, 0, 0}
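In the gdb session, `num_of_bits` in frame #0 is 8, and it is the `smallidx` argument passed at line 951, so the size looked up for index 8 apparently came out as 0 for all three entries of `sizesmall`. A sketch of validating such a header-derived index up front, so that a corrupt frame produces a readable error instead of SIGFPE, might look like the following. The function name `radixForIndex`, the use of `std::runtime_error`, the message text and the toy table in the usage are all hypothetical; the real lookup table in libxdrf.cpp and its valid index range are not reproduced here, and GROMACS itself would presumably report this through its own error-handling machinery.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: look up the size (radix) for an index read from the
// file and reject any value that would later be used as a zero divisor.
// `table` stands in for the real size lookup table; its contents are assumed.
unsigned int radixForIndex(const std::vector<unsigned int>& table, int idx, long frame)
{
    if (idx < 0 || static_cast<std::size_t>(idx) >= table.size())
    {
        throw std::runtime_error("Corrupt XTC frame " + std::to_string(frame) + ": size index "
                                 + std::to_string(idx) + " is out of range");
    }
    const unsigned int radix = table[idx];
    if (radix == 0)
    {
        // This is the situation seen above: a zero size that would reach
        // `p = num / sizes[i]` and crash with SIGFPE.
        throw std::runtime_error("Corrupt XTC frame " + std::to_string(frame) + ": size index "
                                 + std::to_string(idx) + " maps to a zero size");
    }
    return radix;
}
```

Throwing at lookup time keeps the check in one place: every later division by that size is then known to be safe, and the caller can decide whether to stop or skip the frame.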
The problem occurs with GROMACS 2023.1 and 2023.3 (I haven't checked other versions).
It can also be reproduced (faster) with gmx trjconv -f mdnojump.xtc -s md100ns.tpr -b 95 -tu ns <<< 0
For developers: Why is this important?
If the trajectory file is correct, we should process it just fine. If the trajectory file is corrupted, we should print a proper error message. Failing with SIGFPE is never good.