Simulation segfaults with Gromacs 2022 and onwards, fine with 2021
**Summary**
I am simulating simple nucleic acid systems to test new virtual site parameters. Early tests with 2021.1 worked fine. Now that I am running more extensive tests with a newer version of Gromacs, the simulations usually segfault within a few minutes to tens of minutes, typically without any prior sign of instability. Restarts from checkpoints work well and run past the time step at which the segfault happened.
These simulations use a 5 fs time step. I have reduced the pressure-coupling frequency because someone reported segfaults possibly related to it (I cannot find that issue now). I use the following constraint parameters:
```plaintext
constraints = all-bonds
constraint-algorithm = Lincs
continuation = yes
lincs-order = 6
lincs-iter = 1
lincs-warnangle = 30
```
**GROMACS version**
Tested and failing on the following versions (time until segfault in parentheses):

- 2022.2 (3 min)
- 2022.5 (3 min)
- 2023 (14 min)
- 2023.2 (50 min; a second run segfaulted after 5 min)

These were all centre-provided builds on the Dardel supercomputer (PDC, Stockholm). I have seen the same behaviour with the centre-provided 2021.1 on Rackham (Uppmax, Uppsala) and with my own builds of 2023 and 2023.2 on Rackham. Dardel is AMD-based; Rackham is Intel-based. The compilers also seem to differ: the 202\[12\] versions were built with clang and the 2023 versions with gcc.
**Steps to reproduce**
```plaintext
gmx grompp -f run.mdp -o run_2023.2.tpr -c npt.gro -p salty.top -n xna.ndx
gmx mdrun -deffnm run_2023.2 -maxh 24 -cpi -cpo -nt 8
```
Note that this system doesn't use any of the new parameters, so no force-field modifications are needed. The angle constraints are already defined in the force field.
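Because restarts from the checkpoint continue past the crash point, collecting several times-to-crash per version could be automated. The following is a minimal Python sketch, not part of the original workflow. It assumes `gmx` is on `PATH` and reuses the `mdrun` flags from the command above. The helper names `died_from_segfault` and `run_until_crash` are hypothetical. Python's `subprocess` reports a child killed by a signal as a negative return code, while a shell wrapper such as the slurm script reports 128 + signal number (139 for SIGSEGV), so both are accepted.

```python
import signal
import subprocess
import time

# A POSIX shell reports death by SIGSEGV as exit status 128 + 11 = 139;
# subprocess reports a signal-killed child as -11 instead.
SHELL_SEGV = 128 + signal.SIGSEGV


def died_from_segfault(returncode: int) -> bool:
    """Return True if a process exit status indicates SIGSEGV."""
    return returncode in (-signal.SIGSEGV, SHELL_SEGV)


def run_until_crash(deffnm: str, max_attempts: int = 5) -> list[float]:
    """Relaunch mdrun from its checkpoint; record minutes until each segfault."""
    # Same flags as the reproduction command; -cpi resumes from the checkpoint.
    cmd = ["gmx", "mdrun", "-deffnm", deffnm, "-maxh", "24",
           "-cpi", "-cpo", "-nt", "8"]
    minutes_to_crash = []
    for _ in range(max_attempts):
        start = time.monotonic()
        result = subprocess.run(cmd)
        elapsed_min = (time.monotonic() - start) / 60.0
        if not died_from_segfault(result.returncode):
            break  # finished normally or failed for some other reason
        minutes_to_crash.append(elapsed_min)
    return minutes_to_crash
```

For example, `run_until_crash("run_2023.2")` would restart the run above up to five times and return the wall-clock minutes before each segfault.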
**What is the current bug behavior?**
The steps above caused segfaults within a few minutes (1-15 min) with all the 202\[23\] versions I have tested, but my 2021.7 run completed its full 24 hours without problems.
In one simulation (version 2023) I do get a warning:
> WARNING: Listed nonbonded interaction between particles 2 and 4 at distance 3.857 which is larger than the table limit 2.158 nm.
>
> This is likely either a 1,4 interaction, or a listed interaction inside a smaller molecule you are decoupling during a free energy calculation. Since interactions at distances beyond the table cannot be computed, they are skipped until they are inside the table limit again. You will only see this message once, even if it occurs for several interactions.
>
> IMPORTANT: This should not happen in a stable simulation, so there is probably something wrong with your system. Only change the table-extension distance in the mdp file if you are really sure that is the reason.
**What did you expect the correct behavior to be?**
No segfault
run.log:
> ...
>
> - GROMACS: gmx mdrun, version 2023.2
> - Executable: `/pdc/software/22.06/eb/software/gromacs/2023.2-cpeGNU-22.06/bin/gmx`
> - Data prefix: `/pdc/software/22.06/eb/software/gromacs/2023.2-cpeGNU-22.06`
> - Working dir: `/cfs/klemming/home/e/erma/Private/sim/xna/base/dA/run`
> - Process ID: 16937
> - Command line: `gmx mdrun -deffnm run -maxh 24 -cpi -cpo -nt 8`
>
> - GROMACS version: 2023.2
> - Precision: mixed
> - Memory model: 64 bit
> - MPI library: thread_mpi
> - OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 128)
> - GPU support: disabled
> - SIMD instructions: AVX2_256
> - CPU FFT library: commercial-fftw-3.3.10-sse2-avx-avx2-avx2_128
> - GPU FFT library: none
> - Multi-GPU FFT: none
> - RDTSCP usage: enabled
> - TNG support: enabled
> - Hwloc support: disabled
> - Tracing support: disabled
> - C compiler: `/opt/cray/pe/craype/2.7.16/bin/cc` GNU 11.2.0
> - C compiler flags: `-fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -O3 -DNDEBUG`
> - C++ compiler: `/opt/cray/pe/craype/2.7.16/bin/CC` GNU 11.2.0
> - C++ compiler flags: `-fexcess-precision=fast -funroll-all-loops -mavx2 -mfma -Wno-missing-field-initializers -Wno-cast-function-type-strict -fopenmp -O3 -DNDEBUG`
> - BLAS library: External - detected on the system
> - LAPACK library: External - detected on the system
>
> - Running on 1 node with total 8 cores, 16 processing units
> - Hardware detected on host nid002577:
>   - CPU info:
>     - Vendor: AMD
>     - Brand: AMD EPYC 7742 64-Core Processor
>     - Family: 23 Model: 49 Stepping: 0
>     - Features: aes amd apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf misalignsse mmx msr nonstop_tsc pclmuldq pdpe1gb popcnt pse rdrnd rdtscp sha sse2 sse3 sse4a sse4.1 sse4.2 ssse3 x2apic
>   - Hardware topology: Basic
>     - Packages, cores, and logical processors: \[indices refer to OS logical processors\]
>     - Package 0: \[ 39 167\] \[ 40 168\] \[ 41 169\] \[ 42 170\]
>     - Package 1: \[ 100 228\] \[ 101 229\] \[ 102 230\] \[ 103 231\]
>   - CPU limit set by OS: -1
>   - Recommended max number of threads: 16
>
> ...
>
> Started mdrun on rank 0 Thu Sep 21 15:42:28 2023
>
> ```plaintext
> Step Time
> 0 0.00000
> ```
>
> Energies (kJ/mol)
>
> | Term | Value |
> | --- | --- |
> | Connect Bonds | 0.00000e+00 |
> | Angle | 2.81462e+01 |
> | Proper Dih. | 8.24601e+01 |
> | Per. Imp. Dih. | 6.92898e-01 |
> | LJ-14 | 3.17465e+01 |
> | Coulomb-14 | -2.47124e+02 |
> | LJ (SR) | 9.61372e+03 |
> | Disper. corr. | -5.39470e+02 |
> | Coulomb (SR) | -7.34966e+04 |
> | Coul. recip. | 4.09079e+02 |
> | Potential | -6.41174e+04 |
> | Kinetic En. | 1.15222e+04 |
> | Total Energy | -5.25951e+04 |
> | Conserved En. | -5.25924e+04 |
> | Temperature | 3.04004e+02 |
> | Pres. DC (bar) | -1.95680e+02 |
> | Pressure (bar) | -1.11758e+02 |
> | Constr. rmsd | 4.26670e-05 |
That is the abrupt end of the log file above.
The slurm output ends with:
> starting mdrun 'Protein in water'
>
> 200000000 steps, 1000000.0 ps.
>
> /var/spool/slurmd/job2427753/slurm_script: line 24: 16937 Segmentation fault gmx mdrun -deffnm run -maxh 24 -cpi -cpo -nt 8
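As a quick consistency check, the step count reported by `mdrun` matches the 5 fs time step and the reported run length, so the `.tpr` does contain the intended time step:

```math
2\times10^{8}\ \text{steps} \times 5\ \text{fs} = 10^{9}\ \text{fs} = 10^{6}\ \text{ps}
```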
**Possible fixes**
**_Not fixes, but thoughts_**
These simulations are meant to test the new virtual site parameters for DNA and RNA, so it is tempting to blame those parameters. But this dA nucleotide doesn't have any new parameters associated with it. Moreover, I have had no crashes on 2021.7 or 2021.1 for any of my nucleotide systems using these parameters, which is in line with the test results above.
There are angle constraints for alcohol groups, but those were already present in the force field. The manual warns against using LINCS with "coupled angle-constraints". It is a bit unclear exactly what that means, but in the nucleotides there are no constraint triangles that share any atoms. There are alcohol groups attached to ring structures, but I don't think they are very different from what you see in tyrosine, which works fine.
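One way to check the "no constraint triangles share atoms" claim mechanically is to build the constraint graph from the topology and look for triangles that share an atom. The following is a minimal Python sketch under stated assumptions: a single `[ moleculetype ]` per file (atom indices are molecule-local); with `constraints = all-bonds`, every `[ bonds ]` entry is treated as a constraint alongside the explicit `[ constraints ]` entries (a simplification); and preprocessor lines are skipped. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# With constraints = all-bonds, bonds are converted to constraints as well.
# Simplification: every [ bonds ] entry is treated as constrained.
CONSTRAINED_SECTIONS = {"bonds", "constraints"}


def read_pairs(top_text: str) -> list[tuple[int, int]]:
    """Collect (ai, aj) atom pairs from [ bonds ] and [ constraints ] sections."""
    pairs = []
    in_section = False
    for raw in top_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("#"):
            # Blank lines and preprocessor directives are skipped, so lines in
            # both branches of an #ifdef are read (another simplification).
            continue
        if line.startswith("["):
            in_section = line.strip("[] ").lower() in CONSTRAINED_SECTIONS
            continue
        if in_section:
            fields = line.split()
            pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def constraint_triangles(pairs: list[tuple[int, int]]) -> set[frozenset[int]]:
    """Return every set of three atoms that are pairwise constrained."""
    neighbours: dict[int, set[int]] = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    triangles = set()
    for a, b in pairs:
        # Any atom constrained to both a and b closes a triangle.
        for c in neighbours[a] & neighbours[b]:
            triangles.add(frozenset((a, b, c)))
    return triangles


def coupled_triangles(triangles):
    """Return pairs of constraint triangles that share at least one atom."""
    return [(t1, t2) for t1, t2 in combinations(triangles, 2) if t1 & t2]
```

For example, `coupled_triangles(constraint_triangles(read_pairs(open("dA.itp").read())))` (file name hypothetical) should return an empty list if no two constraint triangles in the nucleotide share an atom.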