gromacs 2018.1 doesn't run on KNL - Redmine #2504
Archive from user: Carlo Camilloni
I have tried to run a standard MD simulation on a KNL cluster but I get the following error:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 25280 RUNNING AT r065c04s03
= EXIT CODE: 132
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 25280 RUNNING AT r065c04s03
= EXIT CODE: 4
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================
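A note on reading the exit code above: by common shell convention, an exit status above 128 means the process was killed by signal number (status - 128). The sketch below decodes 132 that way using Python's standard `signal` module; `decode_exit_code` is a hypothetical helper name used only for illustration. On Linux, 132 should resolve to SIGILL (illegal instruction). The 4 in the second block equals that signal number, though whether the MPI launcher means it that way is an assumption.

```python
import signal


def decode_exit_code(code):
    """Return the signal implied by a shell-style exit code, or None.

    Assumes the 128 + N convention used by common shells; an MPI
    launcher may report codes differently.
    """
    if code > 128:
        try:
            return signal.Signals(code - 128)
        except ValueError:
            # No signal with that number on this platform.
            return None
    return None


sig = decode_exit_code(132)
print(sig.name if sig is not None else "not a signal exit")
```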
The code is run as:
mpiexec -np 32 mdrun_knl -s topol0 -nb cpu -v -maxh 23.9 -nsteps -1 >& log
and is compiled with the Intel 2017 compilers. The same happens with the Intel 2018 compilers, and when using FFTW instead of MKL.
This is the log:
md.2018.1.log:
GROMACS version: 2018.1
Precision: single
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support: disabled
SIMD instructions: AVX_512_KNL
FFT library: Intel MKL
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.11.0
Tracing support: disabled
Built on: 2018-05-17 11:33:17
Built by: ccamillo@r000u06l01 [CMAKE]
Build OS/arch: Linux 3.10.0-327.36.3.el7.x86_64 x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Build CPU family: 6 Model: 79 Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /cineca/prod/opt/compilers/intel/pe-xe-2017/binary/bin/icc Intel 17.0.4.20170411
C compiler flags: -xMIC-AVX512 -mkl=sequential -std=gnu99 -O3 -DNDEBUG -ip -funroll-all-loops -alias-const -ansi-alias -no-prec-div -fimf-domain-exclusion=14 -qoverride-limits
C++ compiler: /cineca/prod/opt/compilers/intel/pe-xe-2017/binary/bin/icpc Intel 17.0.4.20170411
C++ compiler flags: -xMIC-AVX512 -mkl=sequential -std=c++11 -O3 -DNDEBUG -ip -funroll-all-loops -alias-const -ansi-alias -no-prec-div -fimf-domain-exclusion=14 -qoverride-limits
(it stops here)
The same TPR, with the same setup on the same cluster, works well with GROMACS 2016.5 compiled in the same way:
md.2016.5.log:
GROMACS version: 2016.5
Precision: single
Memory model: 64 bit
MPI library: MPI
OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
GPU support: disabled
SIMD instructions: AVX_512_KNL
FFT library: Intel MKL
RDTSCP usage: enabled
TNG support: enabled
Hwloc support: hwloc-1.11.0
Tracing support: disabled
Built on: Thu May 17 12:17:27 CEST 2018
Built by: ccamillo@r000u06l01 [CMAKE]
Build OS/arch: Linux 3.10.0-327.36.3.el7.x86_64 x86_64
Build CPU vendor: Intel
Build CPU brand: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Build CPU family: 6 Model: 79 Stepping: 1
Build CPU features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler: /cineca/prod/opt/compilers/intel/pe-xe-2017/binary/bin/icc Intel 17.0.4.20170411
C compiler flags: -xMIC-AVX512 -mkl=sequential -std=gnu99 -O3 -DNDEBUG -ip -funroll-all-loops -alias-const -ansi-alias
C++ compiler: /cineca/prod/opt/compilers/intel/pe-xe-2017/binary/bin/icpc Intel 17.0.4.20170411
C++ compiler flags: -xMIC-AVX512 -mkl=sequential -std=c++0x -O3 -DNDEBUG -ip -funroll-all-loops -alias-const -ansi-alias
Running on 1 node with total 68 cores, 272 logical cores
Hardware detected on host r065c06s01 (the node of MPI rank 0):
CPU info:
Vendor: Intel
Brand: Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz
Family: 6 Model: 87 Stepping: 1
Features: aes apic avx avx2 avx512f avx512pf avx512er avx512cd clfsh cmov cx8 cx16 f16c fma htt lahf mmx msr nonstop_tsc pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
SIMD instructions most likely to fit this hardware: AVX_512_KNL
SIMD instructions selected at GROMACS compile time: AVX_512_KNL
Hardware topology: Basic
Sockets, cores, and logical processors:
Socket 0: [ 0 68 136 204] [ 1 69 137 205] [ 2 70 138 206] [ 3 71 139 207] [ 4 72 140 208] [ 5 73 141 209] [ 6 74 142 210] [ 7 75 143 211] [ 8 76 144 212] [ 9 77 145 213] [ 10 78 146 214] [ 11 79 147 215] [ 12 80 148 216] [ 13 81 149 217] [ 14 82 150 218] [ 15 83 151 219] [ 16 84 152 220] [ 17 85 153 221] [ 18 86 154 222] [ 19 87 155 223] [ 20 88 156 224] [ 21 89 157 225] [ 22 90 158 226] [ 23 91 159 227] [ 24 92 160 228] [ 25 93 161 229] [ 26 94 162 230] [ 27 95 163 231] [ 28 96 164 232] [ 29 97 165 233] [ 30 98 166 234] [ 31 99 167 235] [ 32 100 168 236] [ 33 101 169 237] [ 34 102 170 238] [ 35 103 171 239] [ 36 104 172 240] [ 37 105 173 241] [ 38 106 174 242] [ 39 107 175 243] [ 40 108 176 244] [ 41 109 177 245] [ 42 110 178 246] [ 43 111 179 247] [ 44 112 180 248] [ 45 113 181 249] [ 46 114 182 250] [ 47 115 183 251] [ 48 116 184 252] [ 49 117 185 253] [ 50 118 186 254] [ 51 119 187 255] [ 52 120 188 256] [ 53 121 189 257] [ 54 122 190 258] [ 55 123 191 259] [ 56 124 192 260] [ 57 125 193 261] [ 58 126 194 262] [ 59 127 195 263] [ 60 128 196 264] [ 61 129 197 265] [ 62 130 198 266] [ 63 131 199 267] [ 64 132 200 268] [ 65 133 201 269] [ 66 134 202 270] [ 67 135 203 271]
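For anyone comparing the two logs by hand, the following is a small sketch that diffs the C compiler flags of the two builds and the CPU features of the build host against those of the compute node. The strings are copied from the logs above; the variable names are illustrative only. It only lists differences and does not establish which of them, if any, is related to the crash.

```python
# C compiler flags as reported in md.2018.1.log and md.2016.5.log.
flags_2018 = (
    "-xMIC-AVX512 -mkl=sequential -std=gnu99 -O3 -DNDEBUG -ip "
    "-funroll-all-loops -alias-const -ansi-alias -no-prec-div "
    "-fimf-domain-exclusion=14 -qoverride-limits"
).split()
flags_2016 = (
    "-xMIC-AVX512 -mkl=sequential -std=gnu99 -O3 -DNDEBUG -ip "
    "-funroll-all-loops -alias-const -ansi-alias"
).split()

# CPU features of the build host (Xeon E5-2697 v4) and of the
# compute node (Xeon Phi 7250), both from md.2016.5.log.
build_features = set(
    "aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma hle htt lahf mmx msr "
    "nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm "
    "sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic".split()
)
run_features = set(
    "aes apic avx avx2 avx512f avx512pf avx512er avx512cd clfsh cmov cx8 "
    "cx16 f16c fma htt lahf mmx msr nonstop_tsc pclmuldq pdcm pdpe1gb "
    "popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic".split()
)

# Flags present in one build but not in the other, in log order.
flags_only_2018 = [f for f in flags_2018 if f not in flags_2016]
flags_only_2016 = [f for f in flags_2016 if f not in flags_2018]

# Features present on one machine but not on the other.
only_on_build_host = sorted(build_features - run_features)
only_on_compute_node = sorted(run_features - build_features)

print("flags only in 2018.1:", flags_only_2018)
print("flags only in 2016.5:", flags_only_2016)
print("features only on build host:", only_on_build_host)
print("features only on compute node:", only_on_compute_node)
```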
(from redmine: issue id 2504, created on 2018-05-17 by gmxdefault, closed on 2018-05-22)
- Changesets:
- Revision f8b78130 by Roland Schulz on 2018-05-21T20:35:11Z:
Fix illegal instruction error on KNL
Fixes #2504
Change-Id: Ie2f55718f98d3dfbf3c312afa5141c77ead77a6d
- Uploads: