Random NaNs and hangs on dev branch with the adiabatic_expansion example (NVIDIA card)
The dev branch compiles fine, but running /examples/2D/adiabatic_expansion produces NaNs at a random time step, and the run freezes after a while.
The simulation always starts the same way, but different runs diverge: in the two outputs below, the rows are identical up to t = 0.004 and differ from t = 0.005 onward.
This is the Energy.dat output from one run:
# t energy_Ek energy_Ep energy_Ec energy_dEkdt energy_dEpdt energy_dEcdt
0.001 0 0 0 0 0 0
0.002 7.27967e-12 0 -3.21622e-11 7.27967e-09 0 -3.21622e-08
0.003 3.02815e-07 0 -3.02852e-07 0.000302807 0 -0.00030282
0.004 1.2089e-06 0 -1.20894e-06 0.000906089 0 -0.000906091
0.005 1.40932e-06 0 -1.40937e-06 0.000200411 0 -0.000200425
0.006 1.56497e-06 0 -1.56502e-06 0.000155657 0 -0.000155657
0.007 1.80429e-06 0 -1.80434e-06 0.000239317 0 -0.000239317
0.008 0.000211442 0 -0.000110181 0.209637 0 -0.108377
0.009 0.000231159 0 -0.000119161 0.0197176 0 -0.00898004
0.01 0.000231648 0 -0.00012049 0.000488516 0 -0.00132903
0.011 0.000232555 0 -0.000121416 0.000907193 0 -0.000925777
0.012 0.000235288 0 -0.000123307 0.00273297 0 -0.00189085
0.013 0.000238635 0 -0.000125549 0.003347 0 -0.00224193
0.014 0.000251325 0 -0.000137195 0.0126901 0 -0.0116457
0.015 0.000254729 0 -0.000140278 0.00340386 0 -0.00308363
0.016 0.000257535 0 -0.000143086 0.00280594 0 -0.00280805
0.017 0.000258093 0 -0.000143628 0.000557982 0 -0.000542072
0.018 4.61441 0 -3.56004 4614.15 0 -3559.9
0.019 nan nan nan nan nan nan
And this is the Energy.dat output from a second run (same setup, same everything; I just ran ./run --run again):
# t energy_Ek energy_Ep energy_Ec energy_dEkdt energy_dEpdt energy_dEcdt
0.001 0 0 0 0 0 0
0.002 7.27967e-12 0 -3.21622e-11 7.27967e-09 0 -3.21622e-08
0.003 3.02815e-07 0 -3.02852e-07 0.000302807 0 -0.00030282
0.004 1.2089e-06 0 -1.20894e-06 0.000906089 0 -0.000906091
0.005 1.20653e-06 0 -1.20658e-06 -2.37202e-06 0 2.35861e-06
0.006 1.34632e-06 0 -1.34637e-06 0.000139785 0 -0.000139785
0.007 1.56767e-06 0 -1.56773e-06 0.000221357 0 -0.000221357
0.008 1.69154e-06 0 -1.69151e-06 0.000123865 0 -0.000123779
0.009 3.28623e-06 0 -3.28617e-06 0.00159469 0 -0.00159466
0.01 5.46546e-06 0 -5.46536e-06 0.00217923 0 -0.00217919
0.011 5.61839e-06 0 -5.61819e-06 0.00015293 0 -0.000152829
0.012 11.0118 0 -7.83401 11011.7 0 -7834
0.013 nan nan nan nan nan nan
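To pin down where two runs stop agreeing, a short script can compare the Energy.dat files row by row. The following is only a minimal sketch: the function names and the command-line usage are hypothetical, and it assumes each run's Energy.dat has been saved to a separate file with the whitespace-separated layout shown above.

```python
import math
import sys


def parse_energy(text):
    """Parse Energy.dat-style text into rows of floats, skipping '#' header lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # float() accepts the literal "nan", so NaN rows parse without special handling.
        rows.append([float(tok) for tok in line.split()])
    return rows


def first_divergence(run_a, run_b, rel_tol=1e-9):
    """Return the time (first column) of the first row that differs, or None."""
    for row_a, row_b in zip(run_a, run_b):
        for x, y in zip(row_a, row_b):
            both_nan = math.isnan(x) and math.isnan(y)
            if not both_nan and not math.isclose(x, y, rel_tol=rel_tol, abs_tol=0.0):
                return row_a[0]
    return None


def first_nan(run):
    """Return the time of the first row containing a NaN energy value, or None."""
    for row in run:
        if any(math.isnan(v) for v in row[1:]):
            return row[0]
    return None


# Hypothetical usage: python compare_energy.py run1_Energy.dat run2_Energy.dat
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_a, open(sys.argv[2]) as f_b:
        run_a, run_b = parse_energy(f_a.read()), parse_energy(f_b.read())
    print("first diverging t:", first_divergence(run_a, run_b))
    print("first NaN t (run A):", first_nan(run_a))
    print("first NaN t (run B):", first_nan(run_b))
```

Applied to the two listings above, this should report the first divergence at t = 0.005 and the first NaN rows at t = 0.019 and t = 0.013 respectively.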
My system info, from nvidia-smi. The GPU is an NVIDIA GeForce GTX 1060.
Wed Apr 3 13:53:01 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    Off |   00000000:01:00.0  On |                  N/A |
| 42%   45C    P0             29W /  120W |     870MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     12123      G   /usr/lib/Xorg                                 380MiB |
|    0   N/A  N/A     12246      G   xfwm4                                           2MiB |
|    0   N/A  N/A     12369      G   /usr/lib/firefox/firefox                      334MiB |
|    0   N/A  N/A     12393      G   /usr/lib/thunderbird/thunderbird              148MiB |
+-----------------------------------------------------------------------------------------+