Propagate DMRGSCF/QCMaquis results along scan geometries
Dear developers,
Given a set of geometries along a coordinate, e.g., from a relaxed scan, is there a way to propagate the results of a DMRG calculation using QCMaquis and the OpenMolcas interface along the geometries?
Let's say I start at the first geometry and obtain a wavefunction via DMRGSCF/QCMaquis. Can I somehow use these results (the checkpoint_state.*.h5 and *.results_state.h5 files) as input for the next geometry along the scan?
I already noticed the donotdelete keyword, which skips deletion of the checkpoint files before the calculation starts. By copying the checkpoint_state.*.h5 files from a previous calculation to the scratch directory in $MOLCAS_WORKDIR and by copying the *.results_state.h5 files to $CurrDir, I made some progress, and QCMaquis seems to read these files. All files were appropriately renamed.
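For reference, the copy-and-rename step can be sketched roughly as follows. This is only a minimal sketch of what I did by hand: the function name copy_renamed and the project names in the usage note are made up for illustration, and I assume the files are named <project>.<kind>.<...>.h5, as in my runs.

```python
import shutil
from pathlib import Path


def copy_renamed(src_dir, dest_dir, old_project, new_project, kind):
    """Copy <old_project>.<kind>.* from src_dir to dest_dir, renamed to new_project.

    'kind' is e.g. "checkpoint_state" or "results_state". The naming scheme
    <project>.<kind>.<...>.h5 is an assumption based on my own file names.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(src_dir.glob(f"{old_project}.{kind}.*")):
        # Keep everything after the project prefix, e.g. ".checkpoint_state.0.h5".
        suffix = src.name[len(old_project):]
        dest = dest_dir / f"{new_project}{suffix}"
        if src.is_dir():
            # Handle the case where a checkpoint is stored as a directory.
            shutil.copytree(src, dest, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

With hypothetical project names 02_ref (previous geometry) and 03_dnd (current one), this would be called once with kind="checkpoint_state" and the scratch directory as dest_dir, and once with kind="results_state" and $CurrDir as dest_dir.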
However, the calculation quickly crashes with errors like:
Iter num DMRG max tr DMRG SX DMRGSCF Energy max ROT max BLB max BLB Level Ln srch Step QN CPU Time
sweeps/root weight/root iter energy change param element value shift minimum type update hh:mm:ss
The group '/spectrum/iteration/48/results' does not exist.
In /build/source/build/External/qcmaquis/src/qcmaquis/dmrg/alps/src/alps/hdf5/archive.cpp on 607 in list_children
/nix/store/cisagh5shrv9ci09dpzvlc5m5ca4m6w4-openmolcas-23.06/qcmaquis/lib/libalps.so(+0x105049) [0x7f8cf2bc6049]
/nix/store/cisagh5shrv9ci09dpzvlc5m5ca4m6w4-openmolcas-23.06/bin/dmrgscf.exe(_ZN17results_collector4loadIN4alps4hdf57archiveEEEvRT_+0x62) [0x6b7ef2]
/nix/store/cisagh5shrv9ci09dpzvlc5m5ca4m6w4-openmolcas-23.06/qcmaquis/lib/libmaquis_dmrg.so(+0x1cec7f) [0x7f8cf2f97c7f]
/nix/store/cisagh5shrv9ci09dpzvlc5m5ca4m6w4-openmolcas-23.06/qcmaquis/lib/libmaquis_dmrg.so(_ZN13interface_simIN4alps7numeric6matrixIdSt6vectorIdSaIdEEEE16SU2U1PG_templateIiEE21get_iteration_resultsEv+0xfe) [0x7f8cf306ff9e]
/nix/store/cisagh5shrv9ci09dpzvlc5m5ca4m6w4-openmolcas-23.06/qcmaquis/lib/libmaquis_dmrg.so(qcmaquis_interface_get_iteration_results+0x42) [0x7f8cf30b9f92]
/nix/store/cisagh5shrv9ci09dpzvlc5m5ca4m6w4-openmolcas-23.06/bin/dmrgscf.exe() [0x51d977]
/nix/store/cisagh5shrv9ci09dpzvlc5m5ca4m6w4-openmolcas-23.06/bin/dmrgscf.exe() [0x51df13]
/nix/store/cisagh5shrv9ci09dpzvlc5m5ca4m6w4-openmolcas-23.06/bin/dmrgscf.exe() [0x48242e]
/nix/store/cisagh5shrv9ci09dpzvlc5m5ca4m6w4-openmolcas-23.06/bin/dmrgscf.exe() [0x460af1]
/nix/store/cisagh5shrv9ci09dpzvlc5m5ca4m6w4-openmolcas-23.06/bin/dmrgscf.exe() [0x452751]
/nix/store/cisagh5shrv9ci09dpzvlc5m5ca4m6w4-openmolcas-23.06/bin/dmrgscf.exe() [0x451a17]
/nix/store/ld03l52xq2ssn4x0g5asypsxqls40497-glibc-2.37-8/lib/libc.so.6(+0x23ace) [0x7f8ce5deeace]
/nix/store/ld03l52xq2ssn4x0g5asypsxqls40497-glibc-2.37-8/lib/libc.so.6(__libc_start_main+0x89) [0x7f8ce5deeb89]
/nix/store/cisagh5shrv9ci09dpzvlc5m5ca4m6w4-openmolcas-23.06/bin/dmrgscf.exe() [0x452665]
terminate called after throwing an instance of 'std::runtime_error'
what(): Error reading iteration results from checkpoint.
Program received signal SIGABRT: Process abort signal.
Backtrace for this error:
#0 0x7f8ce5e03d2f in ???
#1 0x7f8ce5e52a8c in __pthread_kill_implementation
#2 0x7f8ce5e03c85 in __GI_raise
#3 0x7f8ce5ded8b9 in abort
#4 0x7f8cf2516a88 in _ZN9__gnu_cxx27__verbose_terminate_handlerEv.cold
#5 0x7f8cf2521f89 in _ZN10__cxxabiv111__terminateEPFvvE
#6 0x7f8cf2521ff4 in _ZSt9terminatev
#7 0x7f8cf2522246 in __cxa_throw
#8 0x7f8cf30700b8 in _ZN13interface_simIN4alps7numeric6matrixIdSt6vectorIdSaIdEEEE16SU2U1PG_templateIiEE21get_iteration_resultsEv
#9 0x7f8cf30b9f91 in qcmaquis_interface_get_iteration_results
#10 0x51d976 in __qcmaquis_interface_MOD_qcmaquis_interface_get_iteration_results
#11 0x51df12 in __qcmaquis_interface_MOD_qcmaquis_interface_run_dmrg
#12 0x48242d in cictl_
#13 0x460af0 in rasscf_
#14 0x452750 in dmrgscf_
#15 0x451a16 in main
--- Stop Module: dmrgscf at Tue Oct 24 09:31:33 2023 /rc=-6 ---
*** files: 03_dnd.dmrgscf.h5 xmldump
saved to directory /home/johannes/Arbeit/00_azobenzene_ionization/06_n_expelled/16_dmrg/03_dnd
--- Module dmrgscf spent 5 minutes 22 seconds ---
To me, the issue seems to be that QCMaquis simply tries to continue the previous calculation: some counters are not reset, so QCMaquis tries to read iteration results from the previous result files that are not present there. Please see the first lines of the QCMaquis.log file below:
This binary contains symmetries: 2u1pg su2u1pg
DMRG version 3.1.1
Temporary storage enabled in ./tmp//storage_temp_0084ffb9579c/
Will start again at site -1 in sweep 42
Loading checkpoint from /home/johannes/Arbeit/00_azobenzene_ionization/06_n_expelled/16_dmrg/03_dnd/03_dnd.checkpoint_state.0.h5
Parameters:
...
It seems the checkpoint file is read successfully, but the log says the calculation restarts in sweep 42 instead of sweep 1.
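To check which sweeps actually have stored results, one could list the children of /spectrum/iteration in the HDF5 files. The sketch below is a rough idea under some assumptions: the group layout /spectrum/iteration/<n>/results is taken only from the error message above, I am assuming it lives in the *.results_state.*.h5 files, the function names are made up, and h5py is a third-party package that must be installed separately.

```python
def sweeps_with_results(root):
    """Return the sorted iteration numbers under spectrum/iteration that have a 'results' child.

    'root' can be an open h5py.File or any nested mapping with the same layout.
    The layout itself is an assumption taken from the QCMaquis error message.
    """
    node = root
    for part in ("spectrum", "iteration"):
        if part not in node:
            return []
        node = node[part]
    found = []
    for name in node:  # iterating a group (or dict) yields the member names
        child = node[name]
        # Only group-like children can contain a 'results' entry.
        if name.isdigit() and hasattr(child, "keys") and "results" in child:
            found.append(int(name))
    return sorted(found)


def sweeps_in_file(path):
    """Open an HDF5 file read-only and report the sweeps that have results."""
    import h5py  # third-party; imported lazily so the helper above works without it

    with h5py.File(path, "r") as f:
        return sweeps_with_results(f)
```

If the highest number returned for a copied results file is lower than the sweep that QCMaquis.log announces for the restart, that would be consistent with the missing-group error.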
Interestingly, the calculation only crashed for the 7th state, after reading 03_dnd.checkpoint_state.6.h5. It proceeded fine for the six checkpoint files before that.
I also noted that some absolute paths are written to *.results_state.*.h5/parameters/{chkpfile,resultfile}. The *.dmrgscf.h5 file that I used as fileorb also contained checkpoint filenames. Is there something to gain by updating these filenames/paths?
So my question: is there a canonical way to make this work? Or can I modify the HDF5 files and manually reset some counters, so that QCMaquis still reads in the previous results but starts "fresh"?
The problem occurs with:
OpenMolcas v23.06
Host name: localhost (Linux)
C Compiler ID: GNU
C flags: -std=gnu99 -fopenmp
Fortran Compiler ID: GNU
Fortran flags: -fno-aggressive-loop-optimizations -cpp -fdefault-integer-8 -fopenmp
Definitions: _MOLCAS_;_I8_;_LINUX_
Parallel: OFF (GA=OFF)
QCMaquis release-3.1.1
Besides the output file, I did not attach any other files because they would be too large overall. If anything else is required, please let me know.
All the best, Johannes