This improvement to the vnl_start routine applies only to the unpacked version of the routine.
I tested the gain for TD calculations on:
- Si primitive in serial, 1kpt
- Si cubic in serial, 1kpt (spacing = 0.5,0.45,0.4,0.35)
- hBN bilayer with c = 240 Bohr, 32 cores, 16kpt, parallelization in kpt+states (spacing = 0.5,0.45,0.4,0.35)
The attached plot shows the time (from profiling/time.00000) vs. the number of inner points in the mesh for hBN and bulk silicon.
The gain becomes larger as the ratio of states to grid points decreases.
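For anyone who wants to check this trend on their own runs, here is a minimal sketch of how the timings could be tabulated. It assumes the before/after times have already been read by hand from the profiling output. The `Run` class, its field names, and the helper functions are hypothetical and are not part of the code base.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Run:
    """One benchmark run; every value is supplied by the user (hypothetical layout)."""
    n_states: int    # number of states in the calculation
    n_points: int    # number of inner mesh points
    t_before: float  # time before the change, in seconds
    t_after: float   # time after the change, in seconds


def states_per_point(run: Run) -> float:
    """Ratio of states to grid points for one run."""
    return run.n_states / run.n_points


def speedup(run: Run) -> float:
    """Gain expressed as old time divided by new time (> 1 means faster)."""
    return run.t_before / run.t_after


def sort_by_ratio(runs: Iterable[Run]) -> List[Run]:
    """Order runs from the smallest states/grid-points ratio to the largest."""
    return sorted(runs, key=states_per_point)
```

Printing `speedup(run)` for the runs returned by `sort_by_ratio` puts the gain next to the states/grid-points ratio, which makes the dependence easy to read off.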