
Allocate packed host memory by default for wavefunctions and overlap computation/communication for TD

Sebastian Ohlmann requested to merge packed_host_default_special_mem into develop

Description

Batches that are initialized without external memory are now allocated directly in a packed state. For now, this applies only to the state wavefunctions. The special flag is not passed when using copy_to, so that these copies do not allocate pinned memory in GPU runs, because allocating pinned memory is very costly.
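The difference between the two layouts can be illustrated with a small sketch. It assumes, as is common for such batch layouts, that the packed layout interleaves the states so that the state index varies fastest for each grid point, while the unpacked layout keeps one contiguous array per state. The helper names `pack`, `unpack` and `allocate_packed` are hypothetical and do not correspond to Octopus routines.

```python
from typing import List

def pack(states: List[List[float]]) -> List[float]:
    """Interleave the states so that the state index varies fastest (packed layout)."""
    nst = len(states)
    npoints = len(states[0])
    packed = [0.0] * (nst * npoints)
    for ip in range(npoints):
        for ist in range(nst):
            packed[ip * nst + ist] = states[ist][ip]
    return packed

def unpack(packed: List[float], nst: int) -> List[List[float]]:
    """Inverse of pack: one contiguous array per state (unpacked layout)."""
    npoints = len(packed) // nst
    return [[packed[ip * nst + ist] for ip in range(npoints)] for ist in range(nst)]

def allocate_packed(nst: int, npoints: int) -> List[float]:
    """Allocate storage directly in the packed layout, with no unpacked copy first."""
    return [0.0] * (nst * npoints)

# Two states on three grid points.
example = pack([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
```

Allocating directly in the packed layout avoids first allocating the unpacked arrays and then converting them with a pack step.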

Packing and unpacking can now also be asynchronous when they consist only of copying from packed memory on the host to packed memory on the device and back. The TD propagation uses this to overlap computation and data transfer: while the current batch is being processed, the next batch is prefetched asynchronously. In some of my tests, this overlap sped up TD calculations by a factor of 1.8.
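The prefetching scheme can be sketched as a double-buffered loop. This is a minimal illustration, not the Octopus implementation: a single-worker thread pool stands in for a device copy stream, and the hypothetical callables `copy_to_device`, `compute` and `copy_to_host` stand in for the asynchronous pack, the propagation step and the asynchronous unpack.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Sequence

Batch = List[float]

def process_with_prefetch(
    host_batches: Sequence[Batch],
    copy_to_device: Callable[[Batch], Batch],
    compute: Callable[[Batch], Batch],
    copy_to_host: Callable[[Batch], Batch],
) -> List[Batch]:
    """Process batches in order, uploading batch i+1 while batch i is computed."""
    if not host_batches:
        return []
    copies_back: List[Future] = []
    # One worker models one copy stream: transfers run in submission order.
    with ThreadPoolExecutor(max_workers=1) as copy_stream:
        pending = copy_stream.submit(copy_to_device, host_batches[0])
        for i in range(len(host_batches)):
            device_batch = pending.result()  # wait until batch i has arrived
            if i + 1 < len(host_batches):
                # Start the next upload before computing, so the two can overlap.
                pending = copy_stream.submit(copy_to_device, host_batches[i + 1])
            result = compute(device_batch)
            # Copy the result back asynchronously as well.
            copies_back.append(copy_stream.submit(copy_to_host, result))
        return [f.result() for f in copies_back]

# Usage: "copy" with list(), "compute" by doubling every entry.
batches = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = process_with_prefetch(batches, list, lambda b: [2.0 * x for x in b], list)
# out == [[2.0, 4.0], [6.0, 8.0], [10.0, 12.0]]
```

In a real GPU code, the copy stream would be a device stream. Asynchronous host-device copies generally require page-locked (pinned) host memory, which is why the host-side allocation strategy matters for this overlap.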

Closes #228.

News snippet

Initialize batches in a packed state for the wavefunctions. Introduce overlapping of computation and data transfer for GPU runs.

Checklist

  • I have checked that my code follows the Octopus coding standards
  • I have added tests for all the new features added in this request.
Edited by Martin Lueders
