
Threading the Davidson solver

Ye Luo requested to merge ye-luo/q-e:opt-threading-all-parts into develop

This development branch improves thread scaling in cegterg.f90. h_psi and its callees are fully threaded. cdiaghg is untouched in this merge request and is the remaining threading bottleneck.

The major changes remove unnecessary memory set and copy operations and thread the remaining necessary ones when they are large. For this reason, performance also improves slightly even when running with MPI only.
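As a rough illustration of both ideas in the paragraph above, here is a minimal C/OpenMP sketch, not code from this branch (which is Fortran). The function `scale_into` and the threshold `OMP_MIN_ELEMS` are hypothetical. Because every element of the output is written, a preceding memset to zero would be wasted work and is simply left out. The loop is threaded only when it is large enough to amortize the thread start-up cost, using OpenMP's `if` clause.

```c
#include <stddef.h>

/* Hypothetical cutoff (in elements) below which the loop stays serial;
   a real value would have to be tuned per machine. */
#define OMP_MIN_ELEMS 100000

/* y[i] = alpha * x[i] for i in [0, n).
   Every element of y is overwritten, so zeroing y beforehand with memset
   would be redundant and is omitted. The OpenMP if clause spawns a thread
   team only for large n; without OpenMP the pragma is ignored and the loop
   runs serially with the same result. */
void scale_into(double *restrict y, const double *restrict x,
                double alpha, size_t n)
{
    #pragma omp parallel for if (n >= OMP_MIN_ELEMS) schedule(static)
    for (size_t i = 0; i < n; ++i)
        y[i] = alpha * x[i];
}
```

The same pattern (drop the redundant initialization, then thread conditionally) applies to any buffer that is fully overwritten right after allocation.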

I did a preliminary performance study (https://gitlab.com/ye-luo/q-e/wikis/Threading-the-davidson-solver). In large simulations, using 16 or more threads becomes feasible.

The gamma trick code path was not touched.

UtilXlib/thread_util.f90 is introduced to provide threaded memcpy and memset for large datasets.
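The general technique behind a threaded memcpy/memset can be sketched in C with OpenMP as follows. This is an assumption-laden illustration, not the interface of UtilXlib/thread_util.f90: the names `threaded_memcpy`, `threaded_memset`, `chunk_range` and the 1 MiB threshold `THREADED_MIN_BYTES` are all hypothetical. Each thread handles one contiguous slice of the buffer, and small buffers fall back to the plain serial call.

```c
#include <stddef.h>
#include <string.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical size below which threading overhead outweighs the gain. */
#define THREADED_MIN_BYTES ((size_t)1 << 20)

/* Split [0, n) into nt contiguous slices; the first n % nt slices get one
   extra byte so the slices cover the range exactly. */
static void chunk_range(size_t n, size_t nt, size_t tid,
                        size_t *begin, size_t *len)
{
    size_t base = n / nt, rem = n % nt;
    *begin = tid * base + (tid < rem ? tid : rem);
    *len = base + (tid < rem ? 1 : 0);
}

/* Copy n bytes; each OpenMP thread copies its own slice. */
void threaded_memcpy(void *dst, const void *src, size_t n)
{
#ifdef _OPENMP
    if (n >= THREADED_MIN_BYTES) {
        #pragma omp parallel
        {
            size_t begin, len;
            chunk_range(n, (size_t)omp_get_num_threads(),
                        (size_t)omp_get_thread_num(), &begin, &len);
            memcpy((char *)dst + begin, (const char *)src + begin, len);
        }
        return;
    }
#endif
    memcpy(dst, src, n); /* small buffer or no OpenMP: serial copy */
}

/* Fill n bytes with value; each OpenMP thread fills its own slice. */
void threaded_memset(void *dst, int value, size_t n)
{
#ifdef _OPENMP
    if (n >= THREADED_MIN_BYTES) {
        #pragma omp parallel
        {
            size_t begin, len;
            chunk_range(n, (size_t)omp_get_num_threads(),
                        (size_t)omp_get_thread_num(), &begin, &len);
            memset((char *)dst + begin, value, len);
        }
        return;
    }
#endif
    memset(dst, value, n); /* small buffer or no OpenMP: serial fill */
}
```

Splitting into one contiguous slice per thread keeps each thread streaming through its own memory region. On NUMA systems this also tends to place pages near the threads that later touch them, which is one plausible reason such helpers pay off only for large buffers.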

All the serial and parallel pw tests pass.

Edited by Ye Luo
