Threading the Davidson solver (!72) · Merge requests · QEF - Quantum ESPRESSO Foundation / q-e

Ye Luo requested to merge ye-luo/q-e:opt-threading-all-parts into develop Jun 03, 2018

This development branch improves thread scaling in cegterg.f90. h_psi and its callees are fully threaded, but cdiaghg is untouched in this merge request and remains the threading bottleneck.

The major changes remove unnecessary memory set and copy operations and thread the necessary ones when they are large. As a result, performance also improves slightly even in MPI-only runs.

I did a preliminary performance study (https://gitlab.com/ye-luo/q-e/wikis/Threading-the-davidson-solver). In large simulations, using 16 threads or more becomes possible.

The gamma trick code path was not touched.

UtilXlib/thread_util.f90 is introduced to provide threaded memcpy and memset for large data sets.
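
For readers unfamiliar with the idea, here is a minimal sketch of a threaded copy and a threaded zero-fill with a size cutoff. It is written in C with OpenMP as an analogue only: the actual module is Fortran, and its interface is not reproduced here. The names `threaded_copy`, `threaded_zero` and `THREADED_MIN_BYTES`, the 1 MiB cutoff, and the use of `double` elements are all assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: below this size, starting a parallel region
 * is assumed to cost more than it saves, so the serial libc call is used. */
#define THREADED_MIN_BYTES ((size_t)1 << 20)

/* Copy n doubles from src to dst. The buffers must not overlap. */
void threaded_copy(double *dst, const double *src, size_t n)
{
    if (n * sizeof(double) < THREADED_MIN_BYTES) {
        memcpy(dst, src, n * sizeof(double));
        return;
    }
    /* Static schedule: each thread handles one contiguous chunk. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        dst[i] = src[i];
}

/* Set n doubles in dst to zero. */
void threaded_zero(double *dst, size_t n)
{
    if (n * sizeof(double) < THREADED_MIN_BYTES) {
        memset(dst, 0, n * sizeof(double));
        return;
    }
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < (long)n; ++i)
        dst[i] = 0.0;
}
```

Compiled without OpenMP support, the pragmas are ignored and both routines fall back to plain serial loops, so the result is the same either way. Where to set the cutoff depends on the machine and should be measured rather than assumed.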

All the serial and parallel pw tests pass.

Edited Jun 08, 2018 by Ye Luo
