Bug with SIESTA performance
Summary:
Using the system resource monitor, I noticed that siesta-4.1.5 run in parallel uses more cores than I requested. When I then compiled SIESTA with the bundled BLAS and LAPACK, two things happened. First, the flag -DSIESTA__DIAG_2STAGE was activated automatically.
Second, the program stopped using more cores than requested, and performance increased dramatically. I then switched back to my previous LAPACK and BLAS installation, adding the flag by hand, and got the same improvement. To run SIESTA I used
mpirun -np $NPROC siesta <>
Code version:
$ siesta --4.1.5-g384057250
System information:
I tested this on three different computers, running Debian 9, Debian 10, and Pop!_OS 20.04 with GCC 6.3, GCC 8.3, and GCC 10.3, respectively. I attach the arch.make used with GCC 10.3. The other two arch.make files differ from it only in that MRRR is not added (because of their ScaLAPACK version) and in that the libraries commented out in LIBS here are active there.
Steps to reproduce:
Compile SIESTA with the provided arch.make and run a calculation; it uses the requested number of cores. Then comment out the -DSIESTA__DIAG_2STAGE flag and the libsiestaLAPACK.a libsiestaBLAS.a entries in COMP_LIBS, recompile, and run the same calculation again: it now uses more cores than requested.
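A minimal sketch of the two arch.make lines involved in the reproduction. It assumes the preprocessor flag is passed through FPPFLAGS (that variable name is an assumption; the actual arch.make is not reproduced here), while COMP_LIBS and the library names come from the report. With both lines active the run uses the requested cores; commenting both out reproduces the oversubscription.

```makefile
# Hypothetical arch.make excerpt (FPPFLAGS is an assumed variable name).

# Configuration that behaves correctly: bundled LAPACK/BLAS and the
# 2-stage diagonalization flag.
FPPFLAGS += -DSIESTA__DIAG_2STAGE
COMP_LIBS = libsiestaLAPACK.a libsiestaBLAS.a

# To reproduce the problem, comment out both lines above so that the
# external LAPACK/BLAS listed in LIBS is used instead:
# FPPFLAGS += -DSIESTA__DIAG_2STAGE
# COMP_LIBS = libsiestaLAPACK.a libsiestaBLAS.a
```
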
Expected behavior:
The program should use the requested number of cores regardless of how it was compiled. When it uses all the cores, it spawns a very large number of siesta instances, and the slowdown is severe: the same calculation that took more than 8 h to complete finished in 30 min after the change.