OpenMP is disabled for bonded list update in Clang-CUDA builds
Summary
In Clang-CUDA builds, we pass the -fno-openmp flag to all GPU-related source files to work around an old Clang bug, on the assumption that no CUDA file uses OpenMP. That assumption no longer holds.
- Our CMake code: https://gitlab.com/gromacs/gromacs/blob/bbe83e52dda31475f634993fb5c587753ac24154/cmake/gmxManageClangCudaConfig.cmake#L123-124
- MR introducing the OpenMP region in a GPU source file: !3899 (merged)
- Similar issue: #4747 / !3540 (merged)
Exact steps to reproduce
$ cmake ../.. -DCMAKE_C_COMPILER=clang-17 -DCMAKE_CXX_COMPILER=clang++-17 -DGMX_GPU=CUDA -DGMX_CLANG_CUDA=ON -DGMX_CUDA_TARGET_SM=86
[...]
$ make -j$(nproc)
[...]
$ touch ../../src/gromacs/listed_forces/listed_forces_gpu_impl_gpu.cpp
$ VERBOSE=1 make
[ 16%] Building CXX object src/gromacs/CMakeFiles/libgromacs.dir/listed_forces/listed_forces_gpu_impl_gpu.cpp.o
[...] /usr/bin/clang++-17 [...] -Wno-source-uses-openmp [...] -fopenmp=libomp [...] -fno-openmp [...]
In this case, -fno-openmp overrides the earlier -fopenmp=libomp and disables OpenMP for the host compilation pass as well. This becomes visible if we remove -Wno-source-uses-openmp:
/home/aland/gromacs/src/gromacs/listed_forces/listed_forces_gpu_impl_gpu.cpp:192:9: warning: unexpected '#pragma omp ...' in program [-Wsource-uses-openmp]
192 | #pragma omp parallel for num_threads(gmx_omp_nthreads_get(ModuleMultiThread::Bonded)) schedule(static)
| ^
1 warning generated when compiling for sm_86.
/home/aland/gromacs/src/gromacs/listed_forces/listed_forces_gpu_impl_gpu.cpp:192:9: warning: unexpected '#pragma omp ...' in program [-Wsource-uses-openmp]
192 | #pragma omp parallel for num_threads(gmx_omp_nthreads_get(ModuleMultiThread::Bonded)) schedule(static)
| ^
1 warning generated when compiling for host.
The first warning, for the sm_86 pass, is expected and is the reason to keep -Wno-source-uses-openmp. The second one indicates that OpenMP is also disabled for the host pass.
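To confirm that the flag order alone causes this, independently of the GROMACS build, a configure-time check along the following lines could be used. It is a minimal sketch, assuming a Clang host compiler; the result variable HYPOTHETICAL_OPENMP_SURVIVES_FNO_OPENMP is a made-up name, and the check compiles plain C++ rather than using CUDA mode, so it only demonstrates that a later -fno-openmp cancels an earlier -fopenmp=libomp.

```cmake
include(CheckCXXSourceCompiles)
include(CMakePushCheckState)

# Only meaningful for Clang: other compilers may reject -fopenmp=libomp outright.
if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
    cmake_push_check_state(RESET)
    # Same order as in the failing compile line: OpenMP enabled first, then disabled.
    set(CMAKE_REQUIRED_FLAGS "-fopenmp=libomp -fno-openmp")
    # _OPENMP is defined only while OpenMP is active, so this test program
    # compiles only if -fopenmp=libomp survives the later -fno-openmp.
    check_cxx_source_compiles("
#ifndef _OPENMP
#error OpenMP is disabled
#endif
int main() { return 0; }
" HYPOTHETICAL_OPENMP_SURVIVES_FNO_OPENMP)
    cmake_pop_check_state()

    if(NOT HYPOTHETICAL_OPENMP_SURVIVES_FNO_OPENMP)
        message(STATUS "A trailing -fno-openmp disables OpenMP; host OpenMP regions in GPU sources would run serially")
    endif()
endif()
```

Given the compiler output above, one would expect the check to fail (and the status message to be printed) with the Clang 17 setup from the reproduction steps.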
For developers: Why is this important?
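We should use OpenMP whenever we intend to use it. Currently, the OpenMP region in the bonded list update is silently compiled without parallelization on the host in Clang-CUDA builds, because -Wno-source-uses-openmp also hides the host-side warning.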
Possible fixes
The easiest solution is to narrow the CMake flag addition. Per https://bugs.llvm.org/show_bug.cgi?id=45533, the bug is fixed in Clang 11, so there is no need to add -fno-openmp with Clang 11 and newer.
Clang-CUDA is not widely used, and the performance benefit of !3899 (merged) is modest, so with Clang 9-10 we can keep the current behavior (no host parallelization of that loop).
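The narrowed flag addition could look roughly like the fragment below. This is only a sketch: the actual code in cmake/gmxManageClangCudaConfig.cmake is not reproduced here, and the list variable _clang_cuda_flags is hypothetical, standing in for whatever variable collects the Clang-CUDA compile flags there.

```cmake
# Hypothetical: _clang_cuda_flags stands in for the real Clang-CUDA flag list.
if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND
   CMAKE_CXX_COMPILER_VERSION VERSION_LESS 11)
    # Clang 9-10: keep the workaround for https://bugs.llvm.org/show_bug.cgi?id=45533.
    # Host OpenMP regions in GPU sources are then compiled without parallelization.
    list(APPEND _clang_cuda_flags "-fno-openmp")
endif()
# Clang 11 and newer: -fno-openmp is not added, so -fopenmp=libomp stays in
# effect for the host pass, while -Wno-source-uses-openmp still silences the
# expected device-pass warning.
```

Keying on CMAKE_CXX_COMPILER_VERSION rather than on a feature test keeps the change as small as the original workaround and ties it directly to the Clang release that contains the fix.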