Parallel filling of sparse matrix
I have an OpenMP-parallelized finite-element code that calculates the matrix entries with up to 128 threads on an AMD Epyc CPU. Memory for the sparse matrix is pre-allocated using m_matrix.reservePerColumn(nnz), where nnz is a guess.
When filling the matrix with
#pragma omp critical
m_matrix.coeffRef(ii,jj) += localMatrix(i,j);
everything works fine, but of course the performance is poor.
The atomic variant
#pragma omp atomic
m_matrix.coeffRef(ii,jj) += localMatrix(i,j);
compiles fine but yields a segmentation fault when run with multiple OpenMP threads.
What is the recommended way to fill a sparse matrix in multi-threaded code?
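For context, a likely explanation of the crash is that coeffRef inserts a new nonzero when the entry does not exist yet, which can move or reallocate the underlying storage, while the atomic directive only protects the scalar update itself. One commonly used pattern that avoids concurrent insertion is to let each thread collect (row, col, value) contributions in a private buffer, merge the buffers once per thread, and build the matrix afterwards. The following is only a minimal sketch of that idea without Eigen: Entry, assembleEntries, sumDuplicates and the toy element loop are hypothetical names made up for illustration, not part of the code above.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a triplet type: one (row, col, value) contribution.
struct Entry {
    int row;
    int col;
    double value;
};

// Hypothetical element loop: "element" e adds 1.0 to (e % n, e % n) and
// 0.5 to (e % n, (e + 1) % n). A real code would compute localMatrix here.
std::vector<Entry> assembleEntries(int numElements, int n) {
    assert(n > 0);
    std::vector<Entry> all;

    #pragma omp parallel
    {
        // Private to each thread, so push_back needs no synchronization.
        std::vector<Entry> local;

        #pragma omp for nowait
        for (int e = 0; e < numElements; ++e) {
            local.push_back({e % n, e % n, 1.0});
            local.push_back({e % n, (e + 1) % n, 0.5});
        }

        // One critical section per thread instead of one per matrix entry.
        #pragma omp critical
        all.insert(all.end(), local.begin(), local.end());
    }
    return all;
}

// Serial post-processing: contributions to the same (row, col) are summed.
std::map<std::pair<int, int>, double> sumDuplicates(const std::vector<Entry>& entries) {
    std::map<std::pair<int, int>, double> sums;
    for (const Entry& e : entries) {
        sums[std::make_pair(e.row, e.col)] += e.value;
    }
    return sums;
}
```

With Eigen, the merged list would typically hold Eigen::Triplet<double> objects and be passed to setFromTriplets, which sums duplicate entries, so the explicit sumDuplicates step above only serves to make the sketch self-contained. The trade-off is extra memory for the triplet list in exchange for a fill phase with almost no locking.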