Multithreaded interface for assembling a `CoordinateMatrix`

Assembling sparse matrices, especially for Finite Element Methods, can take a substantial amount of time. For instance, the `example/helmholtz_3d_pml.cc` driver, when run with 16 cores on the 60 x 60 x 60 element domain, spends more time in the sequential FEM matrix assembly than in the multithreaded factorization. A multithreaded interface to `CoordinateMatrix` is therefore critical for parallel performance.
