Overhaul Sparse Core

Reference issue

What does this implement/fix?

Various improvements to SparseMatrix and related classes. Minor improvements include:

  • substitute legacy code for STL algorithms where possible. this reduces bloat, is easier to maintain, and is probably faster
  • use aligned maps now that buffers are aligned to make use of packet ops.
  • replace a bunch of assignment loops with smart_memmove (and moveChunk, which now exclusively calls smart_memmove)

More substantial improvements include:

  • reworked setFromTriplets to perform fewer passes over the triplets. Currently, a transposed and unordered copy of the matrix is constructed, and the matrix is ordered using the transposition assignment. This version constructs an unordered matrix, handles the duplicates, and sorts the inner indices after construction. It also performs one allocation (possibly on the stack) to track nonzeros and collapse duplicates instead of two.
  • add setFromSortedTriplets which performs all steps in one pass (after scanning to determine allocation size) and uses no temporary storage to achieve the same outcome as setFromTriplets. A container (such as a std::vector) of triplets can be easily sorted using std::sort (as is demonstrated in the tests), so this function should be used whenever possible.
  • reworked conservativeResize to use binary search and general cleanup. Currently, decreasing the inner size always uncompresses the matrix. In this version, the matrix is uncompressed (if not already) only if inner size is decreased and nonzeros are lost due to inner size change. Additionally, data().resize() is appropriately called to minimize subsequent reallocations.
  • separate search and insertion functions in insert. Currently, coeffRef performs a binary search to find an element. If it does not exist, it calls insert where the search is performed again. added insertAt, insertUncompressedAt and insertCompressedAt that uses a previously determined insertion location from insert or coeffRef. Updated and deprecated insertUncompressed and insertCompressed in any anyone uses those. Will gladly delete.
  • prune now works on uncompressed matrices, resolving a long standing TODO

setFromTriplets benchmarks (initialize large sparse matrix 50 times):

  • setFromTriplets (old), shuffled triplets: 12s
  • setFromTriplets (old), sorted triplets: 9s
  • setFromTriplets (new), shuffled triplets: 12s
  • setFromTriplets (new), sorted triplets: 3s
  • setFromSortedTriplets, sorted triplets: 1.6s

While the new setFromTriplets is on par with the previous algorithm when the triplets are randomly shuffled, the performance gap widens when the triplets are sorted. The new setFromSortedTriplets performs even better, with no temporary storage, and thus should be the preferred method of initializing a sparse matrix.

There are many more fairly simple opportunities for improvement throughout the sparse module, but this MR will focus on code in SparseMatrix.h.

Additional information

Edited by Charles Schlosser

Merge request reports

Loading