Overhaul Sparse Core
Reference issue
What does this implement/fix?
Various improvements to SparseMatrix and related classes. Minor improvements include:
- substitute legacy code for STL algorithms where possible. this reduces bloat, is easier to maintain, and is probably faster
- use aligned maps now that buffers are aligned to make use of packet ops.
- replace a bunch of assignment loops with
smart_memmove(andmoveChunk, which now exclusively callssmart_memmove)
More substantial improvements include:
- reworked
setFromTripletsto perform fewer passes over the triplets. Currently, a transposed and unordered copy of the matrix is constructed, and the matrix is ordered using the transposition assignment. This version constructs an unordered matrix, handles the duplicates, and sorts the inner indices after construction. It also performs one allocation (possibly on the stack) to track nonzeros and collapse duplicates instead of two. - add
setFromSortedTripletswhich performs all steps in one pass (after scanning to determine allocation size) and uses no temporary storage to achieve the same outcome assetFromTriplets. A container (such as astd::vector) of triplets can be easily sorted usingstd::sort(as is demonstrated in the tests), so this function should be used whenever possible. - reworked
conservativeResizeto use binary search and general cleanup. Currently, decreasing the inner size always uncompresses the matrix. In this version, the matrix is uncompressed (if not already) only if inner size is decreased and nonzeros are lost due to inner size change. Additionally,data().resize()is appropriately called to minimize subsequent reallocations. - separate search and insertion functions in
insert. Currently,coeffRefperforms a binary search to find an element. If it does not exist, it callsinsertwhere the search is performed again. addedinsertAt,insertUncompressedAtandinsertCompressedAtthat uses a previously determined insertion location frominsertorcoeffRef. Updated and deprecatedinsertUncompressedandinsertCompressedin any anyone uses those. Will gladly delete. -
prunenow works on uncompressed matrices, resolving a long standing TODO
setFromTriplets benchmarks (initialize large sparse matrix 50 times):
- setFromTriplets (old), shuffled triplets: 12s
- setFromTriplets (old), sorted triplets: 9s
- setFromTriplets (new), shuffled triplets: 12s
- setFromTriplets (new), sorted triplets: 3s
- setFromSortedTriplets, sorted triplets: 1.6s
While the new setFromTriplets is on par with the previous algorithm when the triplets are randomly shuffled, the performance gap widens when the triplets are sorted. The new setFromSortedTriplets performs even better, with no temporary storage, and thus should be the preferred method of initializing a sparse matrix.
There are many more fairly simple opportunities for improvement throughout the sparse module, but this MR will focus on code in SparseMatrix.h.
Additional information
Edited by Charles Schlosser