Insert from triplets
Reference issue
More tools for sparse matrix users per discussion in #2631 (closed)
What does this implement/fix?
Add insertFromTriplets/insertFromSortedTriplets
which allows users to add a batch of triplets to an existing sparse matrix with functionality similar to that of setFromTriplets
If the duplicate functor is addition (the most common case), this is not much different from the following operation (which is basically a merge of sorted arrays)
SparseMatrixType tmp(mat.rows(),mat.cols());
tmp.setFromTriplets(begin,end);
mat += tmp; // mat = mat.binaryExpr(tmp,internal::scalar_sum_op<SparseMatrixType::Scalar>());
However, this only works by happenstance as the binary evaluator handles non-duplicate entries by assuming a neutral element of zero m_value = m_functor(m_lhsIter.value(), Scalar(0));
For addition: a = a + 0 == a
. This is not valid in general, as is the case when the functor is multiplication: a = a * 0 != a
. This is addressed by specializing the sparse-sparse evaulator when the duplicate functor is wrapped with scalar_dup_op
. In this case, non-duplicate entries are returned without modification.
Also, this MR reverts setFromTriplets
to perform out-of-place sorting with transposed assignment (credits to the original author Gael) with a few minor optimizations.
setFromTriplets
benchmarks:
The performance of setFromTriplets
is sensitive to the ordering of the triplets as well as the number of duplicate entries. This benchmark sets 50 sparse matrices with pre-allocated memory from a container of randomly shuffled triplets with the following characteristics:
Rows: 1,140,149
Cols: 1,140,149
Unique nonzeros: 3,309,592
Duplicates: 827,398
Time in ms
Eigen 3.4 | This MR |
---|---|
15,282 | 14,469 |
15,824 | 14,167 |
15,233 | 13,844 |
15,120 | 14,261 |
This is a roughly 7% faster than the implementation in 3.4. This improvement is attributed not using reserveInnerVectors
, which does a bit more work than is required than creating a brand new sparse matrix, and instead allocates a temporary array for the non-zero entries and re-uses the same buffer for the duplicate removal. This also attempts to allocate this buffer using ei_declare_aligned_stack_constructed_variable
which can avoid a heap allocation if the matrix inner/outer sizes are not astoundingly gigantic.
Previously the following fixes were related to this MR, but I decided to go in another direction. Still, I think these are worthwhile changes/improvements:
- Generalized
CompressedStorageIterator
so that its value type is notstd::pair
but a thin pair-like class with comparators (the default comparator forstd::pair
containing a complex scalar is problematic). - Allow
CompressedStorageIterator
's value and reference types to be compared directly toStorageIndex
(convenient for some STL algorithms)