Skip to content

improve sparse permutations

Reference issue

What does this implement/fix?

Currently, sparse permutations are performed in the following manner:

  1. the right hand side expression is evaluated into a sparse matrix temporary (allocation)
  2. the result is created in a new sparse matrix temporary (allocation)
  3. the result is assigned to the destination (allocation)

If the inverse permutation is requested, then the inverse permutation is computed (allocation)

If the inner indices are permuted, the transpose of the matrix is constructed so that the transpose assignment implicitly sorts the new inner indices, which requires random access and is considerably slower.

This MR seeks to reduce the number of allocations and improve performance. The new sequence of events is:

  1. if the right hand side is a plain sparse object, just reference it. otherwise, evaluate it into a temporary (no allocation in many cases)
  2. the result is created in a new sparse matrix temporary (allocation)
  3. the result is moved to the destination (no allocation if the objects are compatible, i.e. storage orders match)

If the inner indices are permuted, and the inverse permutation is requested, then the inverse permutation is computed (allocation in one of the four use cases). If the inner indices are permuted, the elements with their updated indices are inserted into the result in an unsorted fashion. the indices are sorted in place after the matrix is finalized.

In summary, the previous strategy always had 3 copies of the matrix and 50% of the use cases (outer/inner inverse permutation) requires a copy of the permutation. Now, permuting a plain matrix involves 1 copy, and 25% of use cases (inner inverse) requires a copy of the permutation. In all cases, the permuted matrix is now constructed in a manner that allows contiguous chunks of data to be copied, instead of elements one-at-a-time in random order.

Operation Before After % Change
Outer 7196 3847 -47%
Inner 16125 4069 -75%
Inverse Outer 6895 4433 -36%
Inverse Inner 15918 4257 -73%

Note that the largest improvement is due to avoiding the construction of the transposed matrix (inner permutations). All cases benefit from avoiding two fewer copies of the matrix (reference xpr, move result).

Additional information

Edited by Charles Schlosser

Merge request reports

Loading