Speed up sparse x dense dot product.
This applies a small trick used in https://gitlab.com/libeigen/eigen/-/blame/master/Eigen/src/SparseCore/SparseDenseProduct.h#L69 to speed up sparse x dense dot products. Also applies the "inline" keyword to the methods in SparseDot.h for a small improvement.
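As a rough illustration for reviewers, the sketch below assumes the trick in question is splitting the sum across two independent accumulators, so that consecutive multiply-adds do not all wait on the same register. `SparseVec` and `sparse_dense_dot` are hypothetical stand-ins written for this example; they are not Eigen types or the code in this change.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a sparse vector: parallel arrays holding the
// indices and values of the nonzero entries. Not an Eigen type.
struct SparseVec {
  std::vector<std::size_t> indices;
  std::vector<double> values;
};

// Sparse x dense dot product using two independent accumulators.
// Each accumulator forms its own dependency chain, which gives the CPU
// more room to overlap the multiply-adds.
inline double sparse_dense_dot(const SparseVec& a, const std::vector<double>& b) {
  double acc0 = 0.0;
  double acc1 = 0.0;
  const std::size_t n = a.values.size();
  std::size_t k = 0;
  // Process nonzeros two at a time, one per accumulator.
  for (; k + 1 < n; k += 2) {
    acc0 += a.values[k] * b[a.indices[k]];
    acc1 += a.values[k + 1] * b[a.indices[k + 1]];
  }
  // Handle a leftover nonzero when the count is odd.
  if (k < n) {
    acc0 += a.values[k] * b[a.indices[k]];
  }
  return acc0 + acc1;
}
```

Because the additions are reassociated, the result can differ from a single-accumulator loop in the last few bits for floating-point inputs.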
This reduces the run time of SparseQR on the test matrix from #2583 from ~200s to ~165s.
Profile before: $3678759
Profile after: $3679225
FYI: I also tried manually vectorizing the dot product, without success: every variation I tried was slower.
Edited by Rasmus Munk Larsen