Change inline hint for general_matrix_vector_product<>::run() to gain performance

Description

Change the inline hint for general_matrix_vector_product<>::run() from EIGEN_DONT_INLINE to EIGEN_ALWAYS_INLINE to gain performance.

The function is currently marked EIGEN_DONT_INLINE. Switching it to EIGEN_ALWAYS_INLINE led to performance gains in the bench_btl benchmarks that run matrix-vector products at small problem sizes.
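For readers unfamiliar with inline hints, here is a minimal sketch of the difference between the two kinds of annotation. The `SKETCH_*` macros and the `small_gemv` struct are hypothetical stand-ins written for this illustration. They are not Eigen's definitions or its kernel; they only assume the compiler-specific attributes that GCC, Clang, and MSVC provide for forcing or forbidding inlining.

```cpp
#include <array>
#include <cstddef>

// Hypothetical macros for illustration only. Eigen's real macros are defined
// in its own headers and handle more compilers and configurations.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_ALWAYS_INLINE __attribute__((always_inline)) inline
#define SKETCH_DONT_INLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define SKETCH_ALWAYS_INLINE __forceinline
#define SKETCH_DONT_INLINE __declspec(noinline)
#else
#define SKETCH_ALWAYS_INLINE inline
#define SKETCH_DONT_INLINE
#endif

// Stand-in for a small matrix-vector kernel: y += A * x, A stored column-major.
template <std::size_t Rows, std::size_t Cols>
struct small_gemv {
  using Mat = std::array<double, Rows * Cols>;
  using VecIn = std::array<double, Cols>;
  using VecOut = std::array<double, Rows>;

  // Never inlined: every call pays function-call overhead, which can be
  // noticeable when the product itself is only a handful of operations.
  SKETCH_DONT_INLINE static void run_outlined(const Mat& a, const VecIn& x, VecOut& y) {
    kernel(a, x, y);
  }

  // Inlining is requested at every call site, so the optimizer can see the
  // loop bounds and operands in the caller's context.
  SKETCH_ALWAYS_INLINE static void run_inlined(const Mat& a, const VecIn& x, VecOut& y) {
    kernel(a, x, y);
  }

 private:
  SKETCH_ALWAYS_INLINE static void kernel(const Mat& a, const VecIn& x, VecOut& y) {
    for (std::size_t c = 0; c < Cols; ++c)
      for (std::size_t r = 0; r < Rows; ++r)
        y[r] += a[r + c * Rows] * x[c];
  }
};
```

Both entry points compute the same result; only the hint given to the compiler differs. Whether forced inlining helps depends on the call site and problem size, which is why the change is evaluated with benchmarks.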

Reference issue

Additional information

I ran performance benchmarks at selected problem sizes, covering the 4 categories of benchmarks under bench/btl:

  • linear: axpby, axpy, rot
  • vecmat: matrix_vector, atv, symv, syr2, ger
  • matmat: matrix_matrix, ata
  • adv: trisolve_vector, trisolve_matrix, cholesky, partial_lu_decomp, tridiagonalization

Each category was run at these 5 problem sizes:

| Benchmark | XL | L | M | S | XS |
|---|---|---|---|---|---|
| linear | 2999999 | 101942 | 3464 | 117 | 4 |
| matmat | 4999 | 840 | 141 | 23 | 4 |
| vecmat | 4999 | 840 | 141 | 23 | 4 |
| adv | 3000 | 681 | 154 | 35 | 8 |

Tested with GNU 15.1.0, NVHPC 25.9, and clang 20.1.0.

This source change led to significant speedups in some of the vecmat tests at the S and XS problem sizes with all three compilers tested. A few tests slow down, but the geomean across all benchmarks at each problem size shows either a speedup or essentially no change.

Below is a summary of tests with >10% speedups or slowdowns, as well as the geomean for each problem size, on both x86 (Genoa) and ARM (Grace). There is a bigger impact on Genoa.
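As a reading aid for the tables, here is a minimal sketch of how a per-test speedup percentage and a geomean can be computed. The function names are hypothetical and this is not the script used to produce the numbers. It assumes that higher scores are better, as the "Speedup" column implies, and it does not settle whether the reported geomeans were taken over raw scores or over ratios.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Relative change of the patched score over the baseline, in percent.
// Assumption: higher scores are better, so a positive result is a speedup.
double speedup_percent(double base, double patched) {
  assert(base > 0.0);
  return (patched / base - 1.0) * 100.0;
}

// Geometric mean of positive values, computed through logarithms so that
// long lists of large scores do not overflow an intermediate product.
double geomean(const std::vector<double>& values) {
  assert(!values.empty());
  double log_sum = 0.0;
  for (double v : values) {
    assert(v > 0.0);
    log_sum += std::log(v);
  }
  return std::exp(log_sum / static_cast<double>(values.size()));
}
```

For example, `speedup_percent(12.62, 14.14)` gives roughly 12.0, which matches the NV entry for trisolve_vector at size M on Genoa. Small differences in other rows are expected because the tables show rounded scores.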

On Genoa

Problem Size: Medium (M)

| Benchmark | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| trisolve_vector | 16.76 | 17.71 | +5.6% | 12.62 | 14.14 | +12.0% | 16.10 | 17.23 | +7.1% |

Problem Size: Small (S)

| Benchmark | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| atv | 12.56 | 14.31 | +13.9% | 14.22 | 15.42 | +8.4% | 11.67 | 12.07 | +3.4% |
| axpy | 35.47 | 35.82 | +1.0% | 23.23 | 27.36 | +17.8% | 24.89 | 24.69 | -0.8% |
| matrix_vector | 16.97 | 18.37 | +8.2% | 15.11 | 17.21 | +13.9% | 13.55 | 13.09 | -3.4% |

Problem Size: Extra Small (XS)

| Benchmark | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| atv | 2.07 | 2.07 | -0.0% | 1.71 | 1.90 | +11.3% | 1.51 | 1.71 | +13.0% |
| cholesky | 1.82 | 2.50 | +37.2% | 1.79 | 1.73 | -3.2% | 1.85 | 1.74 | -6.1% ⚠️ |
| ger | 2.28 | 2.27 | -0.4% | 2.20 | 2.19 | -0.5% | 2.00 | 2.25 | +12.7% |
| matrix_vector | 2.31 | 2.68 | +15.9% | 2.67 | 2.78 | +4.0% | 2.51 | 2.81 | +11.9% |
| trisolve_matrix | 3.36 | 2.41 | -28.3% ⚠️ | 2.35 | 2.58 | +9.9% | 3.58 | 3.54 | -1.1% |
| trisolve_vector | 1.23 | 1.22 | -0.7% | 0.97 | 0.84 | -13.7% ⚠️ | 1.26 | 1.02 | -18.6% ⚠️ |

Summary: Geometric Mean Across All Problem Sizes

| Problem Size | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| XL | 52.25 | 52.25 | -0.0% | 25.59 | 25.67 | +0.3% | 52.10 | 52.04 | -0.1% |
| L | 59.03 | 59.20 | +0.3% | 29.22 | 29.48 | +0.9% | 58.00 | 57.95 | -0.1% |
| M | 41.90 | 42.12 | +0.5% | 21.40 | 21.65 | +1.2% | 40.38 | 40.62 | +0.6% |
| S | 14.17 | 14.46 | +2.0% | 8.49 | 8.77 | +3.3% | 12.70 | 12.77 | +0.6% |
| XS | 2.34 | 2.36 | +0.8% | 2.03 | 2.04 | +0.4% | 2.10 | 2.10 | -0.2% |

On Grace

Problem Size: Small (S)

| Benchmark | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| matrix_vector | 19.72 | 21.25 | +7.8% | 20.10 | 20.63 | +2.6% | 19.04 | 21.44 | +12.6% |

Problem Size: Extra Small (XS)

| Benchmark | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| atv | 3.09 | 3.41 | +10.6% | 2.57 | 3.23 | +25.7% | 2.14 | 2.51 | +17.5% |
| matrix_vector | 3.07 | 3.52 | +14.6% | 3.13 | 3.68 | +17.6% | 3.05 | 3.55 | +16.4% |
| syr2 | 2.67 | 2.76 | +3.2% | 2.20 | 2.23 | +1.4% | 2.37 | 1.98 | -16.5% ⚠️ |

Summary: Geometric Mean Across All Problem Sizes

| Problem Size | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| XL | 38.95 | 38.92 | -0.1% | 39.67 | 39.86 | +0.5% | 39.50 | 39.65 | +0.4% |
| L | 39.14 | 39.43 | +0.7% | 40.05 | 40.06 | +0.0% | 40.21 | 40.21 | +0.0% |
| M | 32.12 | 32.37 | +0.8% | 35.01 | 35.01 | -0.0% | 34.82 | 34.88 | +0.2% |
| S | 14.36 | 14.52 | +1.1% | 15.03 | 15.00 | -0.2% | 13.84 | 14.02 | +1.3% |
| XS | 2.80 | 2.87 | +2.5% | 2.82 | 2.89 | +2.3% | 2.49 | 2.50 | +0.3% |
