Change inline hint for general_matrix_vector_product<>::run() to gain performance
Description
Change the inline hint on general_matrix_vector_product<>::run() from EIGEN_DONT_INLINE to EIGEN_ALWAYS_INLINE to gain performance.
The function is currently marked EIGEN_DONT_INLINE. Switching it to EIGEN_ALWAYS_INLINE led to performance gains
in the bench_btl benchmarks that run matrix-vector products at small problem sizes.
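For readers unfamiliar with inline hints, the following is a minimal sketch of the kind of change described above. It is not Eigen's code: the macros SKETCH_DONT_INLINE and SKETCH_ALWAYS_INLINE, the struct gemv_kernel_sketch, and the helper matvec are hypothetical stand-ins chosen for illustration, and the kernel is a plain column-major loop rather than Eigen's vectorized implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for EIGEN_DONT_INLINE / EIGEN_ALWAYS_INLINE.
// These are illustration-only macros, not Eigen's definitions.
#if defined(_MSC_VER)
#define SKETCH_DONT_INLINE __declspec(noinline)
#define SKETCH_ALWAYS_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_DONT_INLINE __attribute__((noinline))
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_DONT_INLINE
#define SKETCH_ALWAYS_INLINE inline
#endif

// Hypothetical kernel: res += lhs * rhs, with lhs stored column-major.
template <typename Scalar>
struct gemv_kernel_sketch {
  // Before the change this would read:
  //   SKETCH_DONT_INLINE static void run(...)
  // After the change the call is forced into the caller, which removes the
  // call overhead that matters most for very small matrices.
  SKETCH_ALWAYS_INLINE static void run(std::size_t rows, std::size_t cols,
                                       const Scalar* lhs, const Scalar* rhs,
                                       Scalar* res) {
    for (std::size_t j = 0; j < cols; ++j) {
      for (std::size_t i = 0; i < rows; ++i) {
        res[i] += lhs[i + j * rows] * rhs[j];
      }
    }
  }
};

// Convenience wrapper so the kernel can be exercised on std::vector data.
inline std::vector<double> matvec(const std::vector<double>& a,
                                  std::size_t rows, std::size_t cols,
                                  const std::vector<double>& x) {
  assert(a.size() == rows * cols);
  assert(x.size() == cols);
  std::vector<double> y(rows, 0.0);
  gemv_kernel_sketch<double>::run(rows, cols, a.data(), x.data(), y.data());
  return y;
}
```

For example, calling matvec with the column-major 2x2 matrix {1, 2, 3, 4} and the vector {1, 1} returns {4, 6}. Forcing inlining trades code size for lower call overhead, which is consistent with the gains showing up mainly at the S and XS sizes.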
Reference issue
Additional information
I ran performance benchmarks at selected problem sizes. The benchmarks under bench/btl fall into 4 categories:
- linear: axpby, axpy, rot
- vecmat: matrix_vector, atv, symv, syr2, ger
- matmat: matrix_matrix, ata
- adv: trisolve_vector, trisolve_matrix, cholesky, partial_lu_decomp, tridiagonalization
I used these 5 problem sizes for each category:
| Benchmark | XL | L | M | S | XS |
|---|---|---|---|---|---|
| linear | 2999999 | 101942 | 3464 | 117 | 4 |
| matmat | 4999 | 840 | 141 | 23 | 4 |
| vecmat | 4999 | 840 | 141 | 23 | 4 |
| adv | 3000 | 681 | 154 | 35 | 8 |
Tested with GNU 15.1.0, NVHPC 25.9, and clang 20.1.0.
This source change led to significant speedups in some of the vecmat tests at the S and XS problem sizes with all three compilers. A few tests slow down, but the geomean across all benchmarks for each problem size either improves or is essentially unchanged (no worse than -0.2%).
Below is a summary of the tests with >10% speedups or slowdowns (✓ marks a speedup above 10%), as well as the geomean for each problem size, on both x86 (Genoa) and ARM (Grace). The impact is bigger on Genoa.
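As a sketch of how the Speedup and geomean columns can be reproduced, the helpers below compute a percentage change between a base and an inline score, the geometric mean of a set of scores, and the >10% filter. The function names are hypothetical, and the formula (inline / base - 1) assumes that higher scores are better, which the description does not state explicitly. The tabulated percentages were presumably computed from unrounded scores, so recomputing from the rounded table values can differ in the last digit.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Percentage change of the inlined build relative to the base build.
// Assumption: scores are throughput-like, so a positive value is a speedup.
inline double speedup_percent(double base, double inlined) {
  assert(base > 0.0);
  return (inlined / base - 1.0) * 100.0;
}

// Geometric mean of strictly positive scores, computed via logarithms to
// avoid overflow when many scores are multiplied together.
inline double geometric_mean(const std::vector<double>& scores) {
  assert(!scores.empty());
  double log_sum = 0.0;
  for (double s : scores) {
    assert(s > 0.0);
    log_sum += std::log(s);
  }
  return std::exp(log_sum / static_cast<double>(scores.size()));
}

// True when a change exceeds the reporting threshold in either direction.
inline bool is_notable(double percent, double threshold = 10.0) {
  return std::fabs(percent) > threshold;
}
```

For instance, speedup_percent(16.97, 18.37) is about 8.25, in line with the +8.2% reported for matrix_vector with GNU at size S on Genoa, and is_notable rejects it because it is below the 10% threshold.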
On Genoa
Problem Size: Medium (M)
| Benchmark | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| trisolve_vector | 16.76 | 17.71 | +5.6% | 12.62 | 14.14 | +12.0% ✓ | 16.10 | 17.23 | +7.1% |
Problem Size: Small (S)
| Benchmark | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| atv | 12.56 | 14.31 | +13.9% ✓ | 14.22 | 15.42 | +8.4% | 11.67 | 12.07 | +3.4% |
| axpy | 35.47 | 35.82 | +1.0% | 23.23 | 27.36 | +17.8% ✓ | 24.89 | 24.69 | -0.8% |
| matrix_vector | 16.97 | 18.37 | +8.2% | 15.11 | 17.21 | +13.9% ✓ | 13.55 | 13.09 | -3.4% |
Problem Size: Extra Small (XS)
| Benchmark | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| atv | 2.07 | 2.07 | -0.0% | 1.71 | 1.90 | +11.3% ✓ | 1.51 | 1.71 | +13.0% ✓ |
| cholesky | 1.82 | 2.50 | +37.2% ✓ | 1.79 | 1.73 | -3.2% | 1.85 | 1.74 | -6.1% |
| ger | 2.28 | 2.27 | -0.4% | 2.20 | 2.19 | -0.5% | 2.00 | 2.25 | +12.7% ✓ |
| matrix_vector | 2.31 | 2.68 | +15.9% ✓ | 2.67 | 2.78 | +4.0% | 2.51 | 2.81 | +11.9% ✓ |
| trisolve_matrix | 3.36 | 2.41 | -28.3% | 2.35 | 2.58 | +9.9% | 3.58 | 3.54 | -1.1% |
| trisolve_vector | 1.23 | 1.22 | -0.7% | 0.97 | 0.84 | -13.7% | 1.26 | 1.02 | -18.6% |
Summary: Geometric Mean Across All Benchmarks, by Problem Size
| Problem Size | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| XL | 52.25 | 52.25 | -0.0% | 25.59 | 25.67 | +0.3% | 52.10 | 52.04 | -0.1% |
| L | 59.03 | 59.20 | +0.3% | 29.22 | 29.48 | +0.9% | 58.00 | 57.95 | -0.1% |
| M | 41.90 | 42.12 | +0.5% | 21.40 | 21.65 | +1.2% | 40.38 | 40.62 | +0.6% |
| S | 14.17 | 14.46 | +2.0% | 8.49 | 8.77 | +3.3% | 12.70 | 12.77 | +0.6% |
| XS | 2.34 | 2.36 | +0.8% | 2.03 | 2.04 | +0.4% | 2.10 | 2.10 | -0.2% |
On Grace
Problem Size: Small (S)
| Benchmark | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| matrix_vector | 19.72 | 21.25 | +7.8% | 20.10 | 20.63 | +2.6% | 19.04 | 21.44 | +12.6% ✓ |
Problem Size: Extra Small (XS)
| Benchmark | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| atv | 3.09 | 3.41 | +10.6% ✓ | 2.57 | 3.23 | +25.7% ✓ | 2.14 | 2.51 | +17.5% ✓ |
| matrix_vector | 3.07 | 3.52 | +14.6% ✓ | 3.13 | 3.68 | +17.6% ✓ | 3.05 | 3.55 | +16.4% ✓ |
| syr2 | 2.67 | 2.76 | +3.2% | 2.20 | 2.23 | +1.4% | 2.37 | 1.98 | -16.5% |
Summary: Geometric Mean Across All Benchmarks, by Problem Size
| Problem Size | GNU Base | GNU Inline | Speedup | NV Base | NV Inline | Speedup | CLANG Base | CLANG Inline | Speedup |
|---|---|---|---|---|---|---|---|---|---|
| XL | 38.95 | 38.92 | -0.1% | 39.67 | 39.86 | +0.5% | 39.50 | 39.65 | +0.4% |
| L | 39.14 | 39.43 | +0.7% | 40.05 | 40.06 | +0.0% | 40.21 | 40.21 | +0.0% |
| M | 32.12 | 32.37 | +0.8% | 35.01 | 35.01 | -0.0% | 34.82 | 34.88 | +0.2% |
| S | 14.36 | 14.52 | +1.1% | 15.03 | 15.00 | -0.2% | 13.84 | 14.02 | +1.3% |
| XS | 2.80 | 2.87 | +2.5% | 2.82 | 2.89 | +2.3% | 2.49 | 2.50 | +0.3% |