Fix RowMajor performance for triangular/dense assignment (!2165) · Merge requests · libeigen / eigen

Summary

Fixes #3031 (closed)
Rewrite the dynamic triangular_assignment_loop to iterate in storage order (outer and inner loops follow the storage layout) instead of always using outer=col, inner=row
This gives contiguous memory access for both ColMajor and RowMajor storage, fixing a 5.5x-137x RowMajor performance deficit while keeping ColMajor at parity
Use compile-time constexpr row()/col() helpers that constant-fold, so they add no runtime overhead
Keep the loops simple and scalar so that GCC recognizes memcpy/memset idioms and Clang auto-vectorizes
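
The idea in the points above can be illustrated with a minimal standalone sketch. It does not use Eigen and is not the code from this merge request: the Layout enum, the row()/col() helper signatures, and copy_upper_to_dense are hypothetical names chosen for illustration, and the sketch assumes a square n x n matrix of doubles stored in a flat std::vector, with only the upper triangle copied. The outer loop walks the storage-order slices (rows for RowMajor, columns for ColMajor), and each slice is handled as two contiguous runs, one copied and one zeroed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical storage-order tag (not Eigen's actual option flags).
enum class Layout { ColMajor, RowMajor };

// Map storage-order loop indices (outer, inner) to matrix coordinates.
// The choice depends only on the template parameter, so it constant-folds.
template <Layout L>
constexpr std::size_t row(std::size_t outer, std::size_t inner) {
  return L == Layout::RowMajor ? outer : inner;
}
template <Layout L>
constexpr std::size_t col(std::size_t outer, std::size_t inner) {
  return L == Layout::RowMajor ? inner : outer;
}

// Copy the upper triangle (diagonal included) of the n x n matrix in `src`
// into `dst` and zero the strictly lower part. Both buffers hold n*n
// elements in layout L and are traversed in storage order.
template <Layout L>
void copy_upper_to_dense(const std::vector<double>& src,
                         std::vector<double>& dst, std::size_t n) {
  // Compile-time question: is the element at (outer=1, inner=0) in the
  // upper triangle? If so, the leading run of each slice is the kept part.
  constexpr bool lead_kept = row<L>(1, 0) <= col<L>(1, 0);

  for (std::size_t outer = 0; outer < n; ++outer) {
    const double* s = src.data() + outer * n;  // contiguous source slice
    double* d = dst.data() + outer * n;        // contiguous destination slice
    if (lead_kept) {
      // ColMajor: rows 0..outer of this column are kept, the rest is zero.
      for (std::size_t i = 0; i <= outer; ++i) d[i] = s[i];
      for (std::size_t i = outer + 1; i < n; ++i) d[i] = 0.0;
    } else {
      // RowMajor: columns 0..outer-1 of this row are zero, the rest is kept.
      for (std::size_t i = 0; i < outer; ++i) d[i] = 0.0;
      for (std::size_t i = outer; i < n; ++i) d[i] = s[i];
    }
  }
}
```

Both inner loops are plain copy or fill loops over adjacent elements, which is the shape compilers typically lower to memcpy/memset calls or vectorized code. Iterating with outer=col on a RowMajor buffer would instead touch elements n doubles apart on every inner step.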

RowMajor Triangular2Dense (GCC Haswell):

Size	OLD	NEW	Speedup
64	1091 ns	199 ns	5.5x
256	206,778 ns	2,423 ns	85x
1024	7,830,936 ns	57,028 ns	137x

ColMajor: near parity across all configurations (0.93x-1.36x, median ~1.00x).

Generated with Claude Code

Edited Feb 20, 2026 by Rasmus Munk Larsen