New panel modes for GEMM MMA (real & complex).

New panel modes for GEMM MMA (real & complex), with better register and pipeline usage.

Up to 2.84X faster for small matrices. For large matrices, real-only MMA is 34% faster for F32 and 75% faster for F64, and complex MMA is 48% faster for F32 and 32% faster for F64. Packing is up to 20% faster.
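For readers unfamiliar with the packing step mentioned above, here is a minimal sketch of what packing a matrix into panels means in a GEMM in general. It is not the code in this merge request: `pack_panels` and its parameters are hypothetical names chosen for illustration, and the layout (row-major input, column panels, zero-padded tail) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of this merge request or of any library API).
// Packs a row-major rows x cols matrix into column panels of `panel_width`
// columns. Inside each panel the data is stored row by row, so a micro-kernel
// can read `panel_width` contiguous values per step along the rows.
// The last panel is zero-padded when cols is not a multiple of panel_width.
std::vector<float> pack_panels(const std::vector<float>& b,
                               std::size_t rows,
                               std::size_t cols,
                               std::size_t panel_width) {
    assert(panel_width > 0);
    assert(b.size() == rows * cols);

    const std::size_t num_panels = (cols + panel_width - 1) / panel_width;
    std::vector<float> packed(num_panels * rows * panel_width, 0.0f);

    std::size_t out = 0;
    for (std::size_t p = 0; p < num_panels; ++p) {
        const std::size_t col0 = p * panel_width;  // first column of this panel
        for (std::size_t k = 0; k < rows; ++k) {
            for (std::size_t j = 0; j < panel_width; ++j, ++out) {
                const std::size_t col = col0 + j;
                if (col < cols) {
                    packed[out] = b[k * cols + col];
                }
                // else: leave the zero padding in place
            }
        }
    }
    return packed;
}
```

The point of the layout is that the inner kernel walks memory linearly instead of striding across the original matrix, which is why packing cost and panel shape both show up in the overall GEMM timings.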

Some other fixes for various compilers.
