Remove AVX512VL dependency in trsm

Reference issue

What does this implement/fix?

This PR addresses the issue mentioned in !959 (merged). The _mm256_mask* intrinsics are not available with AVX512F alone (-mfma -mavx512f); they require AVX512F + AVX512VL. To fix this, we switch to the corresponding _mm512_mask* intrinsics and reinterpret zmm <-> ymm where necessary.
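
For illustration only, here is a minimal sketch of the substitution pattern described above. It is not the code from this PR: the helper names `add_tail_avx512f` and `cpu_has_avx512f` are hypothetical, and the sketch assumes an x86-64 build with GCC or Clang (it uses `__attribute__((target))` and `__builtin_cpu_supports`). The masked load and store use the 512-bit forms with a 16-bit mask whose upper 8 bits are zero, and the 256-bit arithmetic is done on the low half via zmm <-> ymm casts.

```cpp
#include <immintrin.h>
#include <cassert>

// Hypothetical helper (not from the PR): writes a[i] + b[i] to out[i] for the
// first n (<= 8) floats, using only AVX512F instructions (no AVX512VL).
__attribute__((target("avx512f")))
void add_tail_avx512f(const float* a, const float* b, float* out, int n) {
  assert(n >= 0 && n <= 8);

  // 16-bit mask with the low n bits set; lanes 8..15 are always masked off.
  const __mmask16 k = static_cast<__mmask16>((1u << n) - 1u);

  // 512-bit masked loads stand in for _mm256_maskz_loadu_ps (AVX512VL).
  // Memory faults are suppressed for masked-off lanes.
  const __m512 za = _mm512_maskz_loadu_ps(k, a);
  const __m512 zb = _mm512_maskz_loadu_ps(k, b);

  // Reinterpret zmm -> ymm and do the arithmetic on the low 256 bits.
  const __m256 ysum = _mm256_add_ps(_mm512_castps512_ps256(za),
                                    _mm512_castps512_ps256(zb));

  // Reinterpret ymm -> zmm (upper half is undefined but masked off) and
  // store with the 512-bit masked store instead of _mm256_mask_storeu_ps.
  _mm512_mask_storeu_ps(out, k, _mm512_castps256_ps512(ysum));
}

// Runtime check so callers can avoid executing AVX512F code on older CPUs.
bool cpu_has_avx512f() { return __builtin_cpu_supports("avx512f") != 0; }
```

A caller would guard the call with `cpu_has_avx512f()`. Elements of `out` at index `n` and beyond are left untouched, which matches the behavior of the 256-bit masked store being replaced.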

Additional information

In !834 (merged), -march=native was used for performance testing. With -march=native, the changes here do not cause any performance regressions. With -mfma -mavx512f, performance is lower for smaller problem sizes in cases that require intermediate transposes.
