ASAN fixes for AVX512 GEMM/TRSM

Reference issue

Apologies for the long delay! This is a follow-up to !1067 (closed). This MR addresses some memory-related issues in the AVX512 GEMM/TRSM kernels that were detected with AddressSanitizer (ASAN).

What does this implement/fix?

For GEMM, the fix implemented is the one mentioned here. The buffer overrun comes from the A matrix pre-loads in the k loop. The fix is to split the k loop into two sections (k = k_ + kRem), with pre-loads disabled when handling kRem.
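
To illustrate the idea, here is a minimal scalar sketch of the loop split. It is not the kernel code: `dot_with_preload`, the single accumulator, and the choice of `kRem = 1` are hypothetical stand-ins picked for the example, and the real kernel works on AVX512 registers. The point is only that the pre-load of the next A element stays in bounds during the `k_` section and is skipped in the `kRem` section.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar stand-in for the inner k loop of the GEMM kernel.
// a and b each hold exactly k valid elements.
float dot_with_preload(const float* a, const float* b, std::size_t k) {
  assert(a != nullptr && b != nullptr);
  if (k == 0) return 0.0f;

  const std::size_t kRem = 1;       // assumed remainder size for this sketch
  const std::size_t k_ = k - kRem;  // iterations that may safely pre-load

  float acc = 0.0f;
  float a_cur = a[0];  // initial load, in bounds because k > 0
  std::size_t i = 0;

  // Main section: pre-load a[i + 1] for the next iteration.
  // Since i < k_ = k - kRem, the index i + 1 is at most k - 1.
  for (; i < k_; ++i) {
    const float a_next = a[i + 1];
    acc += a_cur * b[i];
    a_cur = a_next;
  }

  // Remainder section: pre-loads disabled, each element is loaded directly.
  // Pre-loading here would read a[k], one element past the end.
  for (; i < k; ++i) {
    acc += a[i] * b[i];
  }
  return acc;
}
```

Without the split, the last iteration of a single loop with pre-loads would read `a[k]`, which is the kind of access ASAN reports.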

For TRSM, masked loads were added to aux_loadB. In certain remainder cases we were loading out-of-bounds data.
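
As a rough illustration of what a masked load buys in the remainder case, here is a portable scalar emulation. The helper names `masked_load` and `remainder_mask` are hypothetical and this is not the body of aux_loadB; on AVX512 hardware the same behavior is available from intrinsics such as `_mm512_maskz_loadu_ps`, though which intrinsic the kernel uses is an assumption here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 16;  // number of floats in a 512-bit register

// Build a mask with the low `rem` bits set (all lanes if rem >= kLanes).
std::uint16_t remainder_mask(std::size_t rem) {
  return rem >= kLanes ? std::uint16_t{0xFFFF}
                       : static_cast<std::uint16_t>((1u << rem) - 1u);
}

// Scalar emulation of a zero-masking load: lanes whose mask bit is clear
// are never read from memory and are set to zero in the result.
std::array<float, kLanes> masked_load(const float* src, std::uint16_t mask) {
  assert(src != nullptr);
  std::array<float, kLanes> out{};  // value-initialized to 0.0f
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    if (mask & (1u << lane)) {
      out[lane] = src[lane];
    }
  }
  return out;
}
```

With only three valid elements left, `masked_load(src, remainder_mask(3))` touches `src[0]` through `src[2]` and nothing else, whereas a full-width unmasked load would read 13 floats past the end of the buffer.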

Additional information

As far as I can tell, there is no significant performance impact from the changes in the GEMM kernels.
