ASAN fixes for AVX512 GEMM/TRSM
Reference issue
Apologies for the long delay! This is a follow-up to !1067 (closed). This MR addresses some memory-related issues in the AVX512 GEMM/TRSM kernels that were detected with AddressSanitizer.
What does this implement/fix?
For GEMM, the fix implemented is the one described here. The buffer overrun comes from the A matrix pre-loads in the k loop: on the final iterations, the pre-loads read past the end of A. The fix is to split the k loop into two sections (k = k_ + kRem) and disable pre-loads when handling kRem.
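To illustrate the idea, here is a minimal scalar C++ sketch, not the kernel code from this MR. The function `dot_preload_split` is hypothetical and assumes the A value is pre-loaded exactly one iteration ahead; with that distance kRem is 1, and only the first k_ = k - 1 iterations can pre-load without reading past `a[k - 1]`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of a micro-kernel k loop that pre-loads the
// A value for iteration i + 1 while computing iteration i.
double dot_preload_split(const double* a, const double* b, std::size_t k) {
  assert(k >= 1);
  const std::size_t kRem = 1;       // iterations that must not pre-load (distance 1)
  const std::size_t k_ = k - kRem;  // iterations where a pre-load stays in bounds
  double acc = 0.0;
  double a_cur = a[0];
  for (std::size_t i = 0; i < k_; ++i) {
    const double a_next = a[i + 1];  // pre-load: i + 1 <= k_ == k - 1, in bounds
    acc += a_cur * b[i];
    a_cur = a_next;
  }
  // Remainder section (kRem == 1 here): a_cur already holds a[k - 1] and no
  // pre-load is issued, so nothing past the end of a is read.
  acc += a_cur * b[k - 1];
  return acc;
}
```

An unsplit loop that pre-loaded on every iteration would read `a[k]` on its last pass, which is the kind of one-past-the-end read AddressSanitizer reports.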
For TRSM, masked loads were added to aux_loadB: for certain remainder cases, it was loading out-of-bounds data.
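A rough model of the TRSM fix, again a scalar C++ sketch rather than the actual aux_loadB code: the hypothetical helpers `remainder_mask` and `maskz_load` emulate a 16-lane zero-masked float load, so a remainder of n < 16 elements reads only n values from memory. On AVX512 hardware, `_mm512_maskz_loadu_ps(mask, p)` behaves this way and suppresses faults on masked-off lanes; which masked-load intrinsic aux_loadB actually uses is an assumption here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 16;  // floats per 512-bit register
using Vec = std::array<float, kLanes>;

// Mask with the low n bits set (n <= 16), like the __mmask16 one would
// pass to a masked load for an n-element remainder.
std::uint16_t remainder_mask(std::size_t n) {
  assert(n <= kLanes);
  return n == kLanes ? std::uint16_t{0xFFFF}
                     : static_cast<std::uint16_t>((1u << n) - 1u);
}

// Scalar emulation of a zero-masked load: lanes whose mask bit is clear
// are set to zero and their memory is never read.
Vec maskz_load(std::uint16_t mask, const float* p) {
  Vec v{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    if (mask & (1u << i)) v[i] = p[i];
  }
  return v;
}
```

With a 5-element remainder, only `p[0]` through `p[4]` are touched; a full-width unmasked load would read 11 floats past the end of the buffer.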
Additional information
As far as I can tell, the changes to the GEMM kernels have no significant performance impact.