Bfloat16 GEMM/GEMV support via OpenBLAS Backend
Describe the feature you would like to be implemented.
OpenBLAS SBGEMM and SBGEMV support in Eigen, to enable bfloat16 matrix multiplications through the BLAS backend.
Would such a feature be useful for other users? Why?
The BLAS routines mentioned above perform matrix multiplication on bfloat16 inputs; SBGEMM accumulates in and returns float32 output. Adding this support would improve BLAS backend coverage and Eigen's BLAS-backend performance for the bfloat16 data type on AArch64 and x86.
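For clarity, here is a small plain-C++ reference sketch of the semantics SBGEMM provides (no OpenBLAS call involved): A and B are bfloat16, while accumulation and the output C are float32. The function name and loop structure are illustrative only.

```cpp
#include <Eigen/Core>  // for Eigen::bfloat16

// Illustrative reference loop for the SBGEMM semantics:
// C (float32) = alpha * A (bf16) * B (bf16) + beta * C, column-major, no transpose.
void sbgemm_reference(int m, int n, int k, float alpha,
                      const Eigen::bfloat16* A, int lda,
                      const Eigen::bfloat16* B, int ldb,
                      float beta, float* C, int ldc) {
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < m; ++i) {
      float acc = 0.0f;  // accumulate in float32
      for (int p = 0; p < k; ++p) {
        acc += static_cast<float>(A[i + p * lda]) * static_cast<float>(B[p + j * ldb]);
      }
      C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
    }
  }
}
```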
Any hints on how to implement the requested feature?
SGEMV reference in the Eigen BLAS APIs: https://gitlab.com/libeigen/eigen/-/blob/master/blas/blas.h#L170. A similar addition is required for the routines mentioned above.
SBGEMM OpenBLAS interface: https://github.com/OpenMathLib/OpenBLAS/blob/develop/common_interface.h#L484
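A minimal sketch of what such a declaration could look like, mirroring the existing sgemv_/sgemm_ declarations and the OpenBLAS SBGEMM parameter order (bfloat16 A/B, float32 alpha/beta/C). The exact integer and bfloat16 typedefs, constness, and return type are assumptions here and would need to match the conventions already used in blas.h.

```cpp
// Hypothetical prototype sketch; OpenBLAS stores bfloat16 elements in a 16-bit integer type.
extern "C" void sbgemm_(const char* transa, const char* transb,
                        const int* m, const int* n, const int* k,
                        const float* alpha,
                        const unsigned short* a, const int* lda,  // bfloat16 data
                        const unsigned short* b, const int* ldb,  // bfloat16 data
                        const float* beta,
                        float* c, const int* ldc);
```

Beyond the declaration, the bfloat16 product path on the Eigen side would also need to dispatch into this routine; the mixed result type (bfloat16 inputs, float32 output) is the main difference from the existing same-precision GEMM hooks.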
Additional resources
- On AArch64, OpenBLAS includes logic within its GEMM implementation that internally redirects execution to a GEMV kernel when the operand shapes make a matrix-vector multiplication more efficient, so Eigen might only need to integrate with SBGEMM; please verify (see the sketch after this list).
- https://github.com/OpenMathLib/OpenBLAS/pull/5287, https://github.com/OpenMathLib/OpenBLAS/issues/5155: BGEMM and BGEMV routines are under active development in OpenBLAS for the AArch64 backend. These routines perform matrix multiplication on bfloat16 inputs and return bfloat16 output. Adding these routines to Eigen in the future would be good as well.
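To illustrate the first bullet above, a matrix-vector product can be expressed as an SBGEMM call with n = 1, letting OpenBLAS pick a GEMV-style kernel internally where it judges that faster. This sketch reuses the hypothetical sbgemm_ prototype from earlier; the wrapper name is illustrative only.

```cpp
// Illustrative only: y (float32) = alpha * A (bf16) * x (bf16) + beta * y,
// expressed as a column-major SBGEMM with a single right-hand-side column.
void sbgemv_via_sbgemm(int m, int k, float alpha,
                       const unsigned short* A, int lda,  // bf16, m-by-k
                       const unsigned short* x,           // bf16, length k
                       float beta, float* y) {            // fp32, length m
  const char trans = 'N';
  const int n = 1;
  const int ldx = (k > 0 ? k : 1);  // x viewed as a k-by-1 matrix
  const int ldy = (m > 0 ? m : 1);  // y viewed as an m-by-1 matrix
  sbgemm_(&trans, &trans, &m, &n, &k, &alpha, A, &lda, x, &ldx, &beta, y, &ldy);
}
```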
CC: @cantonios