Segmentation fault on PPC in gemm_extra_cols
Summary
When using TensorFlow on PPC9LE I get a crash in ArithmeticOptimizerTest.ReplaceMulWithSquare.
The backtrace isn't really helpful as most functions got inlined. The crash happens in Eigen::internal::gemm_extra_cols and the second-last entry is tensorflow::SendOp::SendOp(tensorflow::OpKernelConstruction*).
Environment
- Operating System : Linux
- Architecture : PowerPC
- Eigen Version : 3bb6a48d
- Compiler Version : GCC 11.3.0
- Compile Flags : -O3 -mcpu=native
- Vector Extension : Altivec
- TensorFlow 2.11.0
Steps to reproduce
Sorry, I don't have a small reproducer. I ran into this when testing //tensorflow/core/grappler/optimizers:arithmetic_optimizer_test_cpu
Relevant logs
Program received signal SIGSEGV, Segmentation fault.
0x0000000012152bd4 in void Eigen::internal::gemm_extra_cols<double, double __vector(2), Eigen::internal::blas_data_mapper<double, long, 0, 0, 1>, 2l>(Eigen::internal::blas_data_mapper<double, long, 0, 0, 1> const&, double const*, double const*, long, long, long, long, long, long, long, long, long, double __vector(2) const&, double __vector(2) const&) [clone .isra.0] ()
(gdb) bt
#0 0x0000000012152bd4 in void Eigen::internal::gemm_extra_cols<double, double __vector(2), Eigen::internal::blas_data_mapper<double, long, 0, 0, 1>, 2l>(Eigen::internal::blas_data_mapper<double, long, 0, 0, 1> const&, double const*, double const*, long, long, long, long, long, long, long, long, long, double __vector(2) const&, double __vector(2) const&) [clone .isra.0] ()
#1 0x00000000136d5344 in tensorflow::SendOp::SendOp(tensorflow::OpKernelConstruction*) ()
#2 0x00000000136d585c in tensorflow::register_kernel_0::{lambda(tensorflow::KernelDef const*)#1}::operator()(tensorflow::KernelDef const) const::{lambda(tensorflow::OpKernelConstruction*)#1}::_FUN ()
#3 0x0000200001307b38 in tensorflow::CreateOpKernel(tsl::DeviceType, tensorflow::DeviceBase*, tsl::Allocator*, tensorflow::FunctionLibraryRuntime*, tensorflow::ResourceMgr*, std::shared_ptr<tensorflow::NodeProperties const> const&, int, tensorflow::OpKernel**) ()
Anything else that might help
I suspect an ODR violation to be the issue. gemm_extra_cols is called from 2 locations and if I remove the (in this case unused) call from inside MatrixProductMMA.h the test passes.
- Have a plan to fix this issue.
In an experiment, I added an extra template parameter to gemm_extra_cols so that MatrixProductMMA.h gets its own distinct instantiation. That fixed the crash without removing the call and without any side effects I would expect, other than the increased code size.
The same may be required for gemm_cols, which gemm_extra_cols calls.
Note that something similar is already done in gemmMMA: instead of gemm_cols (as in gemm), gemmMMA_cols is called. So the issue occurs only for the remainder part in that function. Other functions might also be affected, but I haven't experienced any other crashes yet.
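To illustrate the workaround described above, here is a minimal sketch of the "extra template parameter" idea. It is not Eigen's code: sum_extra_cols, GenericPathTag, MmaPathTag and the two caller functions are hypothetical names made up for this example, and the body is a trivial stand-in for the real remainder-column kernel. The only point is that the unused tag parameter gives each call site its own specialization (and therefore its own symbol), so the two call sites no longer share one instantiation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tags, one per call site. These names do not exist in Eigen.
struct GenericPathTag {};
struct MmaPathTag {};

// Hypothetical stand-in for a remainder-column helper such as gemm_extra_cols.
// PathTag is not used in the body. It only makes
// sum_extra_cols<double, GenericPathTag> and sum_extra_cols<double, MmaPathTag>
// two different specializations with different mangled names.
template <typename Scalar, typename PathTag>
Scalar sum_extra_cols(const Scalar* data, std::ptrdiff_t cols) {
  assert(cols >= 0);
  Scalar acc = Scalar(0);
  for (std::ptrdiff_t j = 0; j < cols; ++j) {
    acc += data[j];
  }
  return acc;
}

// Stand-in for the call from the generic kernel (assumed to mirror gemm).
inline double gemm_generic_remainder(const double* data, std::ptrdiff_t cols) {
  return sum_extra_cols<double, GenericPathTag>(data, cols);
}

// Stand-in for the call from the MMA kernel (assumed to mirror gemmMMA).
inline double gemm_mma_remainder(const double* data, std::ptrdiff_t cols) {
  return sum_extra_cols<double, MmaPathTag>(data, cols);
}
```

Both callers compute the same result. The difference is only in which specialization they reference, which is also why code size grows: the compiler may now emit the helper once per tag.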