BDCSVD: segmentation faults for some matrices
Submitted by David Aceituno
Assigned to Nobody
Link to original bugzilla bug (#1723)
Version: 3.3 (current stable)
Operating system: Linux
Description
Created attachment 944
Matrix in binary format
BDCSVD fails with an assertion error (in Debug mode) or segmentation fault (in Release mode, -DNDEBUG), when using Intel MKL on a Xeon Gold 6130 machine with flag -mavx.
I am using Eigen version 3.3.7.
This is the GDB backtrace:
#0 0x0000000001b0b7fb in raise ()
#1 0x0000000001bd8b38 in abort ()
#2 0x0000000001bd26c4 in __assert_fail_base ()
#3 0x0000000001bd271e in __assert_fail ()
#4 0x000000000043d57c in Eigen::DenseCoeffsBase<Eigen::Ref<Eigen::Array<long, 1, -1, 1, 1, -1>, 0, Eigen::InnerStride<1> >, 0>::operator() (this=0x7fffffff6890, index=-1)
at /home/x_davac/.conda/envs/dmrg/include/eigen3/Eigen/src/Core/DenseCoeffsBase.h:180
#5 0x0000000000437a4a in Eigen::BDCSVD<Eigen::Matrix<std::complex<double>, -1, -1, 0, -1, -1> >::perturbCol0 (this=0x7fffffff8660, col0=..., diag=..., perm=..., singVals=..., shifts=..., mus=..., zhat=...)
at /home/x_davac/.conda/envs/dmrg/include/eigen3/Eigen/src/SVD/BDCSVD.h:924
#6 0x000000000042d632 in Eigen::BDCSVD<Eigen::Matrix<std::complex<double>, -1, -1, 0, -1, -1> >::computeSVDofM (this=0x7fffffff8660, firstCol=0, n=46, U=..., singVals=..., V=...)
at /home/x_davac/.conda/envs/dmrg/include/eigen3/Eigen/src/SVD/BDCSVD.h:638
#7 0x00000000004258cb in Eigen::BDCSVD<Eigen::Matrix<std::complex<double>, -1, -1, 0, -1, -1> >::divide (this=0x7fffffff8660, firstCol=0, lastCol=45, firstRowW=0, firstColW=0, shift=0)
at /home/x_davac/.conda/envs/dmrg/include/eigen3/Eigen/src/SVD/BDCSVD.h:534
#8 0x000000000041f51a in Eigen::BDCSVD<Eigen::Matrix<std::complex<double>, -1, -1, 0, -1, -1> >::compute (this=0x7fffffff8660, matrix=..., computationOptions=40)
at /home/x_davac/.conda/envs/dmrg/include/eigen3/Eigen/src/SVD/BDCSVD.h:278
#9 0x00000000004028fc in main () at /home/x_davac/svd_bug/main.cpp:53
In line 924 of eigen3/Eigen/src/SVD/BDCSVD.h, an index is out of bounds because l == 0:
Index j = i<k ? i : perm(l-1);
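To illustrate the failure mode: when l == 0 and i >= k, the expression above evaluates perm(-1), which matches index=-1 in frame #4 of the backtrace. The following is a minimal sketch only. It uses a std::vector and a hypothetical bounds-checked accessor in place of Eigen's types; the names checked_at and select_j are illustrative and do not exist in Eigen.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Index = std::ptrdiff_t;

// Hypothetical stand-in for a bounds-checked coefficient accessor.
// Eigen asserts in Debug mode; here we throw so the failure is observable.
Index checked_at(const std::vector<Index>& perm, Index idx) {
    if (idx < 0 || idx >= static_cast<Index>(perm.size()))
        throw std::out_of_range("perm index out of range");
    return perm[static_cast<std::size_t>(idx)];
}

// Same shape as the expression quoted from line 924:
//   Index j = i<k ? i : perm(l-1);
// With l == 0 and i >= k this asks for perm(-1).
Index select_j(const std::vector<Index>& perm, Index i, Index k, Index l) {
    return i < k ? i : checked_at(perm, l - 1);
}
```

Without the bounds check (Release mode with -DNDEBUG), the same access reads memory before the start of the array, which is consistent with the reported segmentation fault.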
I have tried to reproduce the error on these machines (from /proc/cpuinfo):
- Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz (Ubuntu 19.04) ---> Success always
- Intel(R) Xeon(R) CPU E5-1660 v4 @ 3.20GHz (Ubuntu 16.04) ---> Success always
- Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz (CentOS 7.6) ---> Failure with mkl + vectorization
The error disappears in the following cases:
- Edit line 95 of eigen3/Eigen/src/SVD/BDCSVD.h
from typedef Matrix<Scalar, Dynamic, Dynamic, ColMajor> MatrixX;
into typedef Matrix<Scalar, Dynamic, Dynamic, ColMajor | DontAlign> MatrixX;
- Remove define EIGEN_USE_BLAS (or EIGEN_USE_MKL_ALL)
- Remove -mavx (or compile with -march=nehalem or older)
The error remains when changing the following:
- GNU C++ 7.3.0 <-> Clang 6.0.1
- Intel MKL 2018 <-> Intel MKL 2019 v4
- -std=c++17 <-> (none)
A minimal example that reproduces the error can be found here: https://github.com/DavidAce/svd_bug
The CMake project should compile fine if your MKL installation is in a standard path (/opt/intel/mkl, $HOME/intel/mkl, etc.), or if its location is defined in MKL_ROOT or in LD_LIBRARY_PATH.
The same matrix is attached in binary format as well as hardcoded into the source file main.cpp.
Weirdly enough, the hardcoded one succeeds. Presumably the error is somehow sensitive to the precision of the numbers.
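One plausible mechanism for this difference, offered as an assumption rather than a diagnosis: if the hardcoded values were printed with fewer than 17 significant digits, they would not reproduce the exact bits stored in the binary file. The sketch below uses a hypothetical helper, through_text, to show that a double survives a text round trip only when enough digits are written. I have not checked how many digits main.cpp actually uses.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: format a double with `digits` significant digits,
// then parse the text back into a double.
double through_text(double value, int digits) {
    std::ostringstream out;
    out << std::setprecision(digits) << value;
    return std::stod(out.str());
}
```

Writing with std::numeric_limits<double>::max_digits10 (17 for IEEE double) is enough to recover the original value exactly, whereas the default stream precision of 6 digits generally is not.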
To read the attached binary file, use:
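#include <fstream>
#include <Eigen/Core>

// Reads a matrix stored as: rows, cols (both Derived::Index), followed by
// the raw coefficient data in the matrix's own storage order.
template<typename Derived>
void read(const char* filename, Eigen::MatrixBase<Derived>& matrix){
    std::ifstream in(filename, std::ios::in | std::ios::binary);
    typename Derived::Index rows=0, cols=0;
    // Header: matrix dimensions
    in.read(reinterpret_cast<char*>(&rows), sizeof(typename Derived::Index));
    in.read(reinterpret_cast<char*>(&cols), sizeof(typename Derived::Index));
    matrix.derived().resize(rows, cols);
    // Payload: rows*cols scalars copied directly into the matrix buffer
    in.read(reinterpret_cast<char*>(matrix.derived().data()), rows*cols*sizeof(typename Derived::Scalar));
    in.close();
}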
as shown in main.cpp in the github link above.
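Because the reader above trusts the header and the scalar type, it can help to sanity-check a file before loading it. The sketch below is an optional, standard-library-only check. It assumes the header consists of two std::ptrdiff_t values (Eigen's default Index type unless EIGEN_DEFAULT_DENSE_INDEX_TYPE is overridden). The names BinaryMatrixInfo, inspect and payload_matches are hypothetical and are not part of the linked repository.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical summary of a binary matrix file.
struct BinaryMatrixInfo {
    std::ptrdiff_t rows = 0;
    std::ptrdiff_t cols = 0;
    std::uintmax_t payload_bytes = 0;  // bytes that follow the two-index header
    bool ok = false;                   // false if the file or header is unreadable
};

// Reads only the header and measures how many bytes follow it.
// Assumption: the header is two std::ptrdiff_t values (rows, then cols).
BinaryMatrixInfo inspect(const std::string& filename) {
    BinaryMatrixInfo info;
    std::ifstream in(filename, std::ios::in | std::ios::binary);
    if (!in) return info;
    in.read(reinterpret_cast<char*>(&info.rows), sizeof(info.rows));
    in.read(reinterpret_cast<char*>(&info.cols), sizeof(info.cols));
    if (!in) return info;
    const std::streampos data_begin = in.tellg();
    in.seekg(0, std::ios::end);
    const std::streampos data_end = in.tellg();
    info.payload_bytes = static_cast<std::uintmax_t>(data_end - data_begin);
    info.ok = true;
    return info;
}

// True if the payload holds exactly rows*cols scalars of the given size.
bool payload_matches(const BinaryMatrixInfo& info, std::size_t scalar_size) {
    if (!info.ok || info.rows < 0 || info.cols < 0) return false;
    return info.payload_bytes ==
           static_cast<std::uintmax_t>(info.rows) *
           static_cast<std::uintmax_t>(info.cols) * scalar_size;
}
```

Calling payload_matches with sizeof(double) and with sizeof(std::complex<double>) indicates which scalar type the file was written with, which decides the matrix type to pass to read.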
I can provide more matrices that fail if necessary.
Attachment 944, "Matrix in binary format":
svd_binary_real.bin