Copies (and potentially moves?) of Eigen objects with large unused MaxRowsAtCompileTime/MaxColsAtCompileTime are slow (regression from Eigen 3.2)
Summary
Copies of Eigen objects with unused inline storage are slow because the unused data is copied as well. This was not the case in Eigen 3.2.
For example, copying an Eigen::Matrix<double, Dynamic, 1, 0, 600, 1> naively copies the entire storage of 600 doubles, even in the fairly common case where only a few values (e.g. 10) are in use.
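To put numbers on this, here is a minimal standalone sketch (no Eigen required). The constant and function names are illustrative and not Eigen identifiers, and the byte counts in the comments assume 8-byte doubles. It compares the bytes touched by a full-capacity copy with the bytes needed for the coefficients actually in use:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative constants, not Eigen identifiers.
constexpr std::size_t kMaxRows = 600;  // MaxRowsAtCompileTime of the example type
constexpr std::size_t kUsedRows = 10;  // rows actually in use at run time

// Bytes moved when the whole inline buffer is copied (4800 with 8-byte doubles).
inline std::size_t fullCopyBytes() { return kMaxRows * sizeof(double); }

// Bytes moved when only the first `rows` coefficients are copied.
inline std::size_t usedCopyBytes(std::size_t rows) {
    assert(rows <= kMaxRows && "cannot use more rows than the inline capacity");
    return rows * sizeof(double);
}

// Ratio between the two strategies for a given number of used rows.
inline std::size_t copyOverheadFactor(std::size_t rows) {
    assert(rows > 0);
    return fullCopyBytes() / usedCopyBytes(rows);
}
```

With 10 of 600 rows in use, copyOverheadFactor(kUsedRows) comes out at 60, i.e. the full-buffer copy moves 60 times more data than strictly needed. The measured slowdown will usually be smaller, since the copy is not the only cost.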
Environment
- Operating System : macOS 11.2.3 "Big Sur"
- Architecture : x86_64
- Eigen Version : 3.4rc1
- Compiler Version : clang 11.1.0 (homebrew)
- Compile Flags : -std=c++17 -Ofast -march=native -DNDEBUG
- Vector Extension : Up to AVX512
Minimal Example
Example copying 4800 bytes for one double
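#include "Eigen/Dense"

// Column vector with a run-time size and inline storage for up to 600 doubles.
using T = Eigen::Matrix<double, Eigen::Dynamic, 1, 0, 600>;

// Returning by value copies t.
T testCopy(const T &t)
{
    return t;
}

T Trigger()
{
    T a;
    a.resize(1);   // only one coefficient is in use
    a[0] = 42.0;
    return testCopy(a);
}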
Steps to reproduce
- Compile the example with Eigen 3.3 or 3.4
- Notice the large copy of the whole storage array
- Compile with Eigen 3.2 and note the improvement.
What is the current bug behavior?
All data is copied when copying such an Eigen object, including unused (and possibly uninitialized?) data.
What is the expected correct behavior?
Only the used data should be copied.
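As a rough illustration of the expected semantics, here is a minimal sketch of a container with inline storage whose copy operations touch only the coefficients in use. BoundedVector and its members are hypothetical names made up for this example. This is not Eigen's implementation, only a stand-in to show the idea:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a vector with inline storage and a run-time size.
// NOT Eigen code; it only illustrates the expected copy behavior.
template <typename Scalar, std::size_t MaxSize>
class BoundedVector {
public:
    BoundedVector() : m_size(0) {}

    // Copy only the first other.m_size coefficients; the unused tail of
    // m_data is deliberately left untouched.
    BoundedVector(const BoundedVector& other) : m_size(other.m_size) {
        std::copy(other.m_data.begin(), other.m_data.begin() + m_size, m_data.begin());
    }

    BoundedVector& operator=(const BoundedVector& other) {
        if (this != &other) {
            m_size = other.m_size;
            std::copy(other.m_data.begin(), other.m_data.begin() + m_size, m_data.begin());
        }
        return *this;
    }

    void resize(std::size_t n) {
        assert(n <= MaxSize && "requested size exceeds inline capacity");
        m_size = n;
    }

    std::size_t size() const { return m_size; }
    Scalar& operator[](std::size_t i) { assert(i < m_size); return m_data[i]; }
    const Scalar& operator[](std::size_t i) const { assert(i < m_size); return m_data[i]; }

private:
    std::array<Scalar, MaxSize> m_data;  // inline storage, never heap-allocated
    std::size_t m_size;                  // number of coefficients in use
};
```

Copying a BoundedVector<double, 600> that holds a single value then moves one double instead of the whole buffer, which is the behavior this report asks for.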
Anything else that might help
- It seems that PlainObjectBase's copy-constructor now calls the copy-constructor of plain_storage (?) which copies the whole array. The old implementation called lazyAssign instead. Modifying Eigen 3.4rc1 to use
/** Copy constructor */
EIGEN_DEVICE_FUNC
EIGEN_STRONG_INLINE PlainObjectBase(const PlainObjectBase& other)
    : Base(), m_storage()
{
    resizeLike(other);
    _set_noalias(other);
}
speeds my test-case up by a factor of roughly 4.
- Weirdly (on macOS), void Eigen::internal::pstore<double, double vector[8]>(double*, double vector[8] const&) seems to call _platform_memmove$VARIANT$Haswell or _platform_bzero$VARIANT$Haswell instead of using the expected SSE/AVX/... instructions.
- This problem was very visible when using the unsupported AutoDiffScalar with such a derivative storage type (as they tend to be copied a lot).