Mixed-precision row-major SpMV causes allocation
Summary
Performing a mixed-precision sparse matrix-dense vector product y = A*x with a row-major A, using scalar type float for A and x and double for y, causes a heap allocation when casting the x vector. The allocation does not happen when only A needs to be cast.
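To illustrate why no temporary should be needed, here is a minimal sketch of a row-major (CSR) mixed-precision product written in plain C++. It is not Eigen's implementation: the `CsrMatrixF` struct and the `spmv_mixed` function are hypothetical names introduced only for this example. Each operand is widened to double as it is read, so the kernel never needs a double copy of x.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR (row-major) container; not an Eigen type.
struct CsrMatrixF {
    std::size_t rows = 0;
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col_idx;  // size nnz
    std::vector<float> values;         // size nnz
};

// Computes y = A * x with float inputs and a double result.
// y must already hold A.rows entries; nothing is allocated here.
void spmv_mixed(const CsrMatrixF& A,
                const std::vector<float>& x,
                std::vector<double>& y) {
    for (std::size_t i = 0; i < A.rows; ++i) {
        double acc = 0.0;
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
            // Widen both operands on the fly instead of casting x up front.
            acc += static_cast<double>(A.values[k]) *
                   static_cast<double>(x[A.col_idx[k]]);
        }
        y[i] = acc;
    }
}
```

For the 3x3 identity matrix and x = (1, 2, 3) used in the minimal example, calling `spmv_mixed` with a preallocated y of size 3 fills y with (1, 2, 3).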
Environment
- Operating System : Ubuntu 20.04
- Architecture : x64
- Eigen Version : 3.3.9, 3.4.0, master (0488b708)
- Compiler Version : GCC 9.4.0
- Compile Flags : -g
- Vector Extension : irrelevant
Minimal Example
# CMakeLists.txt
cmake_minimum_required(VERSION 3.15)
project(mixedprecision)
set(CMAKE_CXX_STANDARD 17)
find_package(Eigen3 REQUIRED)
set(EIGEN_ColMajor 0)
set(EIGEN_RowMajor 0x1)
set(f32 float)
set(f64 double)
enable_testing()
function(add_main LAYOUT A_BITS X_BITS)
set(MAIN main_${LAYOUT}_A${A_BITS}_x${X_BITS})
add_executable(${MAIN} main.cpp)
target_link_libraries(${MAIN} Eigen3::Eigen)
target_compile_definitions(
${MAIN} PRIVATE
SCALAR_A=${f${A_BITS}}
SCALAR_X=${f${X_BITS}}
LAYOUT=${EIGEN_${LAYOUT}}
)
add_test(NAME ${MAIN} COMMAND ${MAIN})
endfunction()
foreach(LAYOUT IN ITEMS ColMajor RowMajor)
foreach(A_BITS IN ITEMS 32 64)
foreach(X_BITS IN ITEMS 32 64)
add_main(${LAYOUT} ${A_BITS} ${X_BITS})
endforeach()
endforeach()
endforeach()
// main.cpp
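#include <cstdio>    // printf
#include <iostream>
// Must be defined before any Eigen header to enable set_is_malloc_allowed().
#define EIGEN_RUNTIME_NO_MALLOC
#include <Eigen/SparseCore>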
using namespace Eigen;
using ScalarA = SCALAR_A;
using ScalarX = SCALAR_X;
using ScalarY = double;
constexpr auto Layout = LAYOUT;
int main(void) {
printf("Eigen %d.%d.%d\n", EIGEN_WORLD_VERSION, EIGEN_MAJOR_VERSION, EIGEN_MINOR_VERSION);
const size_t dim = 3;
auto A = SparseMatrix<ScalarA,Layout>(dim, dim);
auto x = Matrix<ScalarX, Dynamic, 1>(dim);
auto y = Matrix<ScalarY, Dynamic, 1>(dim);
for (size_t i = 0; i < dim; ++i)
A.insert(i, i) = 1;
A.makeCompressed();
x << 1, 2, 3;
// With asserts enabled, any heap allocation between these two calls aborts.
Eigen::internal::set_is_malloc_allowed(false);
y.noalias() = A.cast<ScalarY>() * x.cast<ScalarY>();
Eigen::internal::set_is_malloc_allowed(true);
std::cout << y << std::endl;
}
Steps to reproduce
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug
make -j
ctest
What is the current bug behavior?
$ ctest
Test project /home/jschulze/tmp/eigen#xxx/build
Start 1: main_ColMajor_A32_x32
1/8 Test #1: main_ColMajor_A32_x32 ............ Passed 0.00 sec
Start 2: main_ColMajor_A32_x64
2/8 Test #2: main_ColMajor_A32_x64 ............ Passed 0.00 sec
Start 3: main_ColMajor_A64_x32
3/8 Test #3: main_ColMajor_A64_x32 ............ Passed 0.00 sec
Start 4: main_ColMajor_A64_x64
4/8 Test #4: main_ColMajor_A64_x64 ............ Passed 0.00 sec
Start 5: main_RowMajor_A32_x32
5/8 Test #5: main_RowMajor_A32_x32 ............Child aborted***Exception: 0.11 sec
Start 6: main_RowMajor_A32_x64
6/8 Test #6: main_RowMajor_A32_x64 ............ Passed 0.00 sec
Start 7: main_RowMajor_A64_x32
7/8 Test #7: main_RowMajor_A64_x32 ............Child aborted***Exception: 0.08 sec
Start 8: main_RowMajor_A64_x64
8/8 Test #8: main_RowMajor_A64_x64 ............ Passed 0.00 sec
75% tests passed, 2 tests failed out of 8
Total Test time (real) = 0.20 sec
The following tests FAILED:
5 - main_RowMajor_A32_x32 (Child aborted)
7 - main_RowMajor_A64_x32 (Child aborted)
Errors while running CTest
Both fail with the same error:
$ ctest --rerun-failed --output-on-failure
Test project /home/jschulze/tmp/eigen#xxx/build
Start 5: main_RowMajor_A32_x32
1/2 Test #5: main_RowMajor_A32_x32 ............Child aborted***Exception: 0.10 sec
main_RowMajor_A32_x32: /home/jschulze/.local/include/eigen3/Eigen/src/Core/util/Memory.h:164: void Eigen::internal::check_that_malloc_is_allowed(): Assertion `is_malloc_allowed() && "heap allocation is forbidden (EIGEN_RUNTIME_NO_MALLOC is defined and g_is_malloc_allowed is false)"' failed.
Start 7: main_RowMajor_A64_x32
2/2 Test #7: main_RowMajor_A64_x32 ............Child aborted***Exception: 0.08 sec
main_RowMajor_A64_x32: /home/jschulze/.local/include/eigen3/Eigen/src/Core/util/Memory.h:164: void Eigen::internal::check_that_malloc_is_allowed(): Assertion `is_malloc_allowed() && "heap allocation is forbidden (EIGEN_RUNTIME_NO_MALLOC is defined and g_is_malloc_allowed is false)"' failed.
0% tests passed, 2 tests failed out of 2
Total Test time (real) = 0.18 sec
The following tests FAILED:
5 - main_RowMajor_A32_x32 (Child aborted)
7 - main_RowMajor_A64_x32 (Child aborted)
Errors while running CTest
What is the expected correct behavior?
All tests pass; no allocation occurs in any of the configurations.
Edited by Jonas Schulze