.eval() of prvalue makes unnecessary copy?

Consider the following benchmark:

#include <benchmark/benchmark.h>
#include <Eigen/Eigen>

Eigen::Matrix3d m = Eigen::Matrix3d::Random();

Eigen::Matrix3d ReturnMatrix3d() { return m * m; }

static void BM_return_matrix(benchmark::State& state) {
  for (auto _ : state) {
    Eigen::Matrix3d z = ReturnMatrix3d();
    benchmark::DoNotOptimize(z);
  }
}
BENCHMARK(BM_return_matrix);

static void BM_return_matrix_eval(benchmark::State& state) {
  for (auto _ : state) {
    Eigen::Matrix3d z = ReturnMatrix3d().eval();
    benchmark::DoNotOptimize(z);
  }
}
BENCHMARK(BM_return_matrix_eval);

BENCHMARK_MAIN();

I ran this on my Linux machine with g++-10 in release mode:

----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
BM_return_matrix            4.80 ns         4.80 ns    145833807
BM_return_matrix_eval       8.07 ns         8.07 ns     86733733

We can see that the second version (ReturnMatrix3d().eval()) takes roughly 1.7 times as long as just calling ReturnMatrix3d() (8.07 ns vs. 4.80 ns).

I'm not a C++ expert, but couldn't eval() detect that ReturnMatrix3d() is a prvalue of type Eigen::Matrix and return an rvalue reference to it instead of making a copy?
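For illustration, the usual C++ mechanism for letting a member function know it was called on a temporary is a ref-qualified overload. The sketch below uses a hypothetical `ToyMatrix3` type and demo counters (`g_copies`, `g_moves`); none of it is Eigen's actual class or implementation. It only shows that an `eval() &&` overload is selected for `make_toy().eval()`, while `eval() const&` is selected for a named object.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Counters for the demo only; not part of any real library.
static int g_copies = 0;
static int g_moves = 0;

// Hypothetical stand-in for a fixed-size matrix such as Eigen::Matrix3d.
struct ToyMatrix3 {
  std::array<double, 9> coeffs{};

  ToyMatrix3() = default;
  ToyMatrix3(const ToyMatrix3& o) : coeffs(o.coeffs) { ++g_copies; }
  // With inline (fixed-size) storage a move still copies all nine coefficients.
  ToyMatrix3(ToyMatrix3&& o) noexcept : coeffs(o.coeffs) { ++g_moves; }

  // Selected when *this is an lvalue: the caller keeps the original, so copy.
  ToyMatrix3 eval() const& { return *this; }

  // Selected when *this is an rvalue, e.g. the temporary returned by a function.
  // Returning by value (rather than ToyMatrix3&&) avoids a dangling reference
  // in code such as `auto&& r = make_toy().eval();`.
  ToyMatrix3 eval() && { return std::move(*this); }
};

ToyMatrix3 make_toy() { return ToyMatrix3{}; }

void run_demo() {
  g_copies = 0;
  g_moves = 0;

  ToyMatrix3 a = make_toy();         // no eval(): typically constructed in place
  ToyMatrix3 b = make_toy().eval();  // rvalue overload: moves, never copies
  assert(g_copies == 0);
  assert(g_moves >= 1);

  ToyMatrix3 c = a.eval();           // lvalue overload: one real copy
  assert(g_copies == 1);
  (void)b;
  (void)c;
}
```

Note the caveat in the move constructor: for a fixed-size type, moving is as expensive as copying, so an rvalue overload by itself would not necessarily close the gap measured above unless the compiler also optimizes the temporary away. It would matter more for dynamically sized matrices, where a move can steal the heap buffer.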

Motivation

In this simple example this doesn't really pose a problem. A more realistic, real-world scenario where it matters is when you have two functions:

Eigen::Matrix foo(int);
const Eigen::CwiseBinaryOp<...> foo(double);

Some template code then selects either foo(int) or foo(double). Because the template code doesn't know whether it will get back an expression template or an Eigen::Matrix, it calls eval() to obtain an Eigen::Matrix in either case.
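A possible workaround in such generic code, sketched here under assumptions, is to skip eval() and construct the plain type directly from whatever foo returns. Eigen's dense types expose a nested `PlainObject` typedef that could play this role. The types `ToyMat` and `ToyScaled`, the helper `foo_plain`, and the counter `g_plain_copies` below are hypothetical stand-ins so that the example compiles without Eigen.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

static int g_plain_copies = 0;  // demo counter: copies of ToyMat that were not elided

struct ToyMat;

// Hypothetical lazy expression: factor * matrix, evaluated on demand.
struct ToyScaled {
  typedef ToyMat PlainObject;
  const ToyMat* m;
  double factor;
};

// Hypothetical plain (already evaluated) matrix type.
struct ToyMat {
  typedef ToyMat PlainObject;
  std::array<double, 4> c;

  explicit ToyMat(double v) : c() { c.fill(v); }
  ToyMat(const ToyMat& o) : c(o.c) { ++g_plain_copies; }
  // Evaluating constructor: computes the expression straight into this object.
  ToyMat(const ToyScaled& e) : c() {
    for (std::size_t i = 0; i < c.size(); ++i) c[i] = e.factor * e.m->c[i];
  }
};

static const ToyMat g_m(2.0);

// The two overloads from the scenario: one returns a plain matrix,
// the other a const expression object.
ToyMat foo(int k) { return ToyMat(static_cast<double>(k)); }
const ToyScaled foo(double f) { return ToyScaled{&g_m, f}; }

// Generic caller: names the plain type and initializes it directly,
// instead of calling eval() on the result of foo.
template <class Arg>
auto foo_plain(Arg x) ->
    typename std::decay<decltype(foo(x))>::type::PlainObject {
  typedef typename std::decay<decltype(foo(x))>::type::PlainObject Plain;
  return Plain(foo(x));
}

void run_plain_demo() {
  g_plain_copies = 0;
  ToyMat a = foo_plain(3);    // foo(int): the returned matrix is used as-is
  ToyMat b = foo_plain(1.5);  // foo(double): the expression is evaluated once
  assert(g_plain_copies == 0);
  assert(a.c[0] == 3.0);
  assert(b.c[0] == 3.0);      // 1.5 * 2.0
}
```

When foo returns a plain matrix, copy elision lets the result be constructed in place (guaranteed from C++17 on, commonly done by compilers before that); when it returns an expression, the converting constructor evaluates it exactly once. Whether this removes the overhead measured above in real Eigen code would need to be benchmarked.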

Windows

Interestingly, the two versions are equivalent on my Windows machine (Visual Studio 16.8, Release mode):

----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
BM_return_matrix            32.5 ns         33.0 ns     20363636
BM_return_matrix_eval       32.8 ns         32.1 ns     19478261
Edited by raffael