wip: Play with google benchmark and motion models

This is not an MR that is supposed to be merged directly. It is just here to facilitate a discussion.

The main point for discussion is how much each function implementation affects performance.

When running the simple_benchmark binary generated from test/bench_constant_velocity.cpp, the result on my machine is the following:

ade$ ./build/motion_model/simple_benchmark 
2021-02-24 17:01:02
Running ./build/motion_model/simple_benchmark
Run on (12 X 4500 MHz CPU s)
CPU Caches:
  L1 Data 32K (x6)
  L1 Instruction 32K (x6)
  L2 Unified 256K (x6)
  L3 Unified 12288K (x1)
Load Average: 0.66, 0.71, 0.71
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations
------------------------------------------------------------------------------
ConstantVelocitySingleAllocation          7.49 ns         7.49 ns     92107893
ConstantVelocityMultipleAllocations       35.9 ns         35.9 ns     19429404
ConstantVelocityNewCopy                    337 ns          337 ns      2063007
ConstantVelocityNewCreate                 15.8 ns         15.8 ns     43617767
ConstantVelocityNewCopyInPlace             320 ns          320 ns      2196998
ConstantVelocityNewCreateInPlace          7.44 ns         7.44 ns     92043795

I believe that what we see here is the following:

  • ConstantVelocitySingleAllocation uses the signature void compute_jacobian(Matrix & F, const Nanosecs & dt) and is called with a matrix F that is created only once, so no copy or move happens.
  • ConstantVelocityMultipleAllocations uses the same signature as above, but the matrix F is created anew before every call. I don't fully understand what happens here without looking at the disassembly.
  • ConstantVelocityNewCopy uses the signature Matrix compute_jacobian(const std::chrono::nanoseconds & dt). It keeps an internal Jacobian matrix, assigns new values to it on each call, and returns it to be assigned to the pre-existing matrix F, which seems to trigger a copy.
  • ConstantVelocityNewCreate uses the signature Matrix compute_jacobian(const std::chrono::nanoseconds & dt) and assigns the result to a new matrix F. I don't understand what is happening here.
  • ConstantVelocityNewCopyInPlace uses the signature Matrix compute_jacobian(const std::chrono::nanoseconds & dt), creates the matrix inside the function, and assigns the return value to an existing matrix F. This seems to copy.
  • ConstantVelocityNewCreateInPlace uses the signature Matrix compute_jacobian(const std::chrono::nanoseconds & dt), creates the matrix inside the function, and assigns the return value to a new matrix F. It seems that RVO/NRVO kicks in, so no copy or move happens.
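
To make the copy/move question above easier to reason about without the disassembly, here is a minimal sketch that counts copies and moves for the two call styles. It is not the code from test/bench_constant_velocity.cpp: CountingMatrix, Counters, counters() and to_seconds() are hypothetical names introduced for illustration, and the 2x2 layout [[1, dt], [0, 1]] is an assumed position/velocity state, not necessarily the one used in the motion model. It says nothing about timings, only about which special member functions get invoked.

```cpp
#include <array>
#include <cassert>
#include <chrono>

// Hypothetical global counters (not part of the MR).
struct Counters { int copies = 0; int moves = 0; };
inline Counters & counters() { static Counters c; return c; }

// Hypothetical stand-in for the real Matrix type: a 2x2 row-major matrix
// that records every copy and move it goes through.
struct CountingMatrix {
  std::array<double, 4> data{{1.0, 0.0, 0.0, 1.0}};  // starts as identity

  CountingMatrix() = default;
  CountingMatrix(const CountingMatrix & o) : data(o.data) { ++counters().copies; }
  CountingMatrix(CountingMatrix && o) noexcept : data(o.data) { ++counters().moves; }
  CountingMatrix & operator=(const CountingMatrix & o) {
    data = o.data; ++counters().copies; return *this;
  }
  CountingMatrix & operator=(CountingMatrix && o) noexcept {
    data = o.data; ++counters().moves; return *this;
  }
};

using Nanosecs = std::chrono::nanoseconds;

// Convert a nanosecond duration to floating-point seconds.
inline double to_seconds(const Nanosecs & dt) {
  return std::chrono::duration<double>(dt).count();
}

// Out-parameter style: writes into a matrix owned by the caller.
inline void compute_jacobian(CountingMatrix & F, const Nanosecs & dt) {
  F.data[0] = 1.0; F.data[1] = to_seconds(dt);
  F.data[2] = 0.0; F.data[3] = 1.0;
}

// Return-by-value style: builds the matrix locally and returns it.
inline CountingMatrix compute_jacobian(const Nanosecs & dt) {
  CountingMatrix F;            // identity
  F.data[1] = to_seconds(dt);  // assumed constant-velocity term
  return F;                    // eligible for NRVO; otherwise moved, never copied
}
```

With this toy type, `CountingMatrix F = compute_jacobian(dt);` should record no copies, and no moves either when the compiler applies NRVO (permitted but not guaranteed by the standard). `F = compute_jacobian(dt);` on an existing F always goes through the move assignment operator, which for a fixed-size array-backed type does the same element-wise work as a copy. Whether the real Matrix type behaves the same way would need to be checked separately.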
Edited by Igor Bogoslavskyi
