Performance regression with `arma::inv`
Hi @conradsnicta,
I am working on a non-linear optimisation task which involves calculating the inverse of 3x3 square matrices several hundred million times. To date we have been happily using Armadillo 10.5.1; we recently updated to Armadillo 14.4.1, and experienced a substantial slow-down of our program. The slowdown doesn't seem to be directly due to any changes in `arma::inv`, but rather in how `arma::inv` is used within an expression.
The affected code looks something like this:
std::vector<arma::fmat> in1(...);
std::vector<arma::fmat> in2(...);
std::vector<arma::fmat> out(...);
for (std::size_t i = 0; i < in1.size(); i++) {
    out[i] = in1[i].i() * in2[i];
}
If I adjust the code to use a temporary variable, performance is restored to that of 10.5.1:
for (std::size_t i = 0; i < in1.size(); i++) {
    arma::fmat tmp = in1[i].i();
    out[i] = tmp * in2[i];
}
Below is a simple example which can be used to demonstrate the slowdown. I ran it against several Armadillo versions, and it looks like the slowdown was introduced around Armadillo 11.0.0:
- armadillo 10.5.1:
ARMADILLO VERSION: 10.5.1
Matrix creation: 6.173 seconds
Using i(): 0.059 seconds
Using arma::inv: 0.058 seconds
Using temporary variable: 0.034 seconds
- armadillo 10.8.2:
ARMADILLO VERSION: 10.8.2
Matrix creation: 6.124 seconds
Using i(): 0.059 seconds
Using arma::inv: 0.057 seconds
Using temporary variable: 0.034 seconds
- armadillo 11.0.0 (note that using a temporary variable is now slower):
ARMADILLO VERSION: 11.0.0
Matrix creation: 6.12 seconds
Using i(): 0.261 seconds
Using arma::inv: 0.267 seconds
Using temporary variable: 0.245 seconds
- armadillo 11.4.4:
ARMADILLO VERSION: 11.4.4
Matrix creation: 6.122 seconds
Using i(): 0.259 seconds
Using arma::inv: 0.258 seconds
Using temporary variable: 0.246 seconds
- armadillo 14.2.1 (using a temporary variable is fast again):
ARMADILLO VERSION: 14.2.1
Matrix creation: 0.241 seconds
Using i(): 0.258 seconds
Using arma::inv: 0.255 seconds
Using temporary variable: 0.039 seconds
- armadillo 14.4.1:
ARMADILLO VERSION: 14.4.1
Matrix creation: 0.238 seconds
Using i(): 0.257 seconds
Using arma::inv: 0.255 seconds
Using temporary variable: 0.04 seconds
For our application, we may end up using hand-rolled `inv33` and `det33` routines (e.g. the equivalent of calling the `op_inv_gen_meat.hpp:apply_tiny_3x3` routine directly). But I felt that this apparent performance regression was worth bringing to your attention. Thanks very much for all of your hard work on this amazing library!
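For reference, here is a rough sketch of the kind of hand-rolled `inv33` and `det33` routines I have in mind. It is not Armadillo's `apply_tiny_3x3` code: it is a plain adjugate/determinant implementation, the raw-pointer signatures are my own choice, and the singularity threshold is an arbitrary placeholder. It assumes column-major storage, i.e. element (r, c) lives at index `r + 3*c`.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Determinant of a 3x3 matrix stored column-major:
// element (r, c) is at a[r + 3*c].
inline float det33(const float* a) {
    return a[0] * (a[4] * a[8] - a[7] * a[5])
         - a[3] * (a[1] * a[8] - a[7] * a[2])
         + a[6] * (a[1] * a[5] - a[4] * a[2]);
}

// Inverse of a 3x3 column-major matrix via the adjugate divided by the
// determinant. Returns false (leaving `out` untouched) when the determinant
// is too close to zero. The threshold is an illustrative assumption only.
inline bool inv33(const float* a, float* out) {
    assert(a != out);  // in-place inversion is not supported by this sketch

    const float d = det33(a);
    if (std::fabs(d) < std::numeric_limits<float>::epsilon()) {
        return false;
    }
    const float inv_d = 1.0f / d;

    // First column of the inverse
    out[0] = (a[4] * a[8] - a[7] * a[5]) * inv_d;
    out[1] = (a[7] * a[2] - a[1] * a[8]) * inv_d;
    out[2] = (a[1] * a[5] - a[4] * a[2]) * inv_d;
    // Second column
    out[3] = (a[6] * a[5] - a[3] * a[8]) * inv_d;
    out[4] = (a[0] * a[8] - a[6] * a[2]) * inv_d;
    out[5] = (a[3] * a[2] - a[0] * a[5]) * inv_d;
    // Third column
    out[6] = (a[3] * a[7] - a[6] * a[4]) * inv_d;
    out[7] = (a[6] * a[1] - a[0] * a[7]) * inv_d;
    out[8] = (a[0] * a[4] - a[3] * a[1]) * inv_d;
    return true;
}
```

With Armadillo matrices this could be called as `inv33(in1[i].memptr(), tmp.memptr())`, where `tmp` is a pre-allocated 3x3 `arma::fmat`, followed by `out[i] = tmp * in2[i]`.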
#include <chrono>
#include <iostream>
#include <string>
#include <vector>
#include <armadillo>
// Scoped timer: prints the elapsed wall-clock time when it goes out of scope.
class Timer {
private:
    std::string _label;
    // steady_clock is monotonic, so it is not affected by system clock adjustments
    std::chrono::time_point<std::chrono::steady_clock> _start;
public:
    Timer(std::string label)
        : _label(label),
          _start(std::chrono::steady_clock::now())
    {}
    ~Timer() {
        auto end = std::chrono::steady_clock::now();
        auto dur = std::chrono::duration_cast<std::chrono::milliseconds>(end - _start);
        auto elapsed = dur.count() / 1000.0;
        std::cout << _label << ": " << elapsed << " seconds" << std::endl;
    }
};
// size x size matrix with uniformly distributed random entries
arma::fmat random_matrix(int size) {
    arma::fmat mat(size, size);
    mat.randu();
    return mat;
}
// Random matrix plus the identity
arma::fmat random_invertible_matrix(int size) {
    arma::fmat mat(size, size);
    mat.randu();
    mat += arma::eye<arma::fmat>(size, size);
    return mat;
}
int main() {
    std::cout << "ARMADILLO VERSION: "
              << ARMA_VERSION_MAJOR << "."
              << ARMA_VERSION_MINOR << "."
              << ARMA_VERSION_PATCH << std::endl;
    int N = 1000000;
    std::vector<arma::fmat> in1(N);
    std::vector<arma::fmat> in2(N);
    std::vector<arma::fmat> out(N);
    {
        Timer t("Matrix creation");
        for (auto& m : in1) m = random_invertible_matrix(3);
        for (auto& m : in2) m = random_matrix(3);
    }
    {
        // Inverse used directly inside the product expression
        Timer t("Using i()");
        for (std::size_t i = 0; i < in1.size(); i++) {
            out[i] = in1[i].i() * in2[i];
        }
    }
    {
        Timer t("Using arma::inv");
        for (std::size_t i = 0; i < in1.size(); i++) {
            out[i] = arma::inv(in1[i]) * in2[i];
        }
    }
    {
        // Inverse stored in a temporary before the multiplication
        Timer t("Using temporary variable");
        for (std::size_t i = 0; i < in1.size(); i++) {
            arma::fmat tmp = in1[i].i();
            out[i] = tmp * in2[i];
        }
    }
}