
Performance regression with `arma::inv`

Hi @conradsnicta,

I am working on a non-linear optimisation task which involves calculating the inverse of 3x3 square matrices several hundred million times. Until now we have been happily using Armadillo 10.5.1; we recently updated to Armadillo 14.4.1 and experienced a substantial slowdown of our program. The slowdown doesn't seem to be caused directly by any change in arma::inv itself, but rather by how arma::inv is evaluated when it is used within a larger expression.

The affected code looks something like this:

std::vector<arma::fmat> in1(...);
std::vector<arma::fmat> in2(...);
std::vector<arma::fmat> out(...);

for (int i = 0; i < in1.size(); i++) {
  out[i] = in1[i].i() * in2[i];
}

If I adjust the code to use a temporary variable, performance is restored to that of 10.5.1:

for (int i = 0; i < in1.size(); i++) {
  arma::fmat tmp = in1[i].i();
  out[i] = tmp * in2[i];
}

Below is a simple example which can be used to demonstrate the slowdown. I ran this against several Armadillo versions; the slowdown appears to have been introduced between 10.8.2 and 11.0.0:

  • armadillo 10.5.1:
ARMADILLO VERSION: 10.5.1
Matrix creation: 6.173 seconds
Using i(): 0.059 seconds
Using arma::inv: 0.058 seconds
Using temporary variable: 0.034 seconds
  • armadillo 10.8.2:
ARMADILLO VERSION: 10.8.2
Matrix creation: 6.124 seconds
Using i(): 0.059 seconds
Using arma::inv: 0.057 seconds
Using temporary variable: 0.034 seconds
  • armadillo 11.0.0 (note that using a temporary variable is now slower):
ARMADILLO VERSION: 11.0.0
Matrix creation: 6.12 seconds
Using i(): 0.261 seconds
Using arma::inv: 0.267 seconds
Using temporary variable: 0.245 seconds
  • armadillo 11.4.4:
ARMADILLO VERSION: 11.4.4
Matrix creation: 6.122 seconds
Using i(): 0.259 seconds
Using arma::inv: 0.258 seconds
Using temporary variable: 0.246 seconds
  • armadillo 14.2.1 (using a temporary variable is fast again):
ARMADILLO VERSION: 14.2.1
Matrix creation: 0.241 seconds
Using i(): 0.258 seconds
Using arma::inv: 0.255 seconds
Using temporary variable: 0.039 seconds
  • armadillo 14.4.1:
ARMADILLO VERSION: 14.4.1
Matrix creation: 0.238 seconds
Using i(): 0.257 seconds
Using arma::inv: 0.255 seconds
Using temporary variable: 0.04 seconds

For our application, we may end up using hand-rolled inv33 and det33 routines (i.e. the equivalent of calling the op_inv_gen_meat.hpp:apply_tiny_3x3 routine directly). But I felt that this apparent performance regression was worth bringing to your attention. Thanks very much for all of your hard work on this amazing library!

#include <chrono>
#include <iostream>
#include <string>
#include <vector>

#include <armadillo>

class Timer {
private:
  std::string                                        _label;
  std::chrono::time_point<std::chrono::steady_clock> _start;

public:
  Timer(std::string label)
    : _label(label),
      _start(std::chrono::steady_clock::now())
  {}

  // Print the elapsed time when the timer goes out of scope.
  ~Timer() {
    auto end     = std::chrono::steady_clock::now();
    auto dur     = std::chrono::duration_cast<std::chrono::milliseconds>(end - _start);
    auto elapsed = dur.count() / 1000.0;
    std::cout << _label << ": " << elapsed << " seconds" << std::endl;
  }
};

arma::fmat random_matrix(int size) {
  arma::fmat mat(size, size);
  mat.randu();
  return mat;
}

arma::fmat random_invertible_matrix(int size) {
  arma::fmat mat(size, size);
  mat.randu();
  mat += arma::eye<arma::fmat>(size, size);
  return mat;
}


int main(int argc, char* argv[]) {

  std::cout << "ARMADILLO VERSION: "
            << ARMA_VERSION_MAJOR << "."
            << ARMA_VERSION_MINOR << "."
            << ARMA_VERSION_PATCH << std::endl;

  int N = 1000000;

  std::vector<arma::fmat> in1(N);
  std::vector<arma::fmat> in2(N);
  std::vector<arma::fmat> out(N);

  {
    Timer t("Matrix creation");
    for (auto& m : in1) m = random_invertible_matrix(3);
    for (auto& m : in2) m = random_matrix(3);
  }

  {
    Timer t("Using i()");
    for (std::size_t i = 0; i < in1.size(); i++) {

      out[i] = in1[i].i() * in2[i];
    }
  }
  {
    Timer t("Using arma::inv");
    for (std::size_t i = 0; i < in1.size(); i++) {

      out[i] = arma::inv(in1[i]) * in2[i];
    }
  }
  {
    Timer t("Using temporary variable");
    for (std::size_t i = 0; i < in1.size(); i++) {

      arma::fmat tmp = in1[i].i();
      out[i] = tmp * in2[i];
    }
  }
}
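
For reference, the hand-rolled fallback mentioned above could look roughly like the sketch below. This is not Armadillo's apply_tiny_3x3 implementation; it is a plain adjugate-over-determinant inverse, written without Armadillo so it stands alone. The names det33 and inv33 come from the text above, but the Mat33 alias, the signatures, and the exact-zero singularity check are all assumptions made for illustration.

```cpp
#include <array>

// Hypothetical helper type: a 3x3 matrix stored column-major, so that
// element (r, c) lives at index r + 3*c. Armadillo also stores matrices
// column-major, which is why this layout was chosen for the sketch.
using Mat33 = std::array<float, 9>;

// Determinant via cofactor expansion along the first row.
inline float det33(const Mat33& m) {
  return m[0] * (m[4] * m[8] - m[7] * m[5])
       - m[3] * (m[1] * m[8] - m[7] * m[2])
       + m[6] * (m[1] * m[5] - m[4] * m[2]);
}

// Inverse as adjugate / determinant. Returns false (leaving `out`
// untouched) when the determinant is exactly zero. Assumption: a real
// application would likely want a tolerance or conditioning check here.
inline bool inv33(const Mat33& m, Mat33& out) {
  const float det = det33(m);
  if (det == 0.0f) return false;
  const float s = 1.0f / det;

  Mat33 r;
  // First column of the inverse.
  r[0] = (m[4] * m[8] - m[7] * m[5]) * s;
  r[1] = (m[7] * m[2] - m[1] * m[8]) * s;
  r[2] = (m[1] * m[5] - m[4] * m[2]) * s;
  // Second column.
  r[3] = (m[6] * m[5] - m[3] * m[8]) * s;
  r[4] = (m[0] * m[8] - m[6] * m[2]) * s;
  r[5] = (m[3] * m[2] - m[0] * m[5]) * s;
  // Third column.
  r[6] = (m[3] * m[7] - m[6] * m[4]) * s;
  r[7] = (m[6] * m[1] - m[0] * m[7]) * s;
  r[8] = (m[0] * m[4] - m[3] * m[1]) * s;

  out = r;  // Written last, so `out` may alias `m` safely.
  return true;
}
```

Because the layout matches Armadillo's, the nine values of a 3x3 arma::fmat could be copied in and out through its memptr() accessor. Whether this beats the library's own tiny-matrix path would need to be measured with the benchmark above.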