Eigen Slow in Docker Container

The same test runs much slower in a Docker container (base image Ubuntu 16.04) than directly on my machine (also Ubuntu 16.04): 241s in the container vs 7s on the host. After profiling with gprof, I found that almost all of the time in the container is spent in the following function (Eigen 3.3.7):

 void gebp_kernel<LhsScalar,RhsScalar,Index,DataMapper,mr,nr,ConjugateLhs,ConjugateRhs>
  ::operator()(const DataMapper& res, const LhsScalar* blockA, const RhsScalar* blockB,
               Index rows, Index depth, Index cols, ResScalar alpha,
               Index strideA, Index strideB, Index offsetA, Index offsetB)

which is called by:

Eigen::internal::triangular_solve_matrix<double, long, 1, 2, false, 0, 0>::run(long, long, double const*, long, double*, long, Eigen::internal::level3_blocking<double, double>&)

Any ideas why this slowdown might occur? The only thing I have tried so far is setting the default L1, L2, and L3 cache sizes in GeneralBlockPanelKernel.h, after reading this: https://gitlab.com/arm-hpc/packages/-/wikis/packages/tensorflow#setting-cache-sizes-for-eigen-gebp-kernel. However, the test runs just as slowly with or without that change. The container has no CPU limits and is allowed to use all the CPUs and GPUs.
