thread local storage for the Eigen thread count
Submitted by fabien chêne
Assigned to Nobody
Link to original bugzilla bug (#1169)
Version: 3.3 (current stable)
Description
Created attachment 654
testcase
The current way to set the number of threads in Eigen is not flexible enough. Consider the following example, on a machine with 8 cores:
    #include <Eigen/Core>

    void f()
    {
        // Eigen must stay single-threaded inside the OpenMP parallel regions.
        Eigen::setNbThreads( 1 );
        #pragma omp parallel for num_threads( 4 )
        for( int i = 0; i < 100; ++i )
        {
            #pragma omp parallel for num_threads( 2 )
            for( int j = 0; j < 20; j++ )
            {
                // do some matrix computation
            }
            // sequential part (relative to the inner loops):
            Eigen::setNbThreads( 2 );
            // do some matrix computation using Eigen's parallelism.
            Eigen::setNbThreads( 1 );
            #pragma omp parallel for num_threads( 2 )
            for( int j = 0; j < 20; j++ )
            {
                // do some matrix computation
            }
        }
    }
The problem is that the thread count acts as a global variable (with static storage duration), so every thread of the outer loop reads and writes the same value. There is therefore no guarantee that the sequential part runs with 2 threads in Eigen while the parallel parts run with 1 thread in Eigen.
Would it be possible to provide a way to achieve that goal?
The obvious answer would be to make the thread count thread-local. I guess that this is not possible for compatibility reasons, and because some platforms may not have TLS support. Hence the idea would be to add a new function that sets the thread count for the calling thread only:
void Eigen::setNbThreadsInThisThread( int nbThreads );
What do you think?
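To make the proposal concrete, here is a minimal standalone sketch of the semantics I have in mind. It is not Eigen code: the `sketch` namespace, the helper `countSeenByWorker`, and the convention that an override of 0 means "use the global value" are all assumptions made for illustration. It only relies on C++11 `thread_local` and `std::thread`.

```cpp
#include <thread>

namespace sketch {  // hypothetical namespace, not part of Eigen

// Process-wide default, playing the role of today's global thread count.
inline int& globalNbThreads() {
    static int value = 1;
    return value;
}

// Per-thread override; 0 means "no override, fall back to the global value"
// (this convention is an assumption of the sketch).
inline int& threadLocalNbThreads() {
    thread_local int value = 0;
    return value;
}

inline void setNbThreads(int n) { globalNbThreads() = n; }

// The proposed addition: affects only the calling thread.
inline void setNbThreadsInThisThread(int n) { threadLocalNbThreads() = n; }

// Effective count for the calling thread.
inline int nbThreads() {
    const int local = threadLocalNbThreads();
    return local > 0 ? local : globalNbThreads();
}

}  // namespace sketch

// Hypothetical helper: spawns a worker that sets its own override and
// reports the thread count it observes.
inline int countSeenByWorker(int overrideValue) {
    int seen = 0;
    std::thread worker([&] {
        sketch::setNbThreadsInThisThread(overrideValue);
        seen = sketch::nbThreads();
    });
    worker.join();
    return seen;
}
```

With this scheme, existing callers of the global setter keep their behavior as long as no thread installs an override, which is why I would expect it to be backward compatible. Platforms without TLS support would need a fallback, which the sketch does not address.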
Test case attached.
Attachment 654, "testcase":
TestEigen.cc