Unify Geometry_SSE.h into a generic version that supports both x86 and ARM
The quat_product and quat_conj ops in Geometry_SSE.h were only supported on x86. I am replacing them with a generic version that supports both x86 and ARM. We benchmarked the change and confirmed that the generic version introduces no significant regression on x86 and gives a substantial improvement on ARM for every case except the double-precision product, which is unchanged. The code used for testing and benchmarking is below:
#include <benchmark/benchmark.h>
#include <Eigen/Geometry>

using namespace Eigen;

// Keeps the compiler from eliding the benchmarked expression (GCC/Clang inline asm).
template <typename T>
inline void DoNotOptimizeAway(T&& datum) {
  asm volatile("" : "+m,r"(datum) : : "memory");
}

// Single-precision quaternion product.
static void quat_product_f(benchmark::State& state) {
  Quaternion<float, AutoAlign> q1, q2, q3;
  q1.coeffs().setRandom();
  q2.coeffs().setRandom();
  for (auto _ : state) {
    DoNotOptimizeAway(q3 = q2 * q1);
  }
}

// Single-precision quaternion conjugate.
static void quat_conj_f(benchmark::State& state) {
  Quaternion<float, AutoAlign> q1, q2;
  q1.coeffs().setRandom();
  q2.coeffs().setRandom();
  for (auto _ : state) {
    DoNotOptimizeAway(q2 = q1.conjugate());
  }
}

// Double-precision quaternion product.
static void quat_product_d(benchmark::State& state) {
  Quaternion<double, AutoAlign> q1, q2, q3;
  q1.coeffs().setRandom();
  q2.coeffs().setRandom();
  for (auto _ : state) {
    DoNotOptimizeAway(q3 = q2 * q1);
  }
}

// Double-precision quaternion conjugate.
static void quat_conj_d(benchmark::State& state) {
  Quaternion<double, AutoAlign> q1, q2;
  q1.coeffs().setRandom();
  q2.coeffs().setRandom();
  for (auto _ : state) {
    DoNotOptimizeAway(q2 = q1.conjugate());
  }
}

BENCHMARK(quat_product_f);
BENCHMARK(quat_conj_f);
BENCHMARK(quat_product_d);
BENCHMARK(quat_conj_d);
BENCHMARK_MAIN();
The benchmark was compiled with GCC with the -O3 optimization option enabled.
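For checking correctness independently of any vectorized path, a plain scalar reference for the two ops can be handy. The following is only a minimal sketch and is not part of this change: the names Quat, quat_conj_ref and quat_product_ref are hypothetical, and the coefficient order (x, y, z, w) is assumed to match what Quaternion::coeffs() returns.

```cpp
#include <array>

// Hypothetical helper type: coefficients stored as (x, y, z, w),
// assumed to match the order of Eigen's Quaternion::coeffs().
template <typename T>
using Quat = std::array<T, 4>;

// Scalar reference for the conjugate: negate the vector part, keep w.
template <typename T>
Quat<T> quat_conj_ref(const Quat<T>& q) {
  return {-q[0], -q[1], -q[2], q[3]};
}

// Scalar reference for the Hamilton product a * b.
template <typename T>
Quat<T> quat_product_ref(const Quat<T>& a, const Quat<T>& b) {
  const T ax = a[0], ay = a[1], az = a[2], aw = a[3];
  const T bx = b[0], by = b[1], bz = b[2], bw = b[3];
  return {
      aw * bx + ax * bw + ay * bz - az * by,   // x
      aw * by + ay * bw + az * bx - ax * bz,   // y
      aw * bz + az * bw + ax * by - ay * bx,   // z
      aw * bw - ax * bx - ay * by - az * bz};  // w
}
```

A test could compare the coefficients of q2 * q1 and q1.conjugate() against these references within a small tolerance on both architectures.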
Benchmark results:
Generic version compared with the SSE version on x86
----------------------------------------------
Benchmark          OldTime      NewTime
----------------------------------------------
quat_product_f     2.21 ns      2.21 ns
quat_conj_f        0.418 ns     0.428 ns
quat_product_d     2.84 ns      2.84 ns
quat_conj_d        0.631 ns     0.632 ns
Generic version compared with the standard (non-vectorized) version on ARM
-----------------------------------------------
Benchmark          OldTime      NewTime
-----------------------------------------------
quat_product_f     9.30 ns      5.08 ns
quat_conj_f        8.49 ns      1.13 ns
quat_product_d     9.60 ns      9.62 ns
quat_conj_d        8.49 ns      2.05 ns
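As a reading aid, the ARM timings above work out to speedups of roughly 1.8x for quat_product_f, 7.5x for quat_conj_f and 4.1x for quat_conj_d, with quat_product_d essentially unchanged. The sketch below just reproduces that arithmetic from the table; the Row struct and the helper names are hypothetical and exist only for illustration.

```cpp
#include <cstdio>

// Hypothetical record for one table row (times in nanoseconds).
struct Row {
  const char* name;
  double old_ns;
  double new_ns;
};

// Speedup = old time / new time; values above 1 mean the new code is faster.
inline double speedup(const Row& r) { return r.old_ns / r.new_ns; }

// ARM timings copied from the table above.
static const Row kArmRows[] = {
    {"quat_product_f", 9.30, 5.08},
    {"quat_conj_f", 8.49, 1.13},
    {"quat_product_d", 9.60, 9.62},
    {"quat_conj_d", 8.49, 2.05},
};

// Prints one line per benchmark with its speedup factor.
inline void print_speedups() {
  for (const Row& r : kArmRows) {
    std::printf("%-16s %.2fx\n", r.name, speedup(r));
  }
}
```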
Edited by Guoqiang QI