Bug in internal::psqrt SSE implementation
Submitted by Jeff Dean
Assigned to Nobody
Link to original bugzilla bug (#613)
Operating system: Linux
Description
Created attachment 346
Patch to fix the problem. However, it has a lengthy comment that you probably can just remove from the real distribution.
The internal::psqrt routine in Eigen/src/Core/arch/SSE/MathFunctions.h appears to have a bug whereby values in the range [1.17549e-038, 1.19209e-007] return a square root value of exactly 0.0, rather than a true square root. This is because the code that computes the non_zero_mask on the Packet4f entry by comparing to see which values are greater than std::numeric_limits<float>::epsilon() (i.e. 1.19209e-007), rather than std::numeric_limits<float>::min() (i.e. 1.17549e-038).
The fix is to change this line:
Packet4f non_zero_mask =
_mm_cmpgt_ps(_x, pset1<Packet4f>(std::numeric_limits<float>::epsilon()));
To:
Packet4f non_zero_mask =
_mm_cmpgt_ps(_x, pset1<Packet4f>(std::numeric_limits<float>::min()));
A test program that exhaustively shows where values in this range compute sqrt results that differ by more than 0.00001 from the default sqrt(float) implementation is below.
It might be worth incorporating the style of this test program into the library to test all the other specialized Packet implementations in MathFunctions.h in a similar manner (i.e. make sure that the Packet SSE implementations compute answers that are very similar to the default single-value implementations for a large number of randomly selected floating point values).
Bug fix for Eigen3 psqrt() operation, where values that were
between 0 and std::numeric_limits<float>::epsilon() (1.19209e-007)
would be given a value of 0, rather than something closer to
what the sqrt(x) function in math.h would return.
Wrote simple test program (shown below) to illustate the problem:
% g++ -o sqrt_issue -lm -I/google/src/cloud/jeff/eigen-sqrt/google3/third_party/eigen3/ sqrt_issue.cc
Without this fix, the program generates the following output:
% ./sqrt_issue
Sampled 873254095 values out of 873254095
...
Mismatch sample: sqrt() 0 vs. SQRT 0.000345043 for 1.19055e-07
Mismatch sample: sqrt() 0 vs. SQRT 0.000345146 for 1.19126e-07
Mismatch sample: sqrt() 0 vs. SQRT 0.000345249 for 1.19197e-07
Mismatches: 86251777 of 873254095
With the fix, the program shows 0 mismatches:
% ./sqrt_issue
Sampled 873254095 values out of 873254095
Mismatches: 0 of 873254095
Where sqrt_issue.cc is:
------------------------
/* Program to illustrate an issue with the Eigen psqrt implementation
for SSE.
Author: Jeff Dean (firstname@google.com)
*/
#include <stdio.h>
#include <math.h>
#include <vector>
#include "Eigen/Core"
typedef unsigned int uint32;
namespace {
float SQRT(float x) {
return sqrt(x);
}
}
int main(int argc, char** argv) {
std::vector<float> v;
float max_value = 1.1 * std::numeric_limits<float>::epsilon();
int count = 0;
for (uint32 i = 0; i < (1ull<<32)-1; i++) {
float f = *(reinterpret_cast<float*>(&i));
if (f >= 0.0 && f <= max_value) {
count++;
// Randomly sample problematic values
v.push_back(f);
}
}
const int N = v.size();
fprintf(stderr, "Sampled %d values out of %d\n", N, count);
std::vector<float> v2(N), v3(N);
Eigen::Map<Eigen::ArrayXf> r1(&v2[0], N);
Eigen::Map<Eigen::ArrayXf> r2(&v3[0], N);
Eigen::Map<Eigen::ArrayXf> in(&v[0], N);
r1 = in.sqrt();
r2 = in.unaryExpr(std::ptr_fun(SQRT));
int mismatches = 0;
for (int i = 0; i < N; i++) {
if (fabs(r1[i] - r2[i]) > 0.00001) {
mismatches++;
if ((mismatches % 10000) == 0) {
fprintf(stdout,
"Mismatch sample: sqrt() %g vs. SQRT %g for %g\n",
r1[i], r2[i], in[i]);
}
}
}
fprintf(stderr, "Mismatches: %d of %d\n", mismatches, N);
}
Patch 346, "Patch to fix the problem. However, it has a lengthy comment that you probably can just remove from the real distribution.":
eigen-sqrt.patch