Assigned to **Nobody**

**Link to original bugzilla bug (#1780)**

**Version**: 3.4 (development)

**Operating system**: Linux

Created attachment 963

A script that demonstrates compiler flags that begin to fail for the half_float test

This unit test fails when the compiler enables any of the following features, which are part of the `-march=ivybridge` architecture settings:

`-target-feature +fsgsbase -target-feature +rdrnd -target-feature +f16c`

The failing assertion is:

`VERIFY_IS_EQUAL( std::numeric_limits<half>::signaling_NaN().x, half(std::numeric_limits<float>::signaling_NaN()).x );`

The failure occurs in both Mac/clang and Linux/gcc builds.

Using the `-mtune=native -march=native` compiler flags on recently released CPUs (i.e. those from roughly the last 5 years) demonstrates this issue.

**Attachment 963**, "A script that demonstrates compiler flags that begin to fail for the half_float test":

build_half_float.sh

Assigned to **Nobody**

**Link to original bugzilla bug (#1779)**

After using Eigen's quaternions to implement a vector rotation, and comparing the generated code with Rodrigues' rotation formula (https://en.wikipedia.org/wiki/Rodrigues%27_rotation_formula), I noticed that the quaternion version was significantly more complex.

This brought me to the following article, which describes a more efficient implementation for quaternion * vector multiplication: https://blog.molecular-matters.com/2013/05/24/a-faster-quaternion-vector-multiplication/. A full derivation is found here: https://gamesandsimulations.fandom.com/wiki/Quaternions.

I suggest that Eigen consider switching to this more efficient implementation.

Here is the summary:

The canonical way of multiplying a quaternion q by a vector v is given by the following formula:

`v' = q * v * conjugate(q)`

where the vector v is being treated as a quaternion with w=0, so the above essentially boils down to two quaternion multiplications, which are a bit expensive.

Turns out there is a faster way, which is the following:

`t = 2 * cross(q.xyz, v)`

`v' = v + q.w * t + cross(q.xyz, t)`

The faster method comes courtesy of Fabian Giesen (ryg of Farbrausch fame), who posted this to the MollyRocket forums years ago. [...]

In my SSE2 code path, the new method is about 35% faster than the original. Enjoy, and don’t forget to share this gem with other people!

Assigned to **Nobody**

**Link to original bugzilla bug (#1778)**

**Version**: 3.4 (development)

This is a frequently asked feature. Thanks to IndexedView, we can easily implement it as:

```
template<typename XprType, typename RowFactorType, typename ColFactorType>
auto repelem(const XprType &xpr, RowFactorType row_factor, ColFactorType col_factor) {
  using namespace Eigen;
  const int RowFactor = internal::get_fixed_value<RowFactorType>::value;
  const int ColFactor = internal::get_fixed_value<ColFactorType>::value;
  const int NRows = XprType::RowsAtCompileTime == Dynamic || RowFactor == Dynamic ? Dynamic : XprType::RowsAtCompileTime*RowFactor;
  const int NCols = XprType::ColsAtCompileTime == Dynamic || ColFactor == Dynamic ? Dynamic : XprType::ColsAtCompileTime*ColFactor;
  const int nrows = internal::get_runtime_value(row_factor) * xpr.rows();
  const int ncols = internal::get_runtime_value(col_factor) * xpr.cols();
  return xpr(
    Array<int,NRows,1>::LinSpaced(nrows,0,xpr.rows()-1),
    Array<int,NCols,1>::LinSpaced(ncols,0,xpr.cols()-1)
  );
}
```

Full demo: https://godbolt.org/z/rYgDxF

This can easily be turned into a member function and specialized for Horizontal or Vertical replication only.

What would be a good name, given that we already have "replicate" as an equivalent to "repmat"?

Can we do better than the above implementation by using a dedicated expression (as for replicate)?

Assigned to **Nobody**

**Link to original bugzilla bug (#1777)**

**Version**: 3.4 (development)

Currently many functions return slightly different results for coefficients evaluated using a SIMD path or a scalar path. This includes exp, log, logistic_function, etc.

Since the SIMD implementation of those functions can also be called on scalar inputs, we could easily solve this inconsistency by plugging the respective functor calls into the generic SIMD path.

Shall we do that unconditionally or only if vectorization is enabled?

Assigned to **Nobody**

**Link to original bugzilla bug (#1776)**

**Version**: 3.4 (development)

Given the following:

```
auto x = Eigen::ArrayXXd(2, 3);
x << 0.0, 1.0, 2.0,
3.0, 4.0, 5.0;
for (auto it = x.colwise().cbegin(); it != x.colwise().cend(); ++it)
std::cout << "x = " << it->coeff(0) << '\n';
```

GCC 9.2.1 reports:

`.../src/Core/StlIterators.h:254:48: error: taking address of rvalue [-fpermissive] `

No such warning is triggered if we use operator* instead:

`std::cout << "x = " << (*it).coeff(0) << '\n'; `

as would be done implicitly if we were to use a range-based for loop.