Use aligned loads/stores whenever possible to speed up the GEBP kernel

Submitted by Benoit Steiner

Assigned to Nobody

Link to original bugzilla bug (#724)
Version: 3.2

Description

Created attachment 413
Patch against the latest version of the code

Using unaligned SSE loads/stores is slower than using the corresponding aligned instructions, even if the underlying address is aligned.

The attached patch attempts to use aligned loads/stores as much as possible in the gebp_kernel. This results in a performance gain of a few percent on the Eigen matrix-matrix benchmark, as shown in the attached before/after pictures (run on Sandy Bridge with gcc 4.6).
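
For readers without the attachment, here is a minimal sketch of the general idea. It is not the patch itself and not Eigen code. It assumes an x86 target with SSE, and the helper names is_aligned16 and scaled_add are hypothetical, invented for this illustration. The kernel checks pointer alignment once, then runs a loop that uses the aligned instructions when both pointers are 16-byte aligned and falls back to the unaligned ones otherwise.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_load_ps, _mm_loadu_ps, ...
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Eigen): true if p is 16-byte aligned.
inline bool is_aligned16(const void* p) {
  return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Hypothetical kernel (illustration only): dst[i] += alpha * src[i].
// The alignment test is hoisted out of the loop so that the hot path
// contains only aligned loads/stores when the data allows it.
inline void scaled_add(float* dst, const float* src, float alpha, std::size_t n) {
  const __m128 a = _mm_set1_ps(alpha);
  std::size_t i = 0;
  if (is_aligned16(dst) && is_aligned16(src)) {
    // Both pointers are 16-byte aligned: use aligned instructions.
    for (; i + 4 <= n; i += 4) {
      const __m128 d = _mm_load_ps(dst + i);
      const __m128 s = _mm_load_ps(src + i);
      _mm_store_ps(dst + i, _mm_add_ps(d, _mm_mul_ps(a, s)));
    }
  } else {
    // At least one pointer is misaligned: unaligned instructions are required.
    for (; i + 4 <= n; i += 4) {
      const __m128 d = _mm_loadu_ps(dst + i);
      const __m128 s = _mm_loadu_ps(src + i);
      _mm_storeu_ps(dst + i, _mm_add_ps(d, _mm_mul_ps(a, s)));
    }
  }
  // Scalar tail for the remaining 0 to 3 elements.
  for (; i < n; ++i) dst[i] += alpha * src[i];
}
```

Both branches compute the same result; only the instruction selection differs. Whether the aligned branch is measurably faster depends on the CPU and compiler, so the gain should be checked with a benchmark on the target machine.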

Patch 413, "Patch against the latest version of the code":
aligned_mem.patch

Edited Dec 05, 2019 by Eigen Bugzilla