Use aligned loads/stores whenever possible to speed up the GEBP kernel
Submitted by Benoit Steiner
Assigned to Nobody
Link to original bugzilla bug (#724)
Created attachment 413
Patch against the latest version of the code
Using unaligned SSE loads/stores is slower than using the corresponding aligned instructions, even when the underlying address is aligned.
The attached patch attempts to use aligned loads/stores as much as possible in the gebp_kernel. This results in a performance gain of a few percent on the Eigen matrix-matrix benchmark, as shown in the attached before/after pictures (run on Sandy Bridge with gcc 4.6).
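To illustrate the general idea (this is not the code from the patch), here is a minimal sketch of choosing between aligned and unaligned SSE accesses at run time. The helpers `is_aligned16` and `scaled_accumulate` are hypothetical names invented for this example and are not part of Eigen; the sketch assumes single-precision data and falls back to scalar code when SSE is unavailable.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#endif

// Hypothetical helper (not Eigen API): true if p is 16-byte aligned.
inline bool is_aligned16(const void* p) {
  return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Hypothetical helper (not Eigen API): dst[i] += alpha * src[i] for i in [0, n).
// Uses aligned loads/stores when both pointers are 16-byte aligned,
// and the unaligned variants otherwise.
inline void scaled_accumulate(float* dst, const float* src, float alpha,
                              std::size_t n) {
  std::size_t i = 0;
#ifdef SKETCH_HAS_SSE
  const __m128 valpha = _mm_set1_ps(alpha);
  const std::size_t vec_end = n - (n % 4);  // last index covered by full packets
  if (is_aligned16(dst) && is_aligned16(src)) {
    // Aligned path: _mm_load_ps/_mm_store_ps require 16-byte aligned addresses.
    for (; i < vec_end; i += 4) {
      const __m128 d = _mm_load_ps(dst + i);
      const __m128 s = _mm_load_ps(src + i);
      _mm_store_ps(dst + i, _mm_add_ps(d, _mm_mul_ps(valpha, s)));
    }
  } else {
    // Unaligned path: valid for any address.
    for (; i < vec_end; i += 4) {
      const __m128 d = _mm_loadu_ps(dst + i);
      const __m128 s = _mm_loadu_ps(src + i);
      _mm_storeu_ps(dst + i, _mm_add_ps(d, _mm_mul_ps(valpha, s)));
    }
  }
#endif
  // Scalar tail (and full scalar fallback when SSE is not available).
  for (; i < n; ++i) dst[i] += alpha * src[i];
}
```

The alignment check is done once per call rather than once per packet, so its cost is amortized over the whole loop. In a kernel like GEBP the packed blocks are typically allocated by the library itself, so alignment could in principle be known ahead of time instead of tested; that part is an assumption of this sketch rather than a description of the patch.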
Patch 413, "Patch against the latest version of the code":