ploaddup uses _mm_load_sd, which is generally miscompiled by gcc on i386

Submitted by Benoit Jacob

Assigned to Gael Guennebaud @ggael

Link to original bugzilla bug (#200)

Description

As we found out in bug #195 (closed), GCC (at least up to 4.4) on i386 (i.e. with -m32) miscompiles the _mm_load_sd intrinsic: it adds redundant x87 fldl/fstpl instructions, which should result in poor performance. (In bug #195 it even produced a wrong result, but that's a different story.)

Our ploaddup function still uses _mm_load_sd, so it would be nice to have a workaround for gcc/i386 that avoids this intrinsic.
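
One possible shape for such a workaround, as a rough sketch only: it assumes the float version of ploaddup is meant to read two floats and return {a0, a0, a1, a1}, and it replaces the 64-bit _mm_load_sd load with _mm_loadl_pi followed by _mm_unpacklo_ps. The name ploaddup_sketch is hypothetical and is not the function in the Eigen sources. Whether this actually avoids the x87 fldl/fstpl round trip on gcc -m32 would still need to be checked in the generated assembly.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: _mm_loadl_pi, _mm_unpacklo_ps, _mm_setzero_ps

// Hypothetical stand-in for ploaddup on a four-float packet.
// Assumption: the intended result is {from[0], from[0], from[1], from[1]}.
inline __m128 ploaddup_sketch(const float* from)
{
  assert(from != nullptr);

  // Load two consecutive floats (64 bits) into the low half of an SSE
  // register without going through _mm_load_sd. No alignment is required.
  __m128 tmp = _mm_setzero_ps();
  tmp = _mm_loadl_pi(tmp, reinterpret_cast<const __m64*>(from));

  // Interleave the low half with itself: {t0, t0, t1, t1}.
  return _mm_unpacklo_ps(tmp, tmp);
}
```

If this approach were adopted, it could be enabled only for gcc on 32-bit x86 (for example behind a preprocessor check), leaving other compilers on the existing code path.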

Edited by Rasmus Munk Larsen