Don’t use SSE2 instructions in Move_8OrMore_SSE.
In !555 (merged), I originally used movdqu/movntdq but then changed to movups/movntps, because in the absence of arithmetic these instructions should be completely equivalent and movups/movntps are shorter. I thought that, as an additional benefit, this would let the function get by with just SSE1 (so I checked has_sse_support instead of has_sse2_support), but some SSE2 instructions were left over that have possible SSE1 replacements. ^^
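
For reference, the size difference can be checked against the x86 opcode map. Below is a minimal sketch in Python, assuming the legacy (non-VEX) encodings and counting only the prefix and opcode bytes, not ModRM, SIB or displacement. The table and helper names are illustrative only and are not taken from the RTL sources.

```python
# Prefix + opcode bytes for the legacy SSE encodings (ModRM etc. not included),
# together with the instruction set extension that introduced each instruction.
ENCODINGS = {
    "movups xmm, m128":  (bytes([0x0F, 0x10]),       "SSE"),
    "movups m128, xmm":  (bytes([0x0F, 0x11]),       "SSE"),
    "movntps m128, xmm": (bytes([0x0F, 0x2B]),       "SSE"),
    "movdqu xmm, m128":  (bytes([0xF3, 0x0F, 0x6F]), "SSE2"),
    "movdqu m128, xmm":  (bytes([0xF3, 0x0F, 0x7F]), "SSE2"),
    "movntdq m128, xmm": (bytes([0x66, 0x0F, 0xE7]), "SSE2"),
}


def opcode_length(form: str) -> int:
    """Number of prefix + opcode bytes for the given instruction form."""
    return len(ENCODINGS[form][0])


def saved_bytes(sse1_form: str, sse2_form: str) -> int:
    """Bytes saved per instruction by using the SSE1 form instead of the SSE2 one."""
    return opcode_length(sse2_form) - opcode_length(sse1_form)


if __name__ == "__main__":
    pairs = [
        ("movups xmm, m128", "movdqu xmm, m128"),
        ("movups m128, xmm", "movdqu m128, xmm"),
        ("movntps m128, xmm", "movntdq m128, xmm"),
    ]
    for sse1_form, sse2_form in pairs:
        print(f"{sse1_form:18} vs {sse2_form:18}: saves {saved_bytes(sse1_form, sse2_form)} byte(s)")
```

The SSE2 forms each carry a mandatory prefix (F3 or 66) that the SSE1 forms do not, which is where the one-byte saving per instruction comes from.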
Edited by Rika