Trivial adjustments to !379.
- Check
.L16x_Tail:
    cmp %r9, %r10 ; r10 = end of buf1, r9 = end of full XMMs in buf1
    je .LNothing
    ; ...handle 1 to 15 tail bytes
determines whether there are tail bytes, but it is required only after the 16× loop; for len ∈ [4; 15], execution jumps directly to .L16x_Tail, where the tail is never empty. But instead of just moving (or in addition to moving) the .L16x_Tail label two instructions down, I moved the entire 16× loop out, tuning the <16-byte case further.
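The reasoning about when the check can fire can be sketched in a few lines of C. This is only a model, not code from the MR: it assumes that r9 points at buf1 plus the length rounded down to a multiple of 16 (as the comment "end of full XMMs" suggests), and the helper names `full_xmm_bytes`, `tail_bytes` and `check_can_fire` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: r9 - buf1 is the length rounded down to whole 16-byte blocks. */
static size_t full_xmm_bytes(size_t len)
{
    return len & ~(size_t)15;
}

/* r10 - r9: the number of bytes left for the tail handler. */
static size_t tail_bytes(size_t len)
{
    return len - full_xmm_bytes(len);
}

/* Returns 1 if "cmp %r9, %r10 / je .LNothing" could be taken
   for at least one length in [lo, hi], and 0 otherwise. */
static int check_can_fire(size_t lo, size_t hi)
{
    for (size_t len = lo; len <= hi; len++) {
        if (tail_bytes(len) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Under this model `check_can_fire(4, 15)` is 0, so the comparison is dead weight on the short-length path, while `check_can_fire(16, 64)` is 1 because lengths such as 16 or 32 leave no tail after the 16× loop.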
movzbl (%rcx,%rax), %ecx
movzbl (%rdx,%rax), %edx
mov %rcx, %rax
sub %rdx, %rax
ret
can be carefully modified to get rid of the mov.
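One possible way to drop the mov, shown as a sketch rather than as the exact change made here: load the second byte first, so that the load of the first byte is the last use of the index in %rax and can write straight into %eax. The C wrappers below (`diff_with_mov`, `diff_without_mov`) are hypothetical harness code for GCC or Clang on x86-64; on other targets they fall back to plain C so the example still compiles.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))

/* The sequence quoted above: both bytes are loaded, then mov copies
   the first one into rax before the subtraction. */
static int64_t diff_with_mov(const unsigned char *buf1,
                             const unsigned char *buf2, uint64_t index)
{
    uint64_t rax = index;
    uint64_t rcx = (uint64_t)(uintptr_t)buf1;
    uint64_t rdx = (uint64_t)(uintptr_t)buf2;
    __asm__ (
        "movzbl (%%rcx,%%rax), %%ecx\n\t"
        "movzbl (%%rdx,%%rax), %%edx\n\t"
        "mov %%rcx, %%rax\n\t"
        "sub %%rdx, %%rax"
        : "+a"(rax), "+c"(rcx), "+d"(rdx)
        :
        : "cc", "memory");
    return (int64_t)rax;
}

/* Reordered sketch: the loads are swapped, and the second load
   overwrites the index register it has just used. */
static int64_t diff_without_mov(const unsigned char *buf1,
                                const unsigned char *buf2, uint64_t index)
{
    uint64_t rax = index;
    uint64_t rdx = (uint64_t)(uintptr_t)buf2;
    __asm__ (
        "movzbl (%%rdx,%%rax), %%edx\n\t"  /* byte from buf2; index still intact */
        "movzbl (%%rcx,%%rax), %%eax\n\t"  /* byte from buf1; zero-extends into rax */
        "sub %%rdx, %%rax"
        : "+a"(rax), "+d"(rdx)
        : "c"(buf1)
        : "cc", "memory");
    return (int64_t)rax;
}

#else

/* Portable fallback with the same result, for non-x86-64 targets. */
static int64_t diff_with_mov(const unsigned char *buf1,
                             const unsigned char *buf2, uint64_t index)
{
    return (int64_t)buf1[index] - (int64_t)buf2[index];
}

static int64_t diff_without_mov(const unsigned char *buf1,
                                const unsigned char *buf2, uint64_t index)
{
    return (int64_t)buf1[index] - (int64_t)buf2[index];
}

#endif
```

Both variants return the same 64-bit difference. The care lies in the ordering: writing %eax any earlier would destroy the index that the other load still needs.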
Edited by Rika