
Trivial adjustments to !379.

comparebyte.patch

  1. Check
.L16x_Tail:
    cmp      %r9, %r10 ; r10 = end of buf1, r9 = end of full XMMs in buf1
    je       .LNothing
; ...handle 1 to 15 tail bytes

determines whether there are tail bytes, but it is required only after the 16× loop: for len ∈ [4; 15] the code jumps directly to .L16x_Tail, where tail bytes are always present. Instead of just moving (or in addition to moving) the .L16x_Tail label two instructions down, I moved the entire 16× loop out, tuning the <16-byte case further.

  2. The sequence

    movzbl   (%rcx,%rax), %ecx
    movzbl   (%rdx,%rax), %edx
    mov      %rcx, %rax
    sub      %rdx, %rax
    ret

can be carefully modified to get rid of the mov.
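
One possible way to do this, given as a sketch and not necessarily what the patch does: load the second byte first, so that %rdx is overwritten only after its last use as a base register, then load the first byte straight into %eax. This assumes, as in the snippet above, that %rcx and %rdx hold the buffer pointers, %rax holds the offset, and none of the three is needed afterwards.

```
    movzbl   (%rdx,%rax), %edx   ; buf2 byte; %rdx is no longer needed as a pointer
    movzbl   (%rcx,%rax), %eax   ; buf1 byte; this is the last use of %rax as an offset
    sub      %rdx, %rax          ; %rax = buf1 byte - buf2 byte
    ret
```

The order of the two loads matters: loading into %eax first would destroy the offset before the second load could use it.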

Edited by Rika