
Trivial adjustments to !379.

comparebyte.patch

  1. Check
.L16x_Tail:
    cmp      %r9, %r10 ; r10 = end of buf1, r9 = end of full XMMs in buf1
    je       .LNothing
; ...handle 1 to 15 tail bytes

determines whether there are tail bytes, but it is required only after the 16× loop; for len ∈ [4; 15] the code jumps directly to .L16x_Tail, where there are always tail bytes and the check is redundant. Instead of just moving (or in addition to moving) the .L16x_Tail label two instructions down, I moved the entire 16× loop out, tuning the <16-byte case further.

    movzbl   (%rcx,%rax), %ecx
    movzbl   (%rdx,%rax), %edx
    mov      %rcx, %rax
    sub      %rdx, %rax
    ret

can be carefully modified to get rid of the mov.
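
As a rough sketch of what "carefully" means here (the exact sequence used in comparebyte.patch is not reproduced in this description, so treat this as an assumption): the first load cannot simply target %eax, because %rax is still needed as the index for the second load. Loading the second byte first, and only then overwriting %rax with the first byte, gives the same result without the mov:

    movzbl   (%rdx,%rax), %edx ; second byte; %rax is still the index
    movzbl   (%rcx,%rax), %eax ; first byte; the address is formed from the old %rax before %eax is written
    sub      %rdx, %rax
    ret

This variant leaves %rcx holding the pointer instead of the byte, which only matters if later code relied on its value.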

Edited Feb 23, 2023 by Rika