Trivial adjustments to !379.
[comparebyte.patch](/uploads/d99fe5cbb17612f72748edfe125b473d/comparebyte.patch)
1. Check
```nasm
.L16x_Tail:
cmp %r9, %r10 ; r10 = end of buf1, r9 = end of full XMMs in buf1
je .LNothing
; ...handle 1 to 15 tail bytes
```
determines whether there are any tail bytes, but it is required only after the 16× loop: for `len` ∈ [4; 15] the code jumps directly to `.L16x_Tail`, and in that case the tail is never empty. Instead of just moving (or in addition to moving) the `.L16x_Tail` label two instructions down, I moved the entire 16× loop out, which tunes the <16-byte case further.
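For comparison, the simpler alternative mentioned above (moving only the label) might look roughly like the sketch below. This is not what the patch does, since the patch moves the whole 16× loop out, and the surrounding code is elided here just as in the original fragment.

```nasm
; ...the 16× loop falls through to this check
cmp %r9, %r10 ; r10 = end of buf1, r9 = end of full XMMs in buf1
je .LNothing ; reachable only after the loop, where the tail may be empty
.L16x_Tail: ; direct entry for len ∈ [4; 15], where the tail is known to be non-empty
; ...handle 1 to 15 tail bytes
```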
2.
```nasm
movzbl (%rcx,%rax), %ecx
movzbl (%rdx,%rax), %edx
mov %rcx, %rax
sub %rdx, %rax
ret
```
can be carefully modified to get rid of the `mov`.
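One way this might be done, as a sketch that is not necessarily the exact sequence in the patch: load the byte addressed through `%rdx` first, while `%rax` still holds the index, then load the other byte straight into `%eax`, so the result is already in the return register.

```nasm
movzbl (%rdx,%rax), %edx ; second operand first; %rax is still the index here
movzbl (%rcx,%rax), %eax ; first operand goes directly into the result register
sub %rdx, %rax ; both values are zero-extended, so the 64-bit difference is correct
ret
```

The order matters: loading into `%eax` first would destroy the index that the other load still needs.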