Improve Index* / Compare* tail handling.
After the vectorized loop over whole VECLENs, the tail can be handled by
analyzing one more vector, [END−VECLEN; END), that partially overlaps with
already processed data, instead of jumping to the <VECLEN case. For example,
40 bytes can be handled with XMMs as 0–15 + 16–31 + 24–39, without ever
resorting to other branches. This requires the total length to be at least
VECLEN; shorter inputs still need the <VECLEN case.
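
A minimal sketch of the idea in C, for an Index*-style byte search only. It is
an illustration under stated assumptions, not the actual routine: the names
index_byte and find_in_vector are hypothetical, VECLEN is fixed at 16 (one
XMM), and the per-vector check uses SSE2 intrinsics where available with a
scalar stand-in elsewhere. The overlapping tail load is safe for a "first
match" search because the re-read bytes are already known to contain no match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

enum { VECLEN = 16 };  /* assumption: one XMM register */

/* Hypothetical helper: offset of the first byte equal to b within the
   VECLEN bytes at p, or -1 if none of them matches. */
static int find_in_vector(const uint8_t *p, uint8_t b)
{
#if defined(__SSE2__)
    __m128i v  = _mm_loadu_si128((const __m128i *)p);
    __m128i eq = _mm_cmpeq_epi8(v, _mm_set1_epi8((char)b));
    unsigned mask = (unsigned)_mm_movemask_epi8(eq);
    int i = 0;
    if (mask == 0)
        return -1;
    while (!(mask & 1u)) {  /* position of the lowest set bit */
        mask >>= 1;
        i++;
    }
    return i;
#else
    /* Scalar stand-in for a single vector compare. */
    for (int i = 0; i < VECLEN; i++)
        if (p[i] == b)
            return i;
    return -1;
#endif
}

/* Hypothetical Index*-style search: index of the first b in buf, or -1. */
ptrdiff_t index_byte(const uint8_t *buf, size_t len, uint8_t b)
{
    size_t pos = 0;

    if (len < VECLEN) {
        /* The <VECLEN case: needed only when the whole input is short. */
        for (size_t i = 0; i < len; i++)
            if (buf[i] == b)
                return (ptrdiff_t)i;
        return -1;
    }

    /* Vectorized loop over whole VECLENs. */
    for (; pos + VECLEN <= len; pos += VECLEN) {
        int hit = find_in_vector(buf + pos, b);
        if (hit >= 0)
            return (ptrdiff_t)(pos + (size_t)hit);
    }

    /* Tail: one more vector [END-VECLEN; END) overlapping checked data.
       Bytes before pos are known not to match, so any hit lies in the tail. */
    if (pos < len) {
        size_t last;
        int hit;
        assert(len >= VECLEN);  /* len - VECLEN cannot underflow here */
        last = len - VECLEN;
        hit = find_in_vector(buf + last, b);
        if (hit >= 0)
            return (ptrdiff_t)(last + (size_t)hit);
    }
    return -1;
}
```

For a 40-byte buffer this examines vectors starting at offsets 0, 16 and 24,
matching the 0–15 + 16–31 + 24–39 split above. A Compare*-style routine could
use the same final overlapping load, since the overlapped bytes are already
known to be equal.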