Use optimal Fill* breakpoints instead of conservative ones.
Fill* used strange-looking and usually suboptimal breakpoints, carefully chosen to be safe in the worst case: size ≤ 49 for writing 1+1 aligned vectors in the middle, size ≤ 81 for writing 2+2, and size ≤ 113 for writing 2+4. Now it uses the actual size of the aligned part, which, depending on the situation at runtime, can raise these limits to as much as 64, 96, and 128, respectively.
(Hardly affects practical performance, though...)
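
For reference, the old and new limits can be reproduced with a little arithmetic. This is only a sketch, not the actual Fill* code. It assumes 16-byte vectors, one unaligned vector write at each end of the buffer, and that "1+1", "2+2", and "2+4" mean 2, 4, and 6 aligned vectors in total. The names VEC, aligned_offset, and max_size are made up for the illustration.

```python
VEC = 16  # assumed vector width in bytes


def aligned_offset(addr: int) -> int:
    """Bytes from addr to the first aligned address after it (1..VEC)."""
    return VEC - addr % VEC


def max_size(addr: int, aligned_vectors: int) -> int:
    """Largest size covered by an unaligned head vector, the given number
    of aligned vectors, and an unaligned tail vector, for a buffer at addr."""
    # The aligned part ends at aligned_offset + aligned_vectors * VEC;
    # the tail vector may start no later than that, adding one more VEC.
    return aligned_offset(addr) + (aligned_vectors + 1) * VEC


for vectors in (2, 4, 6):
    worst = min(max_size(a, vectors) for a in range(VEC))
    best = max(max_size(a, vectors) for a in range(VEC))
    print(vectors, worst, best)
```

Under these assumptions the worst case (aligned part starting 1 byte in) gives the old fixed limits, and the best case (start already aligned) gives the new upper bounds.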