Improve pblend AVX implementation
blendv only looks at the top (sign) bit of each mask element, so the mask can be built from integers. This removes the vcvtdq2ps instruction and makes pblend faster:
BM_blend 1.31ns ± 1% 0.98ns ±15% -24.84% (p=0.008 n=5+5)
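
For illustration only, here is a minimal sketch of the idea, not the code from this change. It assumes GCC or Clang on x86-64 with AVX available at run time; the helper name blend8_int_mask and its pointer-based signature are hypothetical. The mask is built as integers with only the sign bit set for selected lanes, then reinterpreted as floats with _mm256_castsi256_ps, which is a bit-level cast and needs no int-to-float conversion:

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): writes then_v[i] to out[i] where
// select[i] is true and else_v[i] otherwise, for 8 float lanes.
// The target attribute is a GCC/Clang extension that enables AVX for this
// function only, so the file can be built without -mavx.
__attribute__((target("avx")))
void blend8_int_mask(const float* then_v, const float* else_v,
                     const bool* select, float* out) {
  assert(then_v && else_v && select && out);

  // Selected lanes get only the top bit set; blendv ignores the other 31 bits.
  const int32_t kTop = INT32_MIN;
  // _mm256_set_epi32 takes its arguments from lane 7 down to lane 0.
  const __m256i mask = _mm256_set_epi32(
      select[7] ? kTop : 0, select[6] ? kTop : 0,
      select[5] ? kTop : 0, select[4] ? kTop : 0,
      select[3] ? kTop : 0, select[2] ? kTop : 0,
      select[1] ? kTop : 0, select[0] ? kTop : 0);

  const __m256 t = _mm256_loadu_ps(then_v);
  const __m256 e = _mm256_loadu_ps(else_v);

  // blendv picks the second operand in lanes whose mask sign bit is set.
  // The cast only reinterprets the bits of the integer mask.
  const __m256 r = _mm256_blendv_ps(e, t, _mm256_castsi256_ps(mask));
  _mm256_storeu_ps(out, r);
}
```

Calling it with eight-element then/else arrays and a bool selector array fills out with the lane-wise choice, the same result a scalar `select[i] ? then_v[i] : else_v[i]` loop would give.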