Improve pblend AVX implementation

blendv only looks at the top bit of each mask element, so the mask can be built from integers instead of floats. This removes the vcvtdq2ps instruction and makes pblend faster:

BM_blend 1.31ns ± 1% 0.98ns ±15% -24.84% (p=0.008 n=5+5)
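
For illustration, here is a minimal sketch of the idea, not the actual patch. The `Selector8` struct, the function names, and the `AVX_FN` macro are hypothetical, and the sketch assumes GCC or Clang on an x86 CPU with AVX. The first variant converts a 0/1 integer selector to float before comparing (this is where a vcvtdq2ps would come from); the second builds a 0/-1 integer mask and only reinterprets its bits, which is enough because blendv reads just the sign bit of each lane.

```cpp
#include <immintrin.h>

// GCC/Clang only: enable AVX per function so no -mavx flag is required.
#define AVX_FN __attribute__((target("avx")))

// Hypothetical stand-in for a per-lane selector; not the library's own type.
struct Selector8 { bool select[8]; };

// Float-mask variant: the 0/1 selector is converted to float (vcvtdq2ps),
// then compared against zero to produce an all-ones mask where select is false.
AVX_FN static __m256 blend_float_mask(const Selector8& s, __m256 thenP, __m256 elseP) {
  const __m256i sel = _mm256_setr_epi32(s.select[0], s.select[1], s.select[2], s.select[3],
                                        s.select[4], s.select[5], s.select[6], s.select[7]);
  const __m256 sel_f = _mm256_cvtepi32_ps(sel);
  const __m256 false_mask = _mm256_cmp_ps(sel_f, _mm256_setzero_ps(), _CMP_EQ_UQ);
  // blendv picks the second operand in lanes where the mask sign bit is set.
  return _mm256_blendv_ps(thenP, elseP, false_mask);
}

// Integer-mask variant: true becomes -1 (all bits set, so the top bit is set),
// false becomes 0. The cast only reinterprets bits and emits no instruction.
AVX_FN static __m256 blend_int_mask(const Selector8& s, __m256 thenP, __m256 elseP) {
  const __m256i true_mask = _mm256_setr_epi32(
      -int(s.select[0]), -int(s.select[1]), -int(s.select[2]), -int(s.select[3]),
      -int(s.select[4]), -int(s.select[5]), -int(s.select[6]), -int(s.select[7]));
  return _mm256_blendv_ps(elseP, thenP, _mm256_castsi256_ps(true_mask));
}

// Checks that both variants select the same lanes as a scalar reference.
AVX_FN bool blend_variants_agree() {
  const Selector8 s = {{true, false, false, true, true, false, true, false}};
  const float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  const float b[8] = {-1, -2, -3, -4, -5, -6, -7, -8};
  const __m256 thenP = _mm256_loadu_ps(a);
  const __m256 elseP = _mm256_loadu_ps(b);
  float r_float[8], r_int[8];
  _mm256_storeu_ps(r_float, blend_float_mask(s, thenP, elseP));
  _mm256_storeu_ps(r_int, blend_int_mask(s, thenP, elseP));
  for (int i = 0; i < 8; ++i) {
    const float expected = s.select[i] ? a[i] : b[i];
    if (r_float[i] != expected || r_int[i] != expected) return false;
  }
  return true;
}
```

Both variants use only AVX1 intrinsics. The integer variant avoids the int-to-float conversion and the floating-point compare, at the cost of requiring the mask to be built as 0/-1 rather than 0/1.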
