
Why is the Select operation not implemented with a vectorized `pblend`?

Describe the feature you would like to be implemented.

This has been listed in the TODO of `evaluator<Select...>` for a long time, yet Select is still implemented as a scalar `?:` with no vectorized packet version. Why? It does not look hard to implement: the condition can be cast to a boolean packet and then passed to the blend intrinsics.

Would such a feature be useful for other users? Why?

Obviously, it would significantly boost the performance of expressions involving select operations whenever the compiler cannot auto-vectorize them.

Any hints on how to implement the requested feature?

Introduce a `ternary_evaluator`/functor.

Additional resources

Also, `pblend` for AVX-512 is still not implemented. Why? It is just `_mm_cmpeq_epi8_mask` and `_mm512_mask_blend_*`.