Why is the Select operation not implemented with vectorized `pblend`?
Describe the feature you would like to be implemented.
It has been listed as a TODO in `evaluator<Select...>` for a long time, but even today the implementation of Select is still a scalar version of `?:`, without a vectorized packet version.
Why? It does not look very hard to implement: cast the condition to a bool packet, then call the blend intrinsics.
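To make the "cast to a bool packet, then blend" idea concrete, here is a minimal sketch using plain SSE2 (with a scalar fallback so it compiles anywhere). `select_f32` is a hypothetical helper written for this issue, not an Eigen function, and the and/andnot/or sequence stands in for a dedicated blend instruction:

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper (not Eigen API): out[i] = cond[i] != 0 ? a[i] : b[i].
// n is assumed to be a multiple of 4 to keep the sketch short.
void select_f32(const int* cond, const float* a, const float* b,
                float* out, std::size_t n) {
#if defined(__SSE2__)
  for (std::size_t i = 0; i < n; i += 4) {
    __m128i c = _mm_loadu_si128(reinterpret_cast<const __m128i*>(cond + i));
    // "Bool packet": a lane is all-ones where cond == 0, all-zeros elsewhere.
    __m128 is_zero = _mm_castsi128_ps(_mm_cmpeq_epi32(c, _mm_setzero_si128()));
    __m128 va = _mm_loadu_ps(a + i);
    __m128 vb = _mm_loadu_ps(b + i);
    // Blend: b where cond == 0, a where cond != 0.
    __m128 r = _mm_or_ps(_mm_and_ps(is_zero, vb), _mm_andnot_ps(is_zero, va));
    _mm_storeu_ps(out + i, r);
  }
#else
  // Scalar fallback, same semantics as the SIMD path.
  for (std::size_t i = 0; i < n; ++i) out[i] = cond[i] != 0 ? a[i] : b[i];
#endif
}
```

On targets with SSE4.1 or AVX the three logical operations would presumably collapse into a single blend instruction, which is what `pblend` wraps.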
Would such a feature be useful for other users? Why?
Obviously, it would significantly boost the performance of code involving select ops when the compiler cannot auto-vectorize them.
Any hints on how to implement the requested feature?
Introduce a ternary_evaluator/functor.
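A rough sketch of what such a ternary functor could look like, written in portable C++ so it does not depend on Eigen internals. Everything here (`Packet`, `select_op`, `packetOp`, `eval_select`) is a hypothetical name chosen for illustration; the `std::array` packet only stands in for a real SIMD register type:

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a SIMD packet: N lanes of T.
template <typename T, std::size_t N>
using Packet = std::array<T, N>;

// Hypothetical ternary functor with a scalar path and a packet path.
template <typename T>
struct select_op {
  // Scalar path: equivalent to the current ?: evaluation.
  T operator()(bool c, const T& a, const T& b) const { return c ? a : b; }

  // Packet path: a real backend would replace this loop with one blend.
  template <std::size_t N>
  Packet<T, N> packetOp(const Packet<bool, N>& c, const Packet<T, N>& a,
                        const Packet<T, N>& b) const {
    Packet<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r[i] = c[i] ? a[i] : b[i];
    return r;
  }
};

// Hypothetical evaluator loop: full packets first, then a scalar tail.
template <std::size_t N, typename T>
void eval_select(const bool* c, const T* a, const T* b, T* out, std::size_t n) {
  select_op<T> op;
  std::size_t i = 0;
  for (; i + N <= n; i += N) {
    Packet<bool, N> pc{};
    Packet<T, N> pa{}, pb{};
    for (std::size_t j = 0; j < N; ++j) {  // "load" one packet of each input
      pc[j] = c[i + j];
      pa[j] = a[i + j];
      pb[j] = b[i + j];
    }
    Packet<T, N> r = op.packetOp(pc, pa, pb);
    for (std::size_t j = 0; j < N; ++j) out[i + j] = r[j];  // "store"
  }
  for (; i < n; ++i) out[i] = op(c[i], a[i], b[i]);  // leftover elements
}
```

The point of the split is that the evaluator only needs to know the functor has a packet path; the blend itself stays behind `packetOp`.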
Additional resources
Also, `pblend` for AVX-512 is still not implemented. Why? It is just `_mm_cmpeq_epi8_mask` and `_mm512_mask_blend_*`.
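For reference, a minimal sketch of that two-intrinsic approach for 16 floats. `Selector16` and `blend16` are hypothetical names for this issue (only loosely modeled on a one-bool-per-lane selector), the sketch assumes `sizeof(bool) == 1`, and the SIMD path is only compiled when AVX512F, AVX512BW and AVX512VL are all enabled; otherwise a scalar fallback is used:

```cpp
#include <cstddef>
#if defined(__AVX512F__) && defined(__AVX512BW__) && defined(__AVX512VL__)
#include <immintrin.h>
#define SKETCH_HAS_AVX512 1
#endif

static_assert(sizeof(bool) == 1, "sketch assumes one byte per bool");

// Hypothetical selector: one bool per lane of a 16-float packet.
struct Selector16 {
  bool select[16];
};

// out[i] = sel.select[i] ? then_v[i] : else_v[i], for 16 floats.
void blend16(const Selector16& sel, const float* then_v, const float* else_v,
             float* out) {
#ifdef SKETCH_HAS_AVX512
  // Load the 16 bool bytes and build a 16-bit mask of the *false* lanes.
  __m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i*>(sel.select));
  __mmask16 is_false = _mm_cmpeq_epi8_mask(bytes, _mm_setzero_si128());
  // mask_blend takes the second vector where the mask bit is set,
  // so false lanes get else_v and true lanes keep then_v.
  __m512 r = _mm512_mask_blend_ps(is_false, _mm512_loadu_ps(then_v),
                                  _mm512_loadu_ps(else_v));
  _mm512_storeu_ps(out, r);
#else
  // Scalar fallback, same semantics as the AVX-512 path.
  for (std::size_t i = 0; i < 16; ++i)
    out[i] = sel.select[i] ? then_v[i] : else_v[i];
#endif
}
```

The double and integer variants would presumably follow the same pattern with `_mm512_mask_blend_pd` / `_mm512_mask_blend_epi32` and a narrower mask.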