Optimize any()/all()/count() reductions

Submitted by Christoph Hertzberg @chhtz

Assigned to Nobody

Link to original bugzilla bug (#585)
Version: 3.3 (current stable)

Description

Assuming these reductions are applied to the result of SSE comparisons, it is most likely faster to combine several consecutive comparison results with a bitwise AND (for all()) or a bitwise OR (for any()), then convert the combined result to an integer with _mm_movemask_pX and compare that against 0x0, 0x3 or 0xF. This should reduce latency and the number of branches.
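
A minimal sketch of the idea, not Eigen's actual implementation: the helper names `all_less` and `any_less` are hypothetical, the comparison (`a[i] < b[i]`) is just an example, and `n` is assumed to be a multiple of 8 to keep the loop short. Two packed float comparisons are merged before a single `_mm_movemask_ps` and a single branch.

```cpp
#include <emmintrin.h>  // SSE2 header; also provides the SSE float intrinsics
#include <cstddef>

// Hypothetical helper: true if a[i] < b[i] for every i in [0, n).
// Assumes n is a multiple of 8.
inline bool all_less(const float* a, const float* b, std::size_t n) {
  for (std::size_t i = 0; i < n; i += 8) {
    // Each comparison yields a mask with all bits set in lanes where it holds.
    __m128 m0 = _mm_cmplt_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
    __m128 m1 = _mm_cmplt_ps(_mm_loadu_ps(a + i + 4), _mm_loadu_ps(b + i + 4));
    // AND the two masks, then test all four sign bits at once:
    // one movemask and one branch per 8 elements.
    if (_mm_movemask_ps(_mm_and_ps(m0, m1)) != 0xF) return false;
  }
  return true;
}

// Hypothetical helper: true if a[i] < b[i] for at least one i in [0, n).
// Assumes n is a multiple of 8.
inline bool any_less(const float* a, const float* b, std::size_t n) {
  for (std::size_t i = 0; i < n; i += 8) {
    __m128 m0 = _mm_cmplt_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i));
    __m128 m1 = _mm_cmplt_ps(_mm_loadu_ps(a + i + 4), _mm_loadu_ps(b + i + 4));
    // OR the two masks; any set sign bit means some lane compared true.
    if (_mm_movemask_ps(_mm_or_ps(m0, m1)) != 0x0) return true;
  }
  return false;
}
```

For packed doubles the same pattern would use `_mm_cmplt_pd`, `_mm_and_pd`/`_mm_or_pd` and `_mm_movemask_pd`, where the "all lanes true" value is 0x3 instead of 0xF.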

I'm not sure whether this is related to bug #65 (closed).

Depends on

#97 (closed) #272 (closed)

Blocking

#1608

Edited Jul 02, 2023 by Charles Schlosser