Optimize psign
Reference issue
What does this implement/fix?
Optimizes the generic `psign` implementation for floating-point types.
Operation counts for float, previously:
- 3 constants
- 3 comparisons
- 3 logicals
- 1 blend
Now:
- 2 constants
- 1 comparison
- 2 logicals, plus 1 for the absolute value (a single AND on AVX2), so about 3 logicals
- 1 blend
AVX2 disassembly for float
```asm
Eigen::internal::test_old(float __vector(8)):
        vxorps  xmm1, xmm1, xmm1
        vcmpps  ymm3, ymm0, ymm0, 0
        vbroadcastss    ymm4, DWORD PTR .LC1[rip]
        vcmpps  ymm2, ymm1, ymm0, 17
        vcmpps  ymm1, ymm0, ymm1, 17
        vandps  ymm2, ymm2, ymm4
        vbroadcastss    ymm4, DWORD PTR .LC3[rip]
        vandps  ymm1, ymm1, ymm4
        vorps   ymm1, ymm2, ymm1
        vblendvps       ymm0, ymm0, ymm1, ymm3
        ret
```
```asm
Eigen::internal::test_new(float __vector(8)):
        vbroadcastss    ymm1, DWORD PTR .LC5[rip]
        vbroadcastss    ymm3, DWORD PTR .LC1[rip]
        vandps  ymm2, ymm0, ymm1
        vandnps ymm1, ymm1, ymm0
        vxorps  xmm0, xmm0, xmm0
        vcmpps  ymm0, ymm0, ymm2, 17
        vorps   ymm1, ymm1, ymm3
        vblendvps       ymm0, ymm2, ymm1, ymm0
        ret
```
Additional information
Edited by Charles Schlosser