
Optimize psign

Reference issue

What does this implement/fix?

Optimizes the generic sign function (psign) for floating-point types.

Previously (float):

  • 3 constants
  • 3 comparisons
  • 3 logicals
  • 1 blend

Now:

  • 2 constants
  • 1 comparison
  • 2 logicals, plus 1 absolute value (a single bitwise AND), so about 3 logicals
  • 1 blend
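
For readers who prefer code to operation counts, here is a rough scalar model of one lane of the new scheme. It is a sketch reconstructed from the test_new disassembly in this description, not the Eigen source: the name sign_new_lane and the bit helpers are hypothetical, and the mask value 0x7FFFFFFF and the constant 1.0f are assumptions about what .LC5 and .LC1 hold.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: move between a float and its raw 32-bit pattern.
static std::uint32_t to_bits(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  return u;
}
static float from_bits(std::uint32_t u) {
  float x;
  std::memcpy(&x, &u, sizeof x);
  return x;
}

// Hypothetical scalar model of a single lane of the new sequence.
float sign_new_lane(float a) {
  const std::uint32_t abs_mask = 0x7FFFFFFFu;    // assumed contents of .LC5
  const std::uint32_t one_bits = to_bits(1.0f);  // assumed contents of .LC1
  const std::uint32_t a_bits = to_bits(a);

  const std::uint32_t abs_bits = a_bits & abs_mask;   // absolute value: clear the sign bit
  const std::uint32_t sign_bit = a_bits & ~abs_mask;  // keep only the sign bit of a

  // The single comparison: 0 < |a| is false for zero and for NaN.
  const bool nonzero = 0.0f < from_bits(abs_bits);

  const std::uint32_t signed_one = sign_bit | one_bits;  // +1.0f or -1.0f

  // Blend: signed one for nonzero inputs, otherwise |a| (that is, +0 or NaN).
  return from_bits(nonzero ? signed_one : abs_bits);
}
```

Under these assumptions, zero and NaN need no comparison of their own: both fail the single 0 < |a| test and fall through to |a|.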

AVX2 disassembly for float

Eigen::internal::test_old(float __vector(8)):
        vxorps  xmm1, xmm1, xmm1
        vcmpps  ymm3, ymm0, ymm0, 0
        vbroadcastss    ymm4, DWORD PTR .LC1[rip]
        vcmpps  ymm2, ymm1, ymm0, 17
        vcmpps  ymm1, ymm0, ymm1, 17
        vandps  ymm2, ymm2, ymm4
        vbroadcastss    ymm4, DWORD PTR .LC3[rip]
        vandps  ymm1, ymm1, ymm4
        vorps   ymm1, ymm2, ymm1
        vblendvps       ymm0, ymm0, ymm1, ymm3
        ret
Eigen::internal::test_new(float __vector(8)):
        vbroadcastss    ymm1, DWORD PTR .LC5[rip]
        vbroadcastss    ymm3, DWORD PTR .LC1[rip]
        vandps  ymm2, ymm0, ymm1
        vandnps ymm1, ymm1, ymm0
        vxorps  xmm0, xmm0, xmm0
        vcmpps  ymm0, ymm0, ymm2, 17
        vorps   ymm1, ymm1, ymm3
        vblendvps       ymm0, ymm2, ymm1, ymm0
        ret
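
As a reading aid for the test_old listing above, the following scalar sketch models one lane of the previous sequence. It is inferred from the disassembly only: sign_old_lane and the bit helpers are hypothetical names, and 1.0f and -1.0f are assumed values for .LC1 and .LC3.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helpers: move between a float and its raw 32-bit pattern.
static std::uint32_t to_bits(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  return u;
}
static float from_bits(std::uint32_t u) {
  float x;
  std::memcpy(&x, &u, sizeof x);
  return x;
}

// Hypothetical scalar model of a single lane of the previous sequence.
float sign_old_lane(float a) {
  const std::uint32_t all_ones = 0xFFFFFFFFu;

  // Three comparisons, each yielding an all-ones or all-zeros lane mask.
  const bool not_nan = (a == a);                            // vcmpps ..., 0  (EQ_OQ)
  const std::uint32_t pos_mask = (0.0f < a) ? all_ones : 0u;  // vcmpps ..., 17 (LT_OQ)
  const std::uint32_t neg_mask = (a < 0.0f) ? all_ones : 0u;  // vcmpps ..., 17 (LT_OQ)

  // Three logicals: mask each constant, then merge the two results.
  const std::uint32_t pos_part = pos_mask & to_bits(1.0f);   // assumed contents of .LC1
  const std::uint32_t neg_part = neg_mask & to_bits(-1.0f);  // assumed contents of .LC3
  const std::uint32_t merged = pos_part | neg_part;

  // Blend: keep the merged value unless the input is NaN, in which case pass a through.
  return not_nan ? from_bits(merged) : a;
}
```

If the constants are as assumed, this sequence passes a NaN input through unchanged, whereas test_new, read the same way, would return a NaN with its sign bit cleared; for every other input the two appear to agree.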

Additional information

Edited by Charles Schlosser
