x86: Optimize svml_s_atanhf16_core_avx512.S

Optimizations are:
    1. Reduce code size (-58 bytes).
    2. Remove redundant move instructions.
    3. Slightly improve instruction selection/scheduling where
       possible.
    4. Reduce rodata size ([-128, -188] bytes).

Result is roughly a 14% reduction in run time:

        Function,   New Time, Old Time, New / Old
_ZGVeN16v_atanhf,      11.95,   13.879,     0.861
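
The "roughly 14%" figure can be rechecked from the benchmark row. The sketch below only redoes the arithmetic on the two timings reported above; the variable names are illustrative and are not part of the benchmark harness.

```python
# Timings copied from the benchmark row (units as reported by the benchmark).
new_time = 11.95
old_time = 13.879

ratio = new_time / old_time                   # the "New / Old" column
time_saved = 1.0 - ratio                      # fraction of run time removed
throughput_gain = old_time / new_time - 1.0   # extra calls per unit time

print(f"New / Old       = {ratio:.3f}")
print(f"time reduction  = {time_saved:.1%}")
print(f"throughput gain = {throughput_gain:.1%}")
```

The 14% describes the drop in time per call. Expressed as throughput, the same measurement is a gain of about 16%.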