Unroll F32 to BF16 loop - 1.8X faster conversions for LLVM. Use vector pairs for GCC.

Unroll F32 to BF16 loop - 1.8X faster conversions for LLVM. Use vector pairs for GCC. Other minor improvements.
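For readers unfamiliar with the technique, here is a minimal scalar sketch of what unrolling an F32 to BF16 conversion loop can look like. It is not the code from this merge request: the helper names `f32_to_bf16` and `f32_to_bf16_unrolled`, the unroll factor of 4, and the round-to-nearest-even policy are all assumptions made for illustration, and the vector-pair variant for GCC is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the merge request): convert one float to
 * bfloat16 by keeping the upper 16 bits, rounding to nearest even. */
static inline uint16_t f32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);            /* type-pun without UB */
    if ((bits & 0x7fffffffu) > 0x7f800000u)    /* NaN: force a quiet NaN */
        return (uint16_t)((bits >> 16) | 0x0040u);
    bits += 0x7fffu + ((bits >> 16) & 1u);     /* round to nearest even */
    return (uint16_t)(bits >> 16);
}

/* Hypothetical unrolled loop: four independent conversions per iteration
 * give the compiler more room to schedule and vectorize, followed by a
 * scalar tail for the remaining 0 to 3 elements. */
static void f32_to_bf16_unrolled(const float *src, uint16_t *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        dst[i + 0] = f32_to_bf16(src[i + 0]);
        dst[i + 1] = f32_to_bf16(src[i + 1]);
        dst[i + 2] = f32_to_bf16(src[i + 2]);
        dst[i + 3] = f32_to_bf16(src[i + 3]);
    }
    for (; i < n; i++)                         /* tail elements */
        dst[i] = f32_to_bf16(src[i]);
}
```

Whether unrolling helps, and by how much, depends on the compiler and target; the 1.8X figure above is the author's measurement for LLVM and is not reproduced by this sketch.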