AVX2 - double->int64_t casting
Reference issue
What does this implement/fix?
Fully vectorizes the cast `double -> int64_t` when AVX2 is available. Unfortunately, this approach cannot be used for SSE, as vector shift intrinsics where the shift count can be different for each element, e.g. `_mm256_srlv_epi64`, are not available until AVX2.
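To illustrate why a per-element shift count matters, here is a minimal scalar sketch of one lane of a bit-level `double -> int64_t` truncation. It is not the code in this PR, and the actual implementation may use a different sequence. The names `lane_srlv`, `lane_sllv` and `double_to_int64_model` are hypothetical. The sketch assumes IEEE 754 binary64 inputs that are finite and within the `int64_t` range; NaN, infinities and out-of-range values are not handled.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: models one 64-bit lane of _mm256_srlv_epi64,
// where a shift count of 64 or more produces 0.
static inline std::uint64_t lane_srlv(std::uint64_t v, std::uint64_t count) {
    return count < 64 ? (v >> count) : 0;
}

// Hypothetical helper: models one 64-bit lane of _mm256_sllv_epi64.
static inline std::uint64_t lane_sllv(std::uint64_t v, std::uint64_t count) {
    return count < 64 ? (v << count) : 0;
}

// Scalar model of a truncating double -> int64_t cast (assumption: the
// input is finite and representable in int64_t).
inline std::int64_t double_to_int64_model(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // reinterpret the double's bits

    // Split into the biased exponent and the mantissa with its implicit
    // leading 1 restored.
    const std::uint64_t biased_exp = (bits >> 52) & 0x7FF;
    const std::uint64_t mantissa =
        (bits & 0x000FFFFFFFFFFFFFull) | 0x0010000000000000ull;

    // 1075 = exponent bias (1023) + mantissa width (52). Exactly one of the
    // two counts is a valid shift; the other wraps to a huge unsigned value
    // and therefore shifts everything out. Each element has its own
    // exponent, so each lane needs its own count.
    const std::uint64_t right_count = 1075 - biased_exp;
    const std::uint64_t left_count = biased_exp - 1075;

    const std::uint64_t magnitude =
        lane_srlv(mantissa, right_count) | lane_sllv(mantissa, left_count);

    // Branch-free conditional negate: sign_mask is all ones for negative x.
    const std::uint64_t sign_mask = 0 - (bits >> 63);
    return static_cast<std::int64_t>((magnitude ^ sign_mask) - sign_mask);
}
```

For example, `double_to_int64_model(-2.5)` yields `-2`, matching the truncation of `static_cast<int64_t>(-2.5)`. Every step here is a mask, add/subtract, OR/XOR or shift, all of which have 256-bit integer counterparts in AVX2. The shift is the only step whose count differs per element.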
For a pure casting operation, this appears to improve throughput by 70%.
Also took the liberty to clean up some AVX2 code, applying lessons learned.