Add support for casting between double and int64_t for SSE and AVX2.
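This speeds up double->int64_t casts at small sizes (-41% at size 8, -11% at size 64). Larger sizes and int64_t->double casts are unchanged to within about 1.5%. Benchmark of the tensor cast expression B = A.template cast<OUT>():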
SSE4.2:
name                          old cpu/op    new cpu/op    delta
BM_cast<double,int64_t>/8     6.95ns ± 4%   4.09ns ± 0%   -41.20%  (p=0.000 n=59+46)
BM_cast<double,int64_t>/64    29.4ns ± 2%   26.1ns ± 1%   -11.39%  (p=0.000 n=44+51)
BM_cast<double,int64_t>/512   212ns ± 0%    209ns ± 0%     -1.47%  (p=0.000 n=51+55)
BM_cast<double,int64_t>/4k    1.79µs ± 3%   1.80µs ± 1%       ~    (p=0.748 n=60+54)
BM_cast<double,int64_t>/32k   14.2µs ± 2%   14.3µs ± 1%    +0.95%  (p=0.000 n=58+54)
BM_cast<double,int64_t>/256k  171µs ± 3%    171µs ± 3%        ~    (p=0.767 n=60+60)
BM_cast<double,int64_t>/1M    731µs ±12%    742µs ±16%        ~    (p=0.275 n=50+49)
BM_cast<int64_t,double>/8     5.17ns ± 1%   5.18ns ± 1%       ~    (p=0.072 n=52+54)
BM_cast<int64_t,double>/64    19.9ns ± 1%   19.9ns ± 2%       ~    (p=0.362 n=41+57)
BM_cast<int64_t,double>/512   119ns ± 0%    119ns ± 0%        ~    (p=0.771 n=54+55)
BM_cast<int64_t,double>/4k    1.35µs ± 0%   1.35µs ± 1%    -0.12%  (p=0.002 n=44+52)
BM_cast<int64_t,double>/32k   10.8µs ± 1%   10.7µs ± 1%    -0.19%  (p=0.016 n=51+51)
BM_cast<int64_t,double>/256k  158µs ± 3%    157µs ± 2%     -0.33%  (p=0.019 n=60+60)
BM_cast<int64_t,double>/1M    684µs ±16%    690µs ±20%        ~    (p=0.913 n=53+54)
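For context: SSE2 through AVX2 have no packed instruction that converts between double and int64_t (packed forms such as _mm_cvttpd_epi64 require AVX-512DQ). One possible approach is to convert each lane with the scalar SSE2 instructions and repack the results. The sketch below illustrates that idea for a 2-lane packet. It is an assumption for illustration, not necessarily what this change implements. The helper names cast2_double_to_int64 and cast2_int64_to_double are hypothetical, and the code falls back to plain static_cast when not compiling for x86-64 with SSE2.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) && defined(__x86_64__)
#include <emmintrin.h>
#endif

// Hypothetical helper: truncating cast of two doubles to two int64_t.
// No packed double->int64 instruction exists before AVX-512DQ, so each
// lane goes through the scalar SSE2 conversion and is then repacked.
inline void cast2_double_to_int64(const double* in, int64_t* out) {
#if defined(__SSE2__) && defined(__x86_64__)
  __m128d a = _mm_loadu_pd(in);           // lane 0 = in[0], lane 1 = in[1]
  __m128d hi = _mm_unpackhi_pd(a, a);     // move lane 1 into lane 0
  // _mm_set_epi64x takes (lane 1, lane 0).
  __m128i r = _mm_set_epi64x(_mm_cvttsd_si64(hi), _mm_cvttsd_si64(a));
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), r);
#else
  out[0] = static_cast<int64_t>(in[0]);   // portable fallback
  out[1] = static_cast<int64_t>(in[1]);
#endif
}

// Hypothetical helper: cast two int64_t to two doubles, lane by lane.
inline void cast2_int64_to_double(const int64_t* in, double* out) {
#if defined(__SSE2__) && defined(__x86_64__)
  __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
  int64_t lo = _mm_cvtsi128_si64(a);                          // lane 0
  int64_t hi = _mm_cvtsi128_si64(_mm_unpackhi_epi64(a, a));   // lane 1
  // _mm_set_pd takes (lane 1, lane 0).
  _mm_storeu_pd(out, _mm_set_pd(static_cast<double>(hi),
                                static_cast<double>(lo)));
#else
  out[0] = static_cast<double>(in[0]);    // portable fallback
  out[1] = static_cast<double>(in[1]);
#endif
}
```

Both paths truncate toward zero, so for in-range inputs they agree with static_cast. For out-of-range doubles, static_cast is undefined behavior, while _mm_cvttsd_si64 returns INT64_MIN (the "integer indefinite" value).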
Edited by Rasmus Munk Larsen