Add support for casting between double and int64_t for SSE and AVX2.

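Speeds up double -> int64_t casts, mainly for small sizes; int64_t -> double and large sizes are essentially unchanged. Benchmark of the tensor cast expression B = A.template cast<OUT>(), where "~" marks a delta that is not statistically significant: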

SSE4.2:
name                           old cpu/op   new cpu/op   delta
BM_cast<double,int64_t>/8      6.95ns ± 4%  4.09ns ± 0%  -41.20%  (p=0.000 n=59+46)
BM_cast<double,int64_t>/64     29.4ns ± 2%  26.1ns ± 1%  -11.39%  (p=0.000 n=44+51)
BM_cast<double,int64_t>/512     212ns ± 0%   209ns ± 0%   -1.47%  (p=0.000 n=51+55)
BM_cast<double,int64_t>/4k     1.79µs ± 3%  1.80µs ± 1%     ~     (p=0.748 n=60+54)
BM_cast<double,int64_t>/32k    14.2µs ± 2%  14.3µs ± 1%   +0.95%  (p=0.000 n=58+54)
BM_cast<double,int64_t>/256k    171µs ± 3%   171µs ± 3%     ~     (p=0.767 n=60+60)
BM_cast<double,int64_t>/1M      731µs ±12%   742µs ±16%     ~     (p=0.275 n=50+49)
BM_cast<int64_t,double>/8      5.17ns ± 1%  5.18ns ± 1%     ~     (p=0.072 n=52+54)
BM_cast<int64_t,double>/64     19.9ns ± 1%  19.9ns ± 2%     ~     (p=0.362 n=41+57)
BM_cast<int64_t,double>/512     119ns ± 0%   119ns ± 0%     ~     (p=0.771 n=54+55)
BM_cast<int64_t,double>/4k     1.35µs ± 0%  1.35µs ± 1%   -0.12%  (p=0.002 n=44+52)
BM_cast<int64_t,double>/32k    10.8µs ± 1%  10.7µs ± 1%   -0.19%  (p=0.016 n=51+51)
BM_cast<int64_t,double>/256k    158µs ± 3%   157µs ± 2%   -0.33%  (p=0.019 n=60+60)
BM_cast<int64_t,double>/1M      684µs ±16%   690µs ±20%     ~     (p=0.913 n=53+54)
Edited by Rasmus Munk Larsen
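
For readers unfamiliar with what such a cast involves, here is a minimal sketch of a truncating double -> int64_t conversion that handles two elements at a time. It is not the code from this merge request: the helper name `cast_double_to_int64` is hypothetical, and the per-lane `_mm_cvttsd_si64` approach is only one possible strategy, chosen here because SSE2 has no packed double -> int64 conversion instruction. On targets without SSE2 on x86-64 the sketch falls back to a plain scalar loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__SSE2__) && defined(__x86_64__)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2_X64 1
#endif

// Hypothetical helper (an assumption for illustration, not Eigen's API):
// converts n doubles to int64_t, truncating toward zero like static_cast.
inline void cast_double_to_int64(const double* in, std::int64_t* out,
                                 std::size_t n) {
  std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE2_X64
  for (; i + 2 <= n; i += 2) {
    // Load two doubles (unaligned load, so any pointer is accepted).
    __m128d v = _mm_loadu_pd(in + i);
    // Convert each lane separately with the scalar truncating conversion.
    long long lo = _mm_cvttsd_si64(v);
    long long hi = _mm_cvttsd_si64(_mm_unpackhi_pd(v, v));
    // Repack the two 64-bit results and store them together.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                     _mm_set_epi64x(hi, lo));
  }
#endif
  // Scalar tail (and full fallback when SSE2 on x86-64 is unavailable).
  for (; i < n; ++i) out[i] = static_cast<std::int64_t>(in[i]);
}
```

For in-range inputs this gives the same results as an element-wise `static_cast<std::int64_t>`; values outside the int64_t range are not handled specially in this sketch.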
