HEXL v1.1.0 integration

Fabian Boemer requested to merge fboemer/hexl-1.1.0 into master

Initial integration with Intel HEXL https://github.com/intel/hexl

Preferred over !284 (closed)

Closes #297 (closed)

On an ICX machine, built with compiler g++-10 and configured as follows:

-- WITH_NTL:         OFF
-- WITH_TCM:         OFF
-- WITH_INTEL_HEXL:  ON
-- WITH_OPENMP:      OFF
-- NATIVE_SIZE:      64
-- CKKS_M_FACTOR:    1
-- WITH_NATIVEOPT:   ON

I'm seeing:

WITH_INTEL_HEXL=OFF
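./bin/benchmark/lib-benchmark-hexl
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------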
NTTTransform1024               14.3 us         14.2 us        98352
INTTTransform1024              13.7 us         13.7 us       101888
NTTTransform4096               67.3 us         67.2 us        20827
INTTTransform4096              64.1 us         64.0 us        21876
NTTTransformInPlace1024        14.1 us         14.1 us        99464
INTTTransformInPlace1024       13.6 us         13.6 us       102788
NTTTransformInPlace4096        67.1 us         67.0 us        20884
INTTTransformInPlace4096       63.3 us         63.2 us        22150
BFVrns_KeyGen                  1132 us         1131 us         1236
BFVrns_MultKeyGen              1820 us         1818 us          768
BFVrns_EvalAtIndexKeyGen       1860 us         1859 us          758
BFVrns_Encryption              1286 us         1284 us         1089
BFVrns_Decryption               305 us          304 us         4597
BFVrns_Add                     21.0 us         21.0 us        66308
BFVrns_AddInPlace              17.8 us         17.8 us        78919
BFVrns_MultNoRelin             4388 us         4384 us          319
BFVrns_MultRelin               5101 us         5097 us          275
BFVrns_EvalAtIndex              555 us          554 us         2525
CKKS_KeyGen                    2362 us         2361 us          592
CKKS_MultKeyGen                6009 us         6006 us          232
CKKS_EvalAtIndexKeyGen         6084 us         6080 us          230
CKKS_Encryption                2380 us         2362 us          592
CKKS_Decryption                 788 us          783 us         1795
CKKS_Add                       43.6 us         43.3 us        32455
CKKS_AddInPlace                35.4 us         35.2 us        38980
CKKS_MultNoRelin                251 us          250 us         5596
CKKS_MultRelin                 4079 us         4061 us          345
CKKS_Relin                     4267 us         4251 us          330
CKKS_Rescale                    787 us          784 us         1922
CKKS_RescaleInPlace             722 us          720 us         1939
CKKS_EvalAtIndex               3575 us         3565 us          377
BGVrns_KeyGen                  2364 us         2358 us          592
BGVrns_MultKeyGen              6071 us         6058 us          231
BGVrns_EvalAtIndexKeyGen       6203 us         6190 us          226
BGVrns_Encryption              2765 us         2760 us          507
BGVrns_Decryption               393 us          393 us         3595
BGVrns_Add                     50.1 us         50.0 us        27946
BGVrns_AddInPlace              44.7 us         44.7 us        31614
BGVrns_MultNoRelin              243 us          243 us         5719
BGVrns_MultRelin               4168 us         4163 us          337
BGVrns_Relin                   4339 us         4334 us          323
BGVrns_ModSwitch                738 us          737 us         1902
BGVrns_ModSwitchInPlace         732 us          731 us         1915
BGVrns_EvalAtIndex             3617 us         3614 us          388
WITH_INTEL_HEXL=ON
./bin/benchmark/lib-benchmark-hexl
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
NTTTransform1024               1.24 us         1.24 us      1099334
INTTTransform1024              1.35 us         1.35 us      1037487
NTTTransform4096               6.57 us         6.57 us       213796
INTTTransform4096              7.05 us         7.04 us       198808
NTTTransformInPlace1024        1.18 us         1.18 us      1185657
INTTTransformInPlace1024       1.25 us         1.25 us      1121451
NTTTransformInPlace4096        5.81 us         5.80 us       241158
INTTTransformInPlace4096       6.22 us         6.22 us       225011
BFVrns_KeyGen                   759 us          758 us         1848
BFVrns_MultKeyGen              1303 us         1293 us         1079
BFVrns_EvalAtIndexKeyGen       1358 us         1349 us         1039
BFVrns_Encryption               816 us          811 us         1725
BFVrns_Decryption               118 us          117 us        10949
BFVrns_Add                     21.0 us         20.9 us        66312
BFVrns_AddInPlace              19.2 us         19.1 us        73093
BFVrns_MultNoRelin             2339 us         2329 us          603
BFVrns_MultRelin               2465 us         2455 us          574
BFVrns_EvalAtIndex              223 us          223 us         6301
CKKS_KeyGen                    1755 us         1749 us          800
CKKS_MultKeyGen                4223 us         4211 us          336
CKKS_EvalAtIndexKeyGen         4157 us         4146 us          327
CKKS_Encryption                1628 us         1624 us          830
CKKS_Decryption                 658 us          657 us         2133
CKKS_Add                       44.1 us         44.0 us        32038
CKKS_AddInPlace                35.2 us         35.2 us        39823
CKKS_MultNoRelin               76.5 us         76.4 us        19422
CKKS_MultRelin                 2341 us         2337 us          599
CKKS_Relin                     3188 us         3184 us          465
CKKS_Rescale                    173 us          173 us         7901
CKKS_RescaleInPlace             170 us          170 us         8572
CKKS_EvalAtIndex               2009 us         2005 us          686
BGVrns_KeyGen                  1655 us         1653 us          850
BGVrns_MultKeyGen              4083 us         4079 us          343
BGVrns_EvalAtIndexKeyGen       4218 us         4214 us          335
BGVrns_Encryption              1625 us         1623 us          859
BGVrns_Decryption               168 us          168 us         8173
BGVrns_Add                     49.9 us         49.9 us        28724
BGVrns_AddInPlace              43.5 us         43.5 us        32781
BGVrns_MultNoRelin             49.5 us         49.4 us        27436
BGVrns_MultRelin               2343 us         2341 us          596
BGVrns_Relin                   3140 us         3138 us          488
BGVrns_ModSwitch                200 us          200 us         7214
BGVrns_ModSwitchInPlace         188 us          188 us         7920
BGVrns_EvalAtIndex             2035 us         2033 us          697
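
The two listings above are easier to compare as ratios. The following is a minimal sketch that computes speedups for a handful of rows; the Time values are copied from the listings, and the helper name `speedup` is only illustrative.

```python
# Wall-clock times in microseconds, copied from the two
# lib-benchmark-hexl listings above (Time column).
hexl_off = {
    "NTTTransform4096": 67.3,
    "BFVrns_MultRelin": 5101,
    "CKKS_MultNoRelin": 251,
    "CKKS_Rescale": 787,
    "BGVrns_MultNoRelin": 243,
    "BFVrns_Add": 21.0,
}
hexl_on = {
    "NTTTransform4096": 6.57,
    "BFVrns_MultRelin": 2465,
    "CKKS_MultNoRelin": 76.5,
    "CKKS_Rescale": 173,
    "BGVrns_MultNoRelin": 49.5,
    "BFVrns_Add": 21.0,
}


def speedup(name):
    """Return time without HEXL divided by time with HEXL (illustrative helper)."""
    return hexl_off[name] / hexl_on[name]


for name in hexl_off:
    print(f"{name:22s} {speedup(name):5.2f}x")
```

For these rows the NTT is roughly 10x faster with HEXL, the multiplication and rescale rows land between about 2x and 5x, and BFVrns_Add is unchanged.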

To show that the modulus size changes in lib-benchmark-hexl are required, I also ran the benchmark with lib-benchmark. Observe that the NTT and BFVrns runtimes are slower than with lib-benchmark-hexl, while the CKKS and BGVrns runtimes are roughly the same.

WITH_INTEL_HEXL=ON
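./bin/benchmark/lib-benchmark
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------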
NTTTransform1024               2.85 us         2.84 us       384658
INTTTransform1024              3.17 us         3.17 us       441572
NTTTransform4096               14.2 us         14.1 us        99072
INTTTransform4096              15.3 us         15.2 us        91806
NTTTransformInPlace1024        2.85 us         2.85 us       491763
INTTTransformInPlace1024       3.11 us         3.11 us       450279
NTTTransformInPlace4096        13.4 us         13.4 us       103985
INTTTransformInPlace4096       14.5 us         14.5 us        96490
BFVrns_KeyGen                  1885 us         1884 us          739
BFVrns_MultKeyGen              2978 us         2975 us          468
BFVrns_EvalAtIndexKeyGen       3253 us         3251 us          433
BFVrns_Encryption              2001 us         2000 us          699
BFVrns_Decryption               301 us          301 us         4644
BFVrns_Add                     41.5 us         41.5 us        33735
BFVrns_AddInPlace              38.0 us         38.0 us        36843
BFVrns_MultNoRelin             5865 us         5861 us          239
BFVrns_MultRelin               6389 us         6385 us          221
BFVrns_EvalAtIndex              610 us          607 us         2306
CKKS_KeyGen                    1753 us         1740 us          802
CKKS_MultKeyGen                4309 us         4281 us          323
CKKS_EvalAtIndexKeyGen         4202 us         4178 us          337
CKKS_Encryption                1625 us         1617 us          889
CKKS_Decryption                 661 us          658 us         2128
CKKS_Add                       41.8 us         41.6 us        33117
CKKS_AddInPlace                36.8 us         36.6 us        38243
CKKS_MultNoRelin               74.0 us         73.7 us        19129
CKKS_MultRelin                 2474 us         2467 us          561
CKKS_Relin                     2900 us         2892 us          478
CKKS_Rescale                    179 us          178 us         7940
CKKS_RescaleInPlace             170 us          169 us         8329
CKKS_EvalAtIndex               1977 us         1973 us          716
BGVrns_KeyGen                  1598 us         1595 us          878
BGVrns_MultKeyGen              4057 us         4049 us          344
BGVrns_EvalAtIndexKeyGen       4191 us         4184 us          334
BGVrns_Encryption              1607 us         1604 us          862
BGVrns_Decryption               167 us          167 us         8379
BGVrns_Add                     50.9 us         50.8 us        27451
BGVrns_AddInPlace              44.8 us         44.7 us        31218
BGVrns_MultNoRelin             49.7 us         49.6 us        28248
BGVrns_MultRelin               2319 us         2316 us          598
BGVrns_Relin                   2871 us         2868 us          491
BGVrns_ModSwitch                186 us          186 us         7462
BGVrns_ModSwitchInPlace         178 us          178 us         7862
BGVrns_EvalAtIndex             2016 us         2015 us          697
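
To quantify the difference between lib-benchmark and lib-benchmark-hexl with HEXL enabled, rows can be parsed straight out of the console output. This is a rough sketch: the regular expression and the `parse` helper are assumptions made for illustration, and only four rows from each listing above are pasted in.

```python
import re

# One result row: name, wall-clock time, CPU time, iteration count.
ROW = re.compile(r"^(\S+)\s+([\d.]+) us\s+([\d.]+) us\s+(\d+)$")


def parse(listing):
    """Map benchmark name -> wall-clock time in us (illustrative helper)."""
    times = {}
    for line in listing.splitlines():
        match = ROW.match(line.strip())
        if match:
            times[match.group(1)] = float(match.group(2))
    return times


# Rows copied from the WITH_INTEL_HEXL=ON listings above.
lib_benchmark_hexl = parse("""
NTTTransform1024               1.24 us         1.24 us      1099334
BFVrns_MultRelin               2465 us         2455 us          574
CKKS_MultRelin                 2341 us         2337 us          599
BGVrns_MultRelin               2343 us         2341 us          596
""")
lib_benchmark = parse("""
NTTTransform1024               2.85 us         2.84 us       384658
BFVrns_MultRelin               6389 us         6385 us          221
CKKS_MultRelin                 2474 us         2467 us          561
BGVrns_MultRelin               2319 us         2316 us          598
""")

# Ratio > 1 means lib-benchmark is slower than lib-benchmark-hexl.
slowdown = {k: lib_benchmark[k] / lib_benchmark_hexl[k] for k in lib_benchmark}
for name, ratio in slowdown.items():
    print(f"{name:20s} {ratio:4.2f}x")
```

On these rows, NTTTransform1024 and BFVrns_MultRelin are more than 2x slower in lib-benchmark, while CKKS_MultRelin and BGVrns_MultRelin stay within a few percent.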

For the polynomial benchmarks, I see:

WITH_INTEL_HEXL=OFF
poly-benchmark-16k
-------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------
Native_add               22.3 us         22.3 us        31366
DCRT_add/towers:1        22.6 us         22.6 us        30802
DCRT_add/towers:2        47.4 us         47.3 us        14811
DCRT_add/towers:4         103 us          102 us         6835
DCRT_add/towers:8         205 us          205 us         3418
Native_mul               57.8 us         57.7 us        12128
DCRT_mul/towers:1        57.9 us         57.8 us        12093
DCRT_mul/towers:2         118 us          118 us         5915
DCRT_mul/towers:4         245 us          245 us         2858
DCRT_mul/towers:8         490 us          489 us         1431
Native_ntt                323 us          322 us         2167
DCRT_ntt/towers:1         323 us          322 us         2172
DCRT_ntt/towers:2         646 us          645 us         1083
DCRT_ntt/towers:4        1293 us         1292 us          539
DCRT_ntt/towers:8        2587 us         2585 us          269
Native_intt               299 us          299 us         2336
DCRT_intt/towers:1        299 us          299 us         2342
DCRT_intt/towers:2        599 us          599 us         1168
DCRT_intt/towers:4       1199 us         1198 us          583
DCRT_intt/towers:8       2398 us         2397 us          291
WITH_INTEL_HEXL=ON
poly-hexl-benchmark-16k
-------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------
Native_add               21.9 us         21.9 us        31894
DCRT_add/towers:1        22.2 us         22.2 us        31308
DCRT_add/towers:2        46.2 us         46.2 us        15158
DCRT_add/towers:4        99.3 us         99.2 us         7057
DCRT_add/towers:8         199 us          198 us         3528
Native_mul               9.16 us         9.16 us        76442
DCRT_mul/towers:1        9.36 us         9.35 us        74907
DCRT_mul/towers:2        21.0 us         21.0 us        33477
DCRT_mul/towers:4        52.4 us         52.4 us        13350
DCRT_mul/towers:8         117 us          117 us         5977
Native_ntt               42.3 us         42.2 us        16566
DCRT_ntt/towers:1        42.4 us         42.2 us        16567
DCRT_ntt/towers:2        87.1 us         86.4 us         8097
DCRT_ntt/towers:4         183 us          182 us         3850
DCRT_ntt/towers:8         379 us          376 us         1857
Native_intt              41.3 us         41.1 us        15828
DCRT_intt/towers:1       41.3 us         41.1 us        17043
DCRT_intt/towers:2       86.1 us         85.6 us         8175
DCRT_intt/towers:4        185 us          184 us         3793
DCRT_intt/towers:8        385 us          383 us         1826
WITH_INTEL_HEXL=OFF
poly-benchmark-4k
-------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------
Native_add               5.22 us         5.21 us       133395
DCRT_add/towers:1        5.23 us         5.22 us       134075
DCRT_add/towers:2        10.6 us         10.6 us        66104
DCRT_add/towers:4        22.8 us         22.8 us        30762
DCRT_add/towers:8        48.6 us         48.5 us        14427
Native_mul               14.1 us         14.1 us        49776
DCRT_mul/towers:1        14.0 us         14.0 us        49937
DCRT_mul/towers:2        28.3 us         28.2 us        24805
DCRT_mul/towers:4        58.0 us         58.0 us        12079
DCRT_mul/towers:8         119 us          119 us         5885
Native_ntt               69.5 us         69.5 us        10071
DCRT_ntt/towers:1        69.5 us         69.5 us        10077
DCRT_ntt/towers:2         139 us          139 us         5036
DCRT_ntt/towers:4         278 us          278 us         2521
DCRT_ntt/towers:8         556 us          555 us         1259
Native_intt              65.7 us         65.7 us        10660
DCRT_intt/towers:1       65.7 us         65.6 us        10668
DCRT_intt/towers:2        131 us          131 us         5332
DCRT_intt/towers:4        263 us          263 us         2665
DCRT_intt/towers:8        526 us          526 us         1331


WITH_INTEL_HEXL=ON
poly-hexl-benchmark-4k
-------------------------------------------------------------
Benchmark                   Time             CPU   Iterations
-------------------------------------------------------------
Native_add               5.22 us         5.19 us       134880
DCRT_add/towers:1        5.30 us         5.27 us       132848
DCRT_add/towers:2        10.7 us         10.7 us        65424
DCRT_add/towers:4        22.7 us         22.6 us        30917
DCRT_add/towers:8        47.7 us         47.5 us        14722
Native_mul               1.91 us         1.91 us       366919
DCRT_mul/towers:1        2.06 us         2.05 us       341413
DCRT_mul/towers:2        4.29 us         4.28 us       163604
DCRT_mul/towers:4        10.0 us        10.00 us        70020
DCRT_mul/towers:8        22.8 us         22.7 us        30872
Native_ntt               8.35 us         8.32 us        84105
DCRT_ntt/towers:1        8.37 us         8.34 us        84062
DCRT_ntt/towers:2        17.0 us         16.9 us        41334
DCRT_ntt/towers:4        33.9 us         33.8 us        20678
DCRT_ntt/towers:8        67.8 us         67.6 us        10362
Native_intt              8.81 us         8.79 us        79758
DCRT_intt/towers:1       8.83 us         8.80 us        79497
DCRT_intt/towers:2       17.8 us         17.7 us        39483
DCRT_intt/towers:4       37.2 us         37.2 us        18843
DCRT_intt/towers:8       71.0 us         70.9 us         9910
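
The DCRT rows also show how cost scales with the number of towers. As a small sketch, the per-tower NTT cost for the 16k listings above can be computed as follows; the times are copied from the DCRT_ntt rows, and `per_tower` is an illustrative helper.

```python
towers = [1, 2, 4, 8]

# DCRT_ntt wall-clock times in us, copied from the 16k listings above.
ntt_16k_off = [323, 646, 1293, 2587]   # WITH_INTEL_HEXL=OFF
ntt_16k_on = [42.4, 87.1, 183, 379]    # WITH_INTEL_HEXL=ON


def per_tower(times):
    """Divide each total time by its tower count (illustrative helper)."""
    return [t / n for t, n in zip(times, towers)]


off_cost = per_tower(ntt_16k_off)
on_cost = per_tower(ntt_16k_on)

for n, off, on in zip(towers, off_cost, on_cost):
    print(f"towers:{n}  off {off:6.1f} us/tower  "
          f"on {on:5.1f} us/tower  speedup {off / on:4.1f}x")
```

Without HEXL the per-tower cost stays at about 323 us. With HEXL it rises from 42.4 us at one tower to about 47.4 us at eight, so the NTT speedup in these runs drops from roughly 7.6x to roughly 6.8x as towers are added.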
Edited by Fabian Boemer