Add blst backend
This MR implements a new backend for the virtual package bls12-381
based on blst.
blst (pronounced 'blast') is a BLS12-381 signature library focused on performance and security. It is written in C and assembly.
The goal of the blst library is to provide a foundational component for applications and other libraries that require high performance and formally verified BLS12-381 operations. With that in mind some decisions are made to maximize the public good beyond BLS12-381. For example, the field operations are optimized for general 384-bit usage, as opposed to tuned specifically for the 381-bit BLS12-381 curve parameters. With the formal verification of these foundational components, we believe they can provide a reliable building block for other curves that would like high performance and an extra element of security.
The library deliberately abstains from dealing with memory management and multi-threading, with the rationale that these ultimately belong in language-specific bindings. Another responsibility that is left to application is random number generation. All this in the name of run-time neutrality, which makes integration into more stringent environments like Intel SGX or ARM TrustZone trivial.
A new package is implemented, called bls12-381-unix-blst
. bls12-381-unix
is still the Rust backend.
General notes and design
The binding is based on the commit 095a8c53787d6c91b725152ebfbbf33acf05a931. The library supposes the header and the .a
is installed in $OPAM_SWITCH_PREFIX/lib/blst
. Here the command to install in your switch:
mkdir -p $OPAM_SWITCH_PREFIX/lib/blst
wget https://github.com/supranational/blst/archive/095a8c53787d6c91b725152ebfbbf33acf05a931.zip
unzip 095a8c53787d6c91b725152ebfbbf33acf05a931.zip
cd blst-095a8c53787d6c91b725152ebfbbf33acf05a931
./build.sh
cp libblst.a $OPAM_SWITCH_PREFIX/lib/blst
cp bindings/*.h $OPAM_SWITCH_PREFIX/lib/blst
cd ../
In the Rust binding, types t
in G1
, G2
, Fr
, and Fq12
are bytes and (de)serialisations are required.
In this binding, types t
are C pointers and no (de)serialisation is required.
The invariant is still the same. A value of type G1.t
(resp G2.t
) is a point in the prime subgroup.
Values of type Fr.t
are between 0
and Fr.order - 1
. For G1
and G2
, there is a test for the random generator verifying this invariant, see https://gitlab.com/dannywillems/ocaml-bls12-381/-/blob/15bb2627d290e106324a9af04f72c07211b8a7e9/test/test_ec_make.ml#L214.
The other ways of building a type t
are with
-
of_bytes_exn/of_bytes_opt
, and the implementation verifies the point is in the prime subgroup (see the function bound https://github.com/supranational/blst/blob/095a8c53787d6c91b725152ebfbbf33acf05a931/src/map_to_g1.c#L499). Additional tests have been added for both G1 and G2, see https://gitlab.com/dannywillems/ocaml-bls12-381/-/blob/add-blst-backend/test/test_g1.ml#L189 and https://gitlab.com/dannywillems/ocaml-bls12-381/-/blob/add-blst-backend/test/test_g2.ml#L77. -
of_z_opt
. For G1, the implementation usesof_bytes_opt
, and forG2
, there is a check before the final output.
About the binding structure
The low level binding is available in the directory bindings
.
blst uses typedef'ed anonymous structures for each object. Here their definitions:
typedef uint8_t byte;
typedef uint64_t limb_t;
typedef struct { byte b[256/8]; } blst_scalar;
typedef struct { limb_t l[256/8/sizeof(limb_t)]; } blst_fr;
typedef struct { limb_t l[384/8/sizeof(limb_t)]; } blst_fp;
/* 0 is "real" part, 1 is "imaginary" */
typedef struct { blst_fp fp[2]; } blst_fp2;
typedef struct { blst_fp2 fp2[3]; } blst_fp6;
typedef struct { blst_fp6 fp6[2]; } blst_fp12;
typedef struct { blst_fp x, y, z; } blst_p1;
typedef struct { blst_fp x, y; } blst_p1_affine;
typedef struct { blst_fp2 x, y, z; } blst_p2;
typedef struct { blst_fp2 x, y; } blst_p2_affine;
ocaml-ctypes is used for the binding. Here how blst_fr_t
and blst_p1 are bound in Ctypes:
let blst_fr_t : blst_fr_t Ctypes.typ =
let nb_limb = 256 / (8 * sizeof_limb_t) in
let l = Ctypes.array nb_limb limb_t in
let anon_structure = Ctypes.structure "" in
let _ = Ctypes.field anon_structure "l" l in
let x = Ctypes.(typedef anon_structure "blst_fr") in
Ctypes.seal x ;
x
let blst_g1_t_compo =
let anon_structure = Ctypes.structure "" in
let x_coord = Ctypes.field anon_structure "x" blst_fq_t in
let y_coord = Ctypes.field anon_structure "y" blst_fq_t in
let z_coord = Ctypes.field anon_structure "z" blst_fq_t in
let x = Ctypes.(typedef anon_structure "blst_p1") in
Ctypes.seal x ;
((x_coord, y_coord, z_coord), x)
let blst_g1_t : blst_g1_t Ctypes.typ =
let (_, x) = blst_g1_t_compo in
x
Utils functions like g1_affine_set_x
are also implemented to deal with the inner structure. The inner structure (like x_coord
, y_coord
, etc) are kept as values to avoid recomputing them.
The ctypes boilerplate binding the structures are in Blst_bindings.Types
. The bindings are generated in blst_gen.ml
. Following the same strategy than the Rust binding, each object has its own submodule in blst_bindings
.
About BLS signatures
« hash_to_curve » primitives are available here and therefore can be bound later. See this issue.
About testing
The same tests are used for each backend, relying on the dune virtual/implementation libraries. It gives more confidence in the consistency between the two libraries. The tests include regression tests.
The library has been tested in different repositories where bls12-381-unix
is used:
-
Tezos, see the branch dannywillems-use-blst-backend. You can run the tests related to bls12-381 with
poetry run pytest tests_alpha/test_contract_bls12_381.py
andpoetry run pytest tests_alpha/test_contract_opcodes.py
. -
Nomadic Labs privacy team, see the branch remove-polymorphic-eq.
dune build @runtest
should does the job. - ocaml-polynomial for the scalar field, see do-not-use-polymorphic-arithmetic-add-blst-backend.
For each, you will need to install blst as described above and pin the repository:
opam pin add bls12-381-gen git+https://gitlab.com/dannywillems/ocaml-bls12-381.git#add-blst-backend --no-action
opam pin add bls12-381 git+https://gitlab.com/dannywillems/ocaml-bls12-381.git#add-blst-backend --no-action
opam pin add bls12-381-unix git+https://gitlab.com/dannywillems/ocaml-bls12-381.git#add-blst-backend --no-action
opam pin add bls12-381-unix-blst git+https://gitlab.com/dannywillems/ocaml-bls12-381.git#add-blst-backend --no-action
opam reinstall bls12-381-gen
opam reinstall bls12-381
opam reinstall bls12-381-unix
opam reinstall bls12-381-unix-blst
About performance
See benchmark
directory. Change the library in the dune file with the backend you want to bench.
Benchmarks on Intel(R) Core(TM) i5-6300HQ CPU @ 2.30GHz, 2x8192MB RAM, 2133 MT/s
bls12-381-unix
┌──────────────────────┬────────────────┬─────────┬────────────┐
│ Name │ Time/Run │ mWd/Run │ Percentage │
├──────────────────────┼────────────────┼─────────┼────────────┤
│ Multiplication on G2 │ 1_079_399.47ns │ 38.00w │ 100.00% │
│ Double on G2 │ 13_137.01ns │ 34.00w │ 1.22% │
│ Addition on G2 │ 15_881.07ns │ 38.00w │ 1.47% │
│ Negate on G2 │ 722.34ns │ 34.00w │ 0.07% │
│ of_bytes_exn on G2 │ 919_363.65ns │ 30.00w │ 85.17% │
│ to_bytes on G2 │ 3.69ns │ │ │
└──────────────────────┴────────────────┴─────────┴────────────┘
┌──────────────────────┬──────────────┬─────────┬────────────┐
│ Name │ Time/Run │ mWd/Run │ Percentage │
├──────────────────────┼──────────────┼─────────┼────────────┤
│ Multiplication on G1 │ 340_032.14ns │ 26.00w │ 100.00% │
│ Double on G1 │ 10_515.54ns │ 22.00w │ 3.09% │
│ Addition on G1 │ 11_429.99ns │ 26.00w │ 3.36% │
│ Negate on G1 │ 395.03ns │ 22.00w │ 0.12% │
│ of_bytes_exn on G1 │ 273_873.73ns │ 18.00w │ 80.54% │
│ to_bytes on G1 │ 3.75ns │ │ │
└──────────────────────┴──────────────┴─────────┴────────────┘
┌───────────────────┬─────────────┬─────────┬────────────┐
│ Name │ Time/Run │ mWd/Run │ Percentage │
├───────────────────┼─────────────┼─────────┼────────────┤
│ Addition Fr │ 192.38ns │ 18.00w │ 1.79% │
│ Multiplication Fr │ 220.07ns │ 18.00w │ 2.04% │
│ Opposite Fr │ 128.87ns │ 14.00w │ 1.20% │
│ Substraction Fr │ 319.73ns │ 32.00w │ 2.97% │
│ Square Fr │ 159.83ns │ 14.00w │ 1.48% │
│ Inverse Fr │ 3_633.28ns │ 14.00w │ 33.72% │
│ Pow Fr │ 10_774.88ns │ 63.00w │ 100.00% │
│ Double Fr │ 129.03ns │ 14.00w │ 1.20% │
│ of_bytes_exn Fr │ 79.34ns │ 10.00w │ 0.74% │
│ to_bytes Fr │ 3.71ns │ │ 0.03% │
└───────────────────┴─────────────┴─────────┴────────────┘
┌───────────────────────────────────────────────────────────────────────┬──────────┬─────────┬──────────┬──────────┬────────────┐
│ Name │ Time/Run │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├───────────────────────────────────────────────────────────────────────┼──────────┼─────────┼──────────┼──────────┼────────────┤
│ Pairing │ 2.76ms │ 164.00w │ │ │ 48.63% │
│ Miller loop simple │ 2.73ms │ 164.00w │ │ │ 48.11% │
│ Final exponentiation │ 1.82ms │ 164.00w │ │ │ 32.03% │
│ Miller loop on 6 couples of points │ 3.93ms │ 299.02w │ │ │ 69.28% │
│ Miller loop on 6 couples of points followed by a final exponentiation │ 5.67ms │ 463.04w │ 0.25w │ 0.25w │ 100.00% │
└───────────────────────────────────────────────────────────────────────┴──────────┴─────────┴──────────┴──────────┴────────────┘
bls12-381-unix-blst
┌──────────────────────┬──────────────┬─────────┬────────────┐
│ Name │ Time/Run │ mWd/Run │ Percentage │
├──────────────────────┼──────────────┼─────────┼────────────┤
│ Multiplication on G2 │ 239_727.21ns │ 53.00w │ 100.00% │
│ Double on G2 │ 1_164.39ns │ 18.00w │ 0.49% │
│ Addition on G2 │ 2_766.12ns │ 18.00w │ 1.15% │
│ Negate on G2 │ 390.98ns │ 198.01w │ 0.16% │
│ of_bytes_exn on G2 │ 81_355.56ns │ 42.00w │ 33.94% │
│ to_bytes on G2 │ 4_636.03ns │ 30.00w │ 1.93% │
└──────────────────────┴──────────────┴─────────┴────────────┘
┌──────────────────────┬──────────────┬─────────┬────────────┐
│ Name │ Time/Run │ mWd/Run │ Percentage │
├──────────────────────┼──────────────┼─────────┼────────────┤
│ Multiplication on G1 │ 113_901.72ns │ 53.00w │ 100.00% │
│ Double on G1 │ 531.69ns │ 18.00w │ 0.47% │
│ Addition on G1 │ 1_022.95ns │ 18.00w │ 0.90% │
│ Negate on G1 │ 346.89ns │ 198.01w │ 0.30% │
│ of_bytes_exn on G1 │ 69_471.22ns │ 42.00w │ 60.99% │
│ to_bytes on G1 │ 3_761.96ns │ 18.00w │ 3.30% │
└──────────────────────┴──────────────┴─────────┴────────────┘
┌───────────────────┬──────────────┬────────────┬──────────┬──────────┬────────────┐
│ Name │ Time/Run │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├───────────────────┼──────────────┼────────────┼──────────┼──────────┼────────────┤
│ Addition Fr │ 88.66ns │ 18.00w │ │ │ 0.05% │
│ Multiplication Fr │ 108.70ns │ 18.00w │ │ │ 0.06% │
│ Opposite Fr │ 93.78ns │ 18.00w │ │ │ 0.06% │
│ Substraction Fr │ 90.14ns │ 18.00w │ │ │ 0.05% │
│ Square Fr │ 104.29ns │ 18.00w │ │ │ 0.06% │
│ Inverse Fr │ 2_525.83ns │ 76.00w │ │ │ 1.50% │
│ Pow Fr │ 168_266.63ns │ 27_686.63w │ 115.35w │ 115.35w │ 100.00% │
│ Double Fr │ 90.81ns │ 18.00w │ │ │ 0.05% │
│ of_bytes_exn Fr │ 294.57ns │ 48.00w │ │ │ 0.18% │
│ to_bytes Fr │ 179.96ns │ 28.00w │ │ │ 0.11% │
└───────────────────┴──────────────┴────────────┴──────────┴──────────┴────────────┘
┌──────────────────────────────────────────────────────────────────────┬────────────┬─────────┬────────────┐
│ Name │ Time/Run │ mWd/Run │ Percentage │
├──────────────────────────────────────────────────────────────────────┼────────────┼─────────┼────────────┤
│ Pairing │ 843.40us │ 72.00w │ 31.10% │
│ Miller loop simple │ 842.27us │ 72.00w │ 31.05% │
│ Final exponentiation │ 474.85us │ 18.00w │ 17.51% │
│ Miller loop on 6 couple of points │ 2_181.32us │ 432.02w │ 80.42% │
│ Miller loop on 6 couple of points followed by a final exponentiation │ 2_712.25us │ 450.02w │ 100.00% │
└──────────────────────────────────────────────────────────────────────┴────────────┴─────────┴────────────┘
What can be improved
-
Cleaner src/blst/bindings/blst_bindings.ml
. -
Fr.pow
is not implemented efficiently, see the bench above. A simple fast exponentiation is used, but requires a lot of allocation. -
Fr.to_bytes
andFr.of_bytes_exn
are less efficient than the Rust backend because a montgomery form is used in the typeblst_fr
(not the case forblst_scalar
which is simply the little endian representation). -
G1.to_bytes
/G2.to_bytes
cost more with the blst backend because computation is required, which is not the case in the Rust binding.