Skip to content

Add blst backend

Danny Willems requested to merge add-blst-backend into master

This MR implements a new backend for the virtual package bls12-381 based on blst.

blst (pronounced 'blast') is a BLS12-381 signature library focused on performance and security. It is written in C and assembly.

The goal of the blst library is to provide a foundational component for applications and other libraries that require high performance and formally verified BLS12-381 operations. With that in mind some decisions are made to maximize the public good beyond BLS12-381. For example, the field operations are optimized for general 384-bit usage, as opposed to tuned specifically for the 381-bit BLS12-381 curve parameters. With the formal verification of these foundational components, we believe they can provide a reliable building block for other curves that would like high performance and an extra element of security.

The library deliberately abstains from dealing with memory management and multi-threading, with the rationale that these ultimately belong in language-specific bindings. Another responsibility that is left to application is random number generation. All this in the name of run-time neutrality, which makes integration into more stringent environments like Intel SGX or ARM TrustZone trivial.

A new package is implemented, called bls12-381-unix-blst. bls12-381-unix is still the Rust backend.

General notes and design

The binding is based on the commit 095a8c53787d6c91b725152ebfbbf33acf05a931. The library supposes the header and the .a is installed in $OPAM_SWITCH_PREFIX/lib/blst. Here the command to install in your switch:

mkdir -p $OPAM_SWITCH_PREFIX/lib/blst
wget https://github.com/supranational/blst/archive/095a8c53787d6c91b725152ebfbbf33acf05a931.zip
unzip 095a8c53787d6c91b725152ebfbbf33acf05a931.zip
cd blst-095a8c53787d6c91b725152ebfbbf33acf05a931
./build.sh
cp libblst.a $OPAM_SWITCH_PREFIX/lib/blst
cp bindings/*.h $OPAM_SWITCH_PREFIX/lib/blst
cd ../

In the Rust binding, types t in G1, G2, Fr, and Fq12 are bytes and (de)serialisations are required. In this binding, types t are C pointers and no (de)serialisation is required. The invariant is still the same. A value of type G1.t (resp G2.t) is a point in the prime subgroup. Values of type Fr.t are between 0 and Fr.order - 1. For G1 and G2, there is a test for the random generator verifying this invariant, see https://gitlab.com/dannywillems/ocaml-bls12-381/-/blob/15bb2627d290e106324a9af04f72c07211b8a7e9/test/test_ec_make.ml#L214.

The other ways of building a type t are with

About the binding structure

The low level binding is available in the directory bindings.

blst uses typedef'ed anonymous structures for each object. Here their definitions:

typedef uint8_t byte;
typedef uint64_t limb_t;

typedef struct { byte b[256/8]; } blst_scalar;
typedef struct { limb_t l[256/8/sizeof(limb_t)]; } blst_fr;
typedef struct { limb_t l[384/8/sizeof(limb_t)]; } blst_fp;
/* 0 is "real" part, 1 is "imaginary" */
typedef struct { blst_fp fp[2]; } blst_fp2;
typedef struct { blst_fp2 fp2[3]; } blst_fp6;
typedef struct { blst_fp6 fp6[2]; } blst_fp12;

typedef struct { blst_fp x, y, z; } blst_p1;
typedef struct { blst_fp x, y; } blst_p1_affine;

typedef struct { blst_fp2 x, y, z; } blst_p2;
typedef struct { blst_fp2 x, y; } blst_p2_affine;

ocaml-ctypes is used for the binding. Here how blst_fr_t and blst_p1 are bound in Ctypes:

let blst_fr_t : blst_fr_t Ctypes.typ =
  let nb_limb = 256 / (8 * sizeof_limb_t) in
  let l = Ctypes.array nb_limb limb_t in
  let anon_structure = Ctypes.structure "" in
  let _ = Ctypes.field anon_structure "l" l in
  let x = Ctypes.(typedef anon_structure "blst_fr") in
  Ctypes.seal x ;
  x

let blst_g1_t_compo =
  let anon_structure = Ctypes.structure "" in
  let x_coord = Ctypes.field anon_structure "x" blst_fq_t in
  let y_coord = Ctypes.field anon_structure "y" blst_fq_t in
  let z_coord = Ctypes.field anon_structure "z" blst_fq_t in
  let x = Ctypes.(typedef anon_structure "blst_p1") in
  Ctypes.seal x ;
  ((x_coord, y_coord, z_coord), x)

let blst_g1_t : blst_g1_t Ctypes.typ =
  let (_, x) = blst_g1_t_compo in
  x

Utils functions like g1_affine_set_x are also implemented to deal with the inner structure. The inner structure (like x_coord, y_coord, etc) are kept as values to avoid recomputing them.

The ctypes boilerplate binding the structures are in Blst_bindings.Types. The bindings are generated in blst_gen.ml. Following the same strategy than the Rust binding, each object has its own submodule in blst_bindings.

About BLS signatures

« hash_to_curve » primitives are available here and therefore can be bound later. See this issue.

About testing

The same tests are used for each backend, relying on the dune virtual/implementation libraries. It gives more confidence in the consistency between the two libraries. The tests include regression tests.

The library has been tested in different repositories where bls12-381-unix is used:

For each, you will need to install blst as described above and pin the repository:

opam pin add bls12-381-gen git+https://gitlab.com/dannywillems/ocaml-bls12-381.git#add-blst-backend --no-action
opam pin add bls12-381 git+https://gitlab.com/dannywillems/ocaml-bls12-381.git#add-blst-backend --no-action
opam pin add bls12-381-unix git+https://gitlab.com/dannywillems/ocaml-bls12-381.git#add-blst-backend --no-action
opam pin add bls12-381-unix-blst git+https://gitlab.com/dannywillems/ocaml-bls12-381.git#add-blst-backend --no-action

opam reinstall bls12-381-gen
opam reinstall bls12-381
opam reinstall bls12-381-unix
opam reinstall bls12-381-unix-blst

About performance

See benchmark directory. Change the library in the dune file with the backend you want to bench. Benchmarks on Intel(R) Core(TM) i5-6300HQ CPU @ 2.30GHz, 2x8192MB RAM, 2133 MT/s

  • bls12-381-unix
┌──────────────────────┬────────────────┬─────────┬────────────┐
│ Name                 │       Time/Run │ mWd/Run │ Percentage │
├──────────────────────┼────────────────┼─────────┼────────────┤
│ Multiplication on G2 │ 1_079_399.47ns │  38.00w │    100.00% │
│ Double on G2         │    13_137.01ns │  34.00w │      1.22% │
│ Addition on G2       │    15_881.07ns │  38.00w │      1.47% │
│ Negate on G2         │       722.34ns │  34.00w │      0.07% │
│ of_bytes_exn on G2   │   919_363.65ns │  30.00w │     85.17% │
│ to_bytes on G2       │         3.69ns │         │            │
└──────────────────────┴────────────────┴─────────┴────────────┘

┌──────────────────────┬──────────────┬─────────┬────────────┐
│ Name                 │     Time/Run │ mWd/Run │ Percentage │
├──────────────────────┼──────────────┼─────────┼────────────┤
│ Multiplication on G1 │ 340_032.14ns │  26.00w │    100.00% │
│ Double on G1         │  10_515.54ns │  22.00w │      3.09% │
│ Addition on G1       │  11_429.99ns │  26.00w │      3.36% │
│ Negate on G1         │     395.03ns │  22.00w │      0.12% │
│ of_bytes_exn on G1   │ 273_873.73ns │  18.00w │     80.54% │
│ to_bytes on G1       │       3.75ns │         │            │
└──────────────────────┴──────────────┴─────────┴────────────┘

┌───────────────────┬─────────────┬─────────┬────────────┐
│ Name              │    Time/Run │ mWd/Run │ Percentage │
├───────────────────┼─────────────┼─────────┼────────────┤
│ Addition Fr       │    192.38ns │  18.00w │      1.79% │
│ Multiplication Fr │    220.07ns │  18.00w │      2.04% │
│ Opposite Fr       │    128.87ns │  14.00w │      1.20% │
│ Substraction Fr   │    319.73ns │  32.00w │      2.97% │
│ Square Fr         │    159.83ns │  14.00w │      1.48% │
│ Inverse Fr        │  3_633.28ns │  14.00w │     33.72% │
│ Pow Fr            │ 10_774.88ns │  63.00w │    100.00% │
│ Double Fr         │    129.03ns │  14.00w │      1.20% │
│ of_bytes_exn Fr   │     79.34ns │  10.00w │      0.74% │
│ to_bytes Fr       │      3.71ns │         │      0.03% │
└───────────────────┴─────────────┴─────────┴────────────┘


┌───────────────────────────────────────────────────────────────────────┬──────────┬─────────┬──────────┬──────────┬────────────┐
│ Name                                                                  │ Time/Run │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├───────────────────────────────────────────────────────────────────────┼──────────┼─────────┼──────────┼──────────┼────────────┤
│ Pairing                                                               │   2.76ms │ 164.00w │          │          │     48.63% │
│ Miller loop simple                                                    │   2.73ms │ 164.00w │          │          │     48.11% │
│ Final exponentiation                                                  │   1.82ms │ 164.00w │          │          │     32.03% │
│ Miller loop on 6 couples of points                                    │   3.93ms │ 299.02w │          │          │     69.28% │
│ Miller loop on 6 couples of points followed by a final exponentiation │   5.67ms │ 463.04w │    0.25w │    0.25w │    100.00% │
└───────────────────────────────────────────────────────────────────────┴──────────┴─────────┴──────────┴──────────┴────────────┘
  • bls12-381-unix-blst
┌──────────────────────┬──────────────┬─────────┬────────────┐
│ Name                 │     Time/Run │ mWd/Run │ Percentage │
├──────────────────────┼──────────────┼─────────┼────────────┤
│ Multiplication on G2 │ 239_727.21ns │  53.00w │    100.00% │
│ Double on G2         │   1_164.39ns │  18.00w │      0.49% │
│ Addition on G2       │   2_766.12ns │  18.00w │      1.15% │
│ Negate on G2         │     390.98ns │ 198.01w │      0.16% │
│ of_bytes_exn on G2   │  81_355.56ns │  42.00w │     33.94% │
│ to_bytes on G2       │   4_636.03ns │  30.00w │      1.93% │
└──────────────────────┴──────────────┴─────────┴────────────┘

┌──────────────────────┬──────────────┬─────────┬────────────┐
│ Name                 │     Time/Run │ mWd/Run │ Percentage │
├──────────────────────┼──────────────┼─────────┼────────────┤
│ Multiplication on G1 │ 113_901.72ns │  53.00w │    100.00% │
│ Double on G1         │     531.69ns │  18.00w │      0.47% │
│ Addition on G1       │   1_022.95ns │  18.00w │      0.90% │
│ Negate on G1         │     346.89ns │ 198.01w │      0.30% │
│ of_bytes_exn on G1   │  69_471.22ns │  42.00w │     60.99% │
│ to_bytes on G1       │   3_761.96ns │  18.00w │      3.30% │
└──────────────────────┴──────────────┴─────────┴────────────┘

┌───────────────────┬──────────────┬────────────┬──────────┬──────────┬────────────┐
│ Name              │     Time/Run │    mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├───────────────────┼──────────────┼────────────┼──────────┼──────────┼────────────┤
│ Addition Fr       │      88.66ns │     18.00w │          │          │      0.05% │
│ Multiplication Fr │     108.70ns │     18.00w │          │          │      0.06% │
│ Opposite Fr       │      93.78ns │     18.00w │          │          │      0.06% │
│ Substraction Fr   │      90.14ns │     18.00w │          │          │      0.05% │
│ Square Fr         │     104.29ns │     18.00w │          │          │      0.06% │
│ Inverse Fr        │   2_525.83ns │     76.00w │          │          │      1.50% │
│ Pow Fr            │ 168_266.63ns │ 27_686.63w │  115.35w │  115.35w │    100.00% │
│ Double Fr         │      90.81ns │     18.00w │          │          │      0.05% │
│ of_bytes_exn Fr   │     294.57ns │     48.00w │          │          │      0.18% │
│ to_bytes Fr       │     179.96ns │     28.00w │          │          │      0.11% │
└───────────────────┴──────────────┴────────────┴──────────┴──────────┴────────────┘

┌──────────────────────────────────────────────────────────────────────┬────────────┬─────────┬────────────┐
│ Name                                                                 │   Time/Run │ mWd/Run │ Percentage │
├──────────────────────────────────────────────────────────────────────┼────────────┼─────────┼────────────┤
│ Pairing                                                              │   843.40us │  72.00w │     31.10% │
│ Miller loop simple                                                   │   842.27us │  72.00w │     31.05% │
│ Final exponentiation                                                 │   474.85us │  18.00w │     17.51% │
│ Miller loop on 6 couple of points                                    │ 2_181.32us │ 432.02w │     80.42% │
│ Miller loop on 6 couple of points followed by a final exponentiation │ 2_712.25us │ 450.02w │    100.00% │
└──────────────────────────────────────────────────────────────────────┴────────────┴─────────┴────────────┘

What can be improved

  • Cleaner src/blst/bindings/blst_bindings.ml.
  • Fr.pow is not implemented efficiently, see the bench above. A simple fast exponentiation is used, but requires a lot of allocation.
  • Fr.to_bytes and Fr.of_bytes_exn are less efficient than the Rust backend because a montgomery form is used in the type blst_fr (not the case for blst_scalar which is simply the little endian representation).
  • G1.to_bytes/G2.to_bytes cost more with the blst backend because computation is required, which is not the case in the Rust binding.
Edited by Danny Willems

Merge request reports