
F* IMT Spec: Protocol tests

Testing the protocol Sapling implementation

Given that we successfully verified and tested the IMT ADT implementation from lib_sapling, we would like to make use of that specification in order to test the key-value-store-based implementation of IMTs found in the protocol (lib_protocol/sapling_storage.ml).

There are several approaches to do so. To introduce them, let’s first present the following diagram:

   Protocol    |   ADT (lib_sapling)
             proj0
     t0  ──────|──────► t0'
      │                  │
insert│        |         │insert'
      │                  │
      ▼        |         ▼
     t1  ─────────────► t1'
             proj1
               |

This diagram is divided in two by the vertical dotted line, which separates the protocol IMTs from the ADT IMTs. For instance, t0 stands for a context in which an IMT is stored, following the representation defined in sapling_storage.ml. Here, proj0 and proj1 are projection functions that map a tree stored in a context to an ADT tree, and insert' is the IMT insertion function from our ADT spec (insert_list).

The function that we want to test is insert, the protocol IMT insertion (actually called add). One way of testing it is to check that this diagram commutes: that, for any given t0, proj1 (insert t0) is equal to insert' (proj0 t0). From now on, we'll refer to this test as Test A.
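The sketch below illustrates Test A against hypothetical module signatures: Proto.add stands for the protocol insertion from sapling_storage.ml, Adt.insert_list and Adt.equal for the ADT insertion and tree equality from lib_sapling, and proj for the projection from a protocol context to an ADT tree. The real interfaces may differ; this is only meant to make the commutativity check concrete.

(* Hypothetical signatures standing in for the protocol and ADT interfaces. *)
module type PROTO = sig
  type context
  val add : context -> bytes list -> context
end

module type ADT = sig
  type tree
  val insert_list : tree -> bytes list -> tree
  val equal : tree -> tree -> bool
end

module Test_a (Proto : PROTO) (Adt : ADT) = struct
  let run ~(proj : Proto.context -> Adt.tree) (t0 : Proto.context) commitments =
    (* Left path of the diagram: insert in the protocol, then project (proj1). *)
    let lhs = proj (Proto.add t0 commitments) in
    (* Right path: project first (proj0), then insert with the ADT function. *)
    let rhs = Adt.insert_list (proj t0) commitments in
    (* The diagram commutes iff both paths yield the same ADT tree. *)
    Adt.equal lhs rhs
end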

Another way of testing insert is to check whether it satisfies the insertion specification extracted from the ADT model. Concretely, this can be done by checking that pre_insert t0' ==> post_insert t1'. A benefit of this test, which we'll call Test B, is that we don't have to compute insert' t0'. However, computing the extracted pre- and post-conditions is expensive, as each condition traverses the tree multiple times.
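A corresponding sketch of Test B, reusing the hypothetical PROTO signature from the previous sketch. Here pre_insert and post_insert stand for the pre- and post-conditions extracted from the ADT model; their real arities may well differ (for instance, they may also take the inserted commitments), so this only captures the shape of the check.

module Test_b (Proto : PROTO) (Spec : sig
  type tree
  val pre_insert : tree -> bool
  val post_insert : tree -> bool
end) = struct
  let run ~(proj : Proto.context -> Spec.tree) (t0 : Proto.context) commitments =
    let t0' = proj t0 in
    let t1' = proj (Proto.add t0 commitments) in
    (* pre_insert t0' ==> post_insert t1', which holds vacuously when the
       pre-condition fails. Note that no ADT insertion is performed. *)
    (not (Spec.pre_insert t0')) || Spec.post_insert t1'
end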

Benchmarks

To compare these tests' performance, we run the following simple benchmark:

  1. We generate a context that stores a tree (t0) with N leaves. We vary this N to see how the size of the tree impacts performance.
  2. We run Test A.
  3. We run Test B.

For steps 2 and 3, we insert 500 random commitments into each tree. As we use the protocol's high-level interface, we only test on trees of height 32.

From this benchmark, we collect the total execution time, the percentage of the running time spent in each of the 3 steps, and the percentage of each test's runtime spent in each of its components. To get this information we use the Landmarks library. This benchmark can be found in proto-bench.ml.
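The following is a rough sketch of the kind of instrumentation used to obtain such a breakdown, assuming the Landmark.register/enter/exit interface of the Landmarks library. The landmark names mirror those in the tables below, but the actual instrumentation in proto-bench.ml may be structured differently.

(* Register one landmark per component we want to appear in the breakdown. *)
let lm_proj0 = Landmark.register "proj0"
let lm_insert = Landmark.register "insert"

(* Run [f x] inside the given landmark, making sure we always exit it. *)
let timed lm f x =
  Landmark.enter lm;
  Fun.protect ~finally:(fun () -> Landmark.exit lm) (fun () -> f x)

let () =
  Landmark.start_profiling ();
  (* ... generate the context, then run Test A and Test B, wrapping each
     component (proj0, proj1, insert, insert', pre, post) with [timed] ... *)
  ()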

Results for N = 1000

For small trees, with 1000 leaves, running the benchmark took on average 0.37 seconds. The breakdown of how much time each component took is presented in the following table:

[  351.33M cycles in 1 calls ]     - 40.71% : test_b
[  151.75M cycles in 1 calls ]     |   - 43.19% : post
[  100.46M cycles in 1 calls ]     |   - 28.59% : pre
[   70.89M cycles in 1 calls ]     |   - 20.18% : insert
[   16.13M cycles in 1 calls ]     |   -  4.59% : proj1
[   12.06M cycles in 1 calls ]     |   -  3.43% : proj0
[  181.12M cycles in 1 calls ]     - 20.99% : make_context
[  149.92M cycles in 1 calls ]     - 17.37% : test_a
[   67.60M cycles in 1 calls ]     |   - 45.09% : insert
[   52.87M cycles in 1 calls ]     |   - 35.27% : insert'
[   16.43M cycles in 1 calls ]     |   - 10.96% : proj1
[   12.67M cycles in 1 calls ]     |   -  8.45% : proj0

In this case, Test B takes more than twice as long as Test A. This is because most of the runtime of Test B is spent computing the pre- and post-conditions. Components that took less than 1% of a test's runtime (such as the tree equality check from Test A) have been omitted.

Results for N = 100,000

For large(r) trees, with 100k leaves (the largest size we checked on the ADT tests), running the benchmark took on average 17.3 seconds. The breakdown of how much time each component took is presented in the following table:

[   22.99G cycles in   1 calls ]     - 57.63% : test_b
[   10.50G cycles in   1 calls ]     |   - 45.69% : post
[    9.85G cycles in   1 calls ]     |   - 42.84% : pre
[    1.33G cycles in   1 calls ]     |   -  5.79% : proj1
[    1.23G cycles in   1 calls ]     |   -  5.37% : proj0
[   14.06G cycles in   1 calls ]     - 35.25% : make_context
[    2.67G cycles in   1 calls ]     -  6.69% : test_a
[    1.27G cycles in   1 calls ]     |   - 47.50% : proj0
[    1.25G cycles in   1 calls ]     |   - 46.90% : proj1
[   75.20M cycles in   1 calls ]     |   -  2.82% : insert
[   52.97M cycles in   1 calls ]     |   -  1.98% : insert'

Now, Test B takes almost 10 times as long as Test A, showing that the former's runtime grows more rapidly with the size of the input tree. It's also interesting to note that Test A takes only a fraction of the time needed to generate the random context. Moreover, in Test A, most of the time is spent computing the projections, while both insertions together take less than 5% of its runtime.
