Skip to content

Tenderbake: Optimizing round_and_offset using quadratic equations with the simplified datatype for round_durations

Hai Nguyen Van requested to merge hai@round_repr_simplify_round_and_offset into master

TL;DR

Fix #2224 (closed)

This MR improves the speed of round/offset calculation with a 5x speed-up based on the simplified datatype brought by MR !3984 (merged).

┌───────────┬──────────┬──────────┬──────────┬──────────┬────────────┐
│ Name      │ Time/Run │  mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├───────────┼──────────┼──────────┼──────────┼──────────┼────────────┤
│ binsearch │ 248.50us │ 133.75kw │    8.09w │    8.09w │    100.00% │
│ sqrt      │  50.94us │  20.15kw │    0.43w │    0.43w │     20.50% │
└───────────┴──────────┴──────────┴──────────┴──────────┴────────────┘

Problem

We want to optimize function round_and_offset in module round_repr.ml to use an explicit solution driven by a quadratic equation. We denote:

  • Input: \tau_0 \in \mathbb{N}^*, the first round duration
  • Input: \tau_{\varepsilon} \in \mathbb{N}^*, the delay increment per round
  • Input: \text{level\_offset} \in \mathbb{R}^+
  • Find the round and its offset

Let us define the sequence \tau_k as the round duration of round k:

{(\tau_k)}_{k\in\mathbb N} ~:~ 
\left\{\begin{matrix}
\tau_0 &=& \tau_0\\ 
\tau_{k+1} &=& \tau_{k} + \tau_{\varepsilon}
\end{matrix}\right.

Example

\tau_0 \tau_{\varepsilon} \text{level\_offset} round its offset
30 15 32 1 2
30 15 75 2 0

The equation

From a level_offset, we want to find the greatest round such that the offset (w.r.t. to this round) is the least (and non-negative).

In other words, the equation is to maximize \text{round} such that \text{offset} \geq 0 and

\begin{aligned}
\sum_{k=0}^{\text{round} - 1} \tau_k + \text{offset} &= \text{level\_offset}
\end{aligned}

We deliberately minimize the \text{offset} to 0 to solve the equation in \text{round}

\begin{aligned}
\sum_{k=0}^{\text{round} - 1} \tau_k &= \text{level\_offset}
\end{aligned}

💡 It is possible to unfold the sum due to its commutative properties, we have

\begin{aligned}
\sum_{k=0}^{\text{round} - 1} \tau_k &= \tau_0 + (\tau + \tau_{\varepsilon}) + (\tau + 2 \tau_{\varepsilon}) + (\tau + 3\tau_{\varepsilon}) + \dots + (\tau + (\text{round} - 1) \times \tau_{\varepsilon}) \\
&= \text{round} \times \tau_0 + \left ( \sum_{k = 1}^{\text{round} - 1} k \right ) \tau_{\varepsilon}\\
&= \text{round} \times \tau_0 + \frac{(\text{round} - 1) \times \text{round}}{2}\times \tau_{\varepsilon}
\end{aligned}

The equation to solve is now explicited as

\begin{aligned}
\text{round} \times \tau_0 + \frac{(\text{round} - 1) \times \text{round}}{2} \times \tau_{\varepsilon} &= \text{level\_offset}
\end{aligned}

In order to further use a lemma below, we need to get rid of this division

\begin{aligned}
2 \times \text{round} \times \tau_0 + (\text{round} - 1) \times \text{round} \times \tau_{\varepsilon} &= 2 \times \text{level\_offset}
\end{aligned}

After expanding the terms, the quadratic equation in \text{round} is

\begin{aligned}
\tau_{\varepsilon} \times \text{round}^2 + \left ( 2 \tau_0 - \tau_{\varepsilon} \right ) \times \text{round} - 2 \times \text{level\_offset} &= 0
\end{aligned}

Thereby, the discriminant is

\begin{aligned}
\Delta = \left ( 2 \tau_0 - \tau_{\varepsilon} \right )^2 + 8 \times \tau_{\varepsilon} \times \text{level\_offset}
\end{aligned}

Hence,

\begin{aligned}
\text{round} = \frac{\tau_{\varepsilon} - 2 \tau_0 + \sqrt \Delta}{2 \tau_{\varepsilon}}
\end{aligned}

Once upon a time in the world of integer arithmetics

To recall, we want to compute in integer arithmetics. This means we should postpone floating-prone computations to top-level formulae. The problem now lies in the square root above \Delta which impossibly ensures the yielded value to be integer. Nevertheless, a sound approximation is possible using a lemma on quotients with floor.

Lemma (equation 3.11, Graham-Knuth-Patashnik, p. 72).

\forall x \in \mathbb R \quad \forall m \in \mathbb Z \quad \forall n \in \mathbb N^*
\quad \left \lfloor \frac{m + x}{n} \right \rfloor = \left \lfloor \frac{m + \left \lfloor x\right \rfloor}{n} \right \rfloor

From that we can state that,

\begin{aligned}
\left \lfloor \text{round} \right \rfloor &= \left \lfloor \frac{\tau_{\varepsilon} - 2 \tau_0 + \sqrt \Delta}{2 \tau_{\varepsilon}} \right \rfloor  \\
&= \left \lfloor \frac{\tau_{\varepsilon} - 2 \tau_0 + \left \lfloor \sqrt \Delta \right \rfloor}{2 \tau_{\varepsilon}} \right \rfloor
\end{aligned}

Finally, we can compute integer square-root \left \lfloor \sqrt \Delta \right \rfloor using either Newton-Raphson or 4-adic induction [Kreitz, 2003].

Space

For space-complexity, \Delta can grow really big but still linearly in O (\text{level\_offset}). We need to ensure that \Delta does not overflow 2^{63}-1.

\begin{aligned}
\Delta &\quad\leq\quad 2^{63} - 1\\
\left ( 2 \tau_0 - \tau_{\varepsilon} \right )^2 + 8 \times \tau_{\varepsilon} \times \text{level\_offset} &\quad\leq\quad 2^{63} - 1\\
\text{level\_offset} &\quad\leq\quad \frac{2^{63} - 1 - (2 \tau_0 - \tau_{\varepsilon})^2}{8\tau_{\varepsilon}}
\end{aligned}

If we have a \tau_{0} = 30 and \tau_{\varepsilon} = 15, this means that a reasonably large upper bound for \text{level\_offset} is 10^{16} ~ \text{sec}. In other words, 10^9 years. The risk of reaching the overflow is thereby low.

Time

In terms of time-complexity w.r.t. to \text{level\_offset}, is this optimal? Not everything is given for free and what costs the most is the integer square root operation \left \lfloor\sqrt{\Delta} \right \rfloor. The 4-adic induction and the Newton-Raphson method provide a logarithmic algorithm and this would make the solution O (\text{log}~ \text{level\_offset}). In this problem, we choose Newton-Raphson as 4-adic induction tends to reach overflow on call stack.

Benchmarks

All the following values were measured with @richard.bonichon's micro-benchmarks repository (using the core_bench library).

Initially, out first benchmarks were conducted on a uniformly distributed sample of values of \text{level\_offset} (ranging from 0 to 1000). The following table exhibits the results using a uniform distribution of sample and show a x5 speed-up.

┌───────────┬──────────┬──────────┬──────────┬──────────┬────────────┐
│ Name      │ Time/Run │  mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├───────────┼──────────┼──────────┼──────────┼──────────┼────────────┤
│ binsearch │ 849.44us │ 244.80kw │   16.02w │   16.02w │    100.00% │
│ sqrt      │ 176.01us │  22.46kw │    0.72w │    0.72w │     20.72% │
└───────────┴──────────┴──────────┴──────────┴──────────┴────────────┘

In reality, we can argue that the values for this input are not uniformly distributed. Here is a chart of how values for \text{level\_offset} are distributed in practice during a 12-hour span:

level_offset_distribution

We repeat the benchmark experiment using the statistical distribution as the sample, and we measured the following values. We can observe that the x5 speed-up is preserved even for high incidence around input 0:

┌───────────┬──────────┬──────────┬──────────┬──────────┬────────────┐
│ Name      │ Time/Run │  mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├───────────┼──────────┼──────────┼──────────┼──────────┼────────────┤
│ binsearch │ 248.50us │ 133.75kw │    8.09w │    8.09w │    100.00% │
│ sqrt      │  50.94us │  20.15kw │    0.43w │    0.43w │     20.50% │
└───────────┴──────────┴──────────┴──────────┴──────────┴────────────┘

These experiments tend to show that this solution remain effective whether the distribution is uniform or Pareto.

Edited by Hai Nguyen Van

Merge request reports