Ilias Garnier requested to merge igarnier-snoop-bayes into master Apr 11, 2022

Context

Snoop is used to perform benchmarks and infer parameters for cost models. During the benchmark process, each workload triggers a set of nsamples benchmarks, yielding an empirical timing distribution for this workload. Currently, we "determinize" each empirical timing distribution by taking (by default) the median execution time as representative of the associated workload. Parameter inference is performed afterwards. For instance, in the following plot each blue dot corresponds to the median out of some number of samples; the red line is the predicted execution time after fitting the cost model. (This corresponds to the benchmark for Set.elements.)

However, taking the median drops important information about the empirical timing distribution, such as its standard deviation, or the extents of its support. In particular, it hides away much of the GC induced pauses but also the intrinsic noise (whether software or hardware related) associated to the execution time of some pieces of code. During inference, snoop uses the value of some variables to guess the value of other ones. Uncertainty is pushed forward in this process of combining models, but we currently totally ignore this fact. We have no way of quantifying the quality of the estimates obtained downstream of this propagation process, because we throw the relevant data just after benchmarking.

This MR is a first step towards exploiting the full distribution for solving this issue (to keep things reasonably small we don't perform uncertainty propagation in this MR)

Here are the changes.

Measures

The full timing data is now saved in workload files. Taking statistics (such as the median) is delayed at inference time (before or after inference depending on the solver).
Additional options are given to plot that empirical data. By default, the median is displayed but it is also possible to plot the full data using the --empirical-plot full option of the infer command

One can also plot arbitrary quantiles simultaneously, eg the 0.1 and 0.9th using --empirical-plot 0.1,0.9

The infer command also has an additional option to write each timing distribution to a target directory in a separate file using --plot --plot-raw-workload /target/directory, here's the full distribution for a given workload Set_elements_alpha_Set_elements_raw-286.pdf. This is mostly useful to investigate problematic benchmarks (eg featuring bimodal timing distributions)

Bayesian linear regression

An algorithm for bayesian linear regression (blr for short) is added. The underlying model assumes that for each workload, most samples are near the (by assumption, unique) mode and that remaining data consists of outliers. More precisely, the model is a mixture of a gaussian with an unknown mean and standard deviation and of a uniform distribution over the data far enough of the mean (ie we don't try to model the distribution of outliers).

In contrast with other solvers, the algorithm produces an estimate of how much data was used to perform the estimation as a "confidence" distribution (it can be interpreted as the proportion of mass near the mode vs in the outliers). It also produces an estimate of the standard deviation of the data around the mode.

Most importantly, it outputs full posteriors for free variables (instead of point estimates) allowing to quantify the uncertainty of these estimates. For instance:

./tezos-snoop benchmark Blake2b_example and save to ./blake2.workload --nsamples 300
./tezos-snoop infer parameters for model blake2b on data blake2.workload using blr --blr-burn-in 5000 --blr-samples 10000 -blr-seed 1789667732 --plot

yields the following posteriors over the two parameters of the model:

The variable blake2b-ns-p-byte corresponds to the slope of the line and the posterior is sharply concentrated around 1.11 while the intercept blake2b-const has a much wider support.

This solver should be considered as an experimental feature for now. Its robustness needs to be assessed in the wild.

Manually testing the MR

Play with the tool as above

Checklist

Check that snoop tezt scripts still work ok using lasso (and perhaps using blr?)
Document the interface of any function added or modified (see the coding guidelines)
Document any change to the user interface, including configuration parameters (see node configuration)
Provide automatic testing (see the testing guide).
For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, CHANGES.rst at the root of the repository for everything else).
Select suitable reviewers using the Reviewers field below.
Select as Assignee the next person who should take action on that MR

Edited Jun 13, 2022 by Arvid Jakobsson

Draft: Snoop: dealing with uncertainty

Context

Measures

Bayesian linear regression

Manually testing the MR

Checklist

Merge request reports