Context

This MR calculates a t-value score for each parameter. More specifically, it makes the following changes:

- Adds a tvalues : (Free_variable.t * float) list field to the score of the solutions.
- Adds T-value-<parameter-name> fields to the CSV files.
- Shows t-values in the plots and in the PDF reports.
- Prints a warning when a parameter estimated by the given method differs from the OLS estimate.
  - This alert helps to find problems caused by underdetermined systems (e.g. #4922 (closed)).
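The check behind the warning can be sketched as a relative comparison between the two sets of estimates. This is a Python sketch, not the Snoop code: the MR does not state which criterion it uses, so the function name ols_disagreements and the 10% relative tolerance are hypothetical.

```python
import math


def ols_disagreements(method_estimates, ols_estimates, rel_tol=0.1):
    """Return the names of parameters whose estimate from the chosen method
    differs from the OLS estimate by more than rel_tol (relative difference).

    Hypothetical helper; the real criterion used by the MR may differ.
    """
    flagged = []
    for name, value in method_estimates.items():
        ols = ols_estimates.get(name)
        if ols is None:
            continue  # no OLS estimate to compare against
        if not math.isclose(value, ols, rel_tol=rel_tol, abs_tol=1e-9):
            flagged.append(name)
    return flagged


# Hypothetical values: a positive lasso clamps "b" to 0 while OLS finds it negative.
lasso = {"a": 10.0, "b": 0.0}
ols = {"a": 10.2, "b": -3.5}
for name in ols_disagreements(lasso, ols):
    print(f"Warning: the estimate of {name} differs from the OLS estimate")
```

A relative tolerance keeps the check independent of the unit and magnitude of each parameter, while the small absolute tolerance avoids flagging two estimates that are both essentially zero.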

How to interpret t-value

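The t-value reflects the variance of the benchmark data: it divides each estimated parameter by its standard error, so noisier measurements give smaller t-values.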
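For reference, here is the textbook OLS definition of the t-value, which we assume is what the MR computes (the notes below tie the t-value to the OLS estimate, but the exact implementation may differ). X is the n × p matrix of workload features, y the vector of measured times, and \hat\beta_j the estimate of the j-th free parameter:

```latex
\hat\beta = (X^\top X)^{-1} X^\top y, \qquad
\hat\sigma^2 = \frac{\lVert y - X\hat\beta \rVert^2}{n - p}, \qquad
\mathrm{SE}_j = \sqrt{\hat\sigma^2 \, [(X^\top X)^{-1}]_{jj}}, \qquad
t_j = \frac{\hat\beta_j}{\mathrm{SE}_j}

% r% confidence interval, with q_{r,n-p} the Student-t quantile:
\hat\beta_j \pm q_{r,\,n-p}\,\mathrm{SE}_j
\quad\Longrightarrow\quad
|t_j| \cdot \frac{2\,q_{r,\,n-p}\,\mathrm{SE}_j}{|\hat\beta_j|} = 2\,q_{r,\,n-p}
```

So the relative width of the interval is 2q/|t_j|, and a larger residual variance \hat\sigma^2 lowers |t_j|.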

- The t-value is calculated from the whole workload data, whereas R2 and RMSE are calculated from the medians of the workload data.
- The t-value is calculated per parameter, whereas R2 and RMSE are calculated per benchmark.
- The t-value is inversely proportional to the relative error of the parameter.
  - By definition, t-value × (width of the r% confidence interval / estimated value) depends only on r (for a fixed number of degrees of freedom).
    - Hence confidence intervals can be calculated from the t-value and the estimated parameter value.
  - A larger t-value is better.
    - Increasing bench-num or nsamples tends to make the t-value larger.
    - The t-value is negative when OLS estimates the parameter to be negative. In that case, accuracy can be rated by its absolute value. (Alternatively, a negative estimate may itself be counterintuitive and indicate that the benchmark needs fixing.)
  - T-values can be compared across different benchmarks and parameters, since they are dimensionless.
    - In my experiments on a laptop with bench-num=300 and nsamples=1000, I found issues (e.g. #4791 (closed), #4835 (closed)) in parameters whose t-value was below 2000.
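The per-parameter t-values and the confidence-interval relation above can be sketched in plain Python. This is not the Snoop implementation (which is OCaml): the names ols_tvalues and confidence_interval are hypothetical, the fit is unregularized OLS over every sample rather than medians, and the interval uses the normal quantile as a large-sample approximation of the Student-t quantile.

```python
import math
from statistics import NormalDist


def _invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment each row with the matching row of the identity matrix.
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            # Collinear feature columns: the system is underdetermined.
            raise ValueError("X^T X is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols_tvalues(xs, ys):
    """Fit ys ~ xs by OLS on every sample; return (estimates, t-values).

    xs: one row per measurement, one column per free parameter.
    ys: measured execution times.
    """
    n, p = len(xs), len(xs[0])
    if n <= p:
        raise ValueError("need more samples than parameters")
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(p)]
    inv = _invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    rss = sum((y - sum(b * v for b, v in zip(beta, x))) ** 2 for x, y in zip(xs, ys))
    sigma2 = rss / (n - p)  # unbiased estimate of the noise variance
    if sigma2 == 0:
        raise ValueError("perfect fit: t-values are infinite")
    se = [math.sqrt(sigma2 * inv[j][j]) for j in range(p)]
    return beta, [b / s for b, s in zip(beta, se)]


def confidence_interval(beta_j, t_j, r=0.95):
    """Rebuild an approximate r confidence interval from an estimate and its t-value.

    The standard error is |beta_j / t_j|; the normal quantile stands in for the
    Student-t quantile, which is close when there are many samples.
    """
    if t_j == 0:
        raise ValueError("t-value is zero: the standard error cannot be recovered")
    q = NormalDist().inv_cdf(0.5 + r / 2)
    half_width = q * abs(beta_j / t_j)
    return beta_j - half_width, beta_j + half_width


# Synthetic data: intercept + one size feature, with small alternating noise.
xs = [[1.0, float(k)] for k in range(1, 11)]
ys = [2.0 + 3.0 * x[1] + e for x, e in zip(xs, [0.1, -0.1] * 5)]
beta, tvalues = ols_tvalues(xs, ys)
print(beta, tvalues, confidence_interval(beta[1], tvalues[1]))
```

Since the standard error shrinks roughly like 1/√n as rows are added, more workload data (larger bench-num or nsamples) tends to raise the t-value, and a singular X^T X surfaces as an error instead of a misleading estimate.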

Manually testing the MR

$ ./octez-snoop infer parameters for model interpreter on data ./workloads/ using lasso --lasso-positive --dump-csv result.csv --report report.tex --full-plot-verbosity

The generated report.tex can be compiled with:

$ xelatex report.tex
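To spot weak parameters in the dumped CSV, the T-value-<parameter-name> columns can be scanned against a threshold such as 2000. Only the column naming comes from this MR; the file layout (a header row followed by one row per model) and the helper name low_tvalue_params are assumptions, so check them against an actual result.csv.

```python
import csv
import io


def low_tvalue_params(csv_text, threshold=2000.0):
    """Return (row index, parameter name, t-value) for every T-value-* cell
    whose absolute value is below threshold.

    Assumes a header row followed by one data row per model (an assumption,
    not documented Snoop output).
    """
    prefix = "T-value-"
    flagged = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        for column, cell in row.items():
            # Skip overflow fields (key None) and missing cells (value None or "").
            if column and column.startswith(prefix) and cell:
                t = float(cell)
                # Negative t-values are rated by their magnitude.
                if abs(t) < threshold:
                    flagged.append((i, column[len(prefix):], t))
    return flagged


# Usage with a real file:
# with open("result.csv", newline="") as f:
#     print(low_tvalue_params(f.read()))
```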

Checklist

- Document the interface of any function added or modified (see the coding guidelines).
- Document any change to the user interface, including configuration parameters (see node configuration).
- Provide automatic testing (see the testing guide).
- For new features and bug fixes, add an item in the appropriate changelog (docs/protocols/alpha.rst for the protocol and the environment, CHANGES.rst at the root of the repository for everything else).
- Select suitable reviewers using the Reviewers field below.
- Select as Assignee the next person who should take action on that MR.

Closes #4130 (closed)

Edited Mar 03, 2023 by satos

Snoop: add benchmark quality score (t-value)
