Snoop: add benchmark quality score (t-value)
Context
Resolves #4130 (closed)
This MR calculates the t-value score for each parameter. More specifically,
- Adds `tvalues : (Free_variable.t * float) list` field to the score of the solutions.
- Adds `T-value-<parameter-name>` field to the CSV files.
- Shows t-value in the plots and report PDFs.
- Prints a warning when the parameter estimated by the given method differs from the one estimated by OLS.
  - This alert will help us find problems caused by underdetermined systems (e.g. #4922 (closed)).
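For readers unfamiliar with the score, here is a minimal sketch of the textbook t-value of an OLS estimate (estimate divided by its standard error). It assumes a one-parameter model `y = beta * x` with no intercept. The function name `ols_t_value` and the toy data are hypothetical and are not taken from the Snoop code base, whose actual implementation is not shown in this description.

```python
import math


def ols_t_value(xs, ys):
    """Fit y = beta * x by ordinary least squares; return (beta, t_value).

    Textbook definition (assumed here): t_value = beta / standard_error(beta).
    """
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need at least two paired observations")
    sxx = sum(x * x for x in xs)
    if sxx == 0.0:
        raise ValueError("all regressors are zero; beta is undetermined")
    beta = sum(x * y for x, y in zip(xs, ys)) / sxx
    # Residual sum of squares, with n - 1 degrees of freedom (one parameter).
    rss = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 1) / sxx)
    if std_err == 0.0:
        # Perfect fit: the t-value is unbounded.
        return beta, math.copysign(math.inf, beta)
    return beta, beta / std_err


def toy_workload(repeats):
    """Hypothetical data: y = 2x plus a small fixed noise pattern, repeated."""
    xs = [1.0, 2.0, 3.0, 4.0] * repeats
    noise = [0.1, -0.1, -0.1, 0.1] * repeats
    ys = [2.0 * x + e for x, e in zip(xs, noise)]
    return xs, ys


beta_small, t_small = ols_t_value(*toy_workload(1))
beta_large, t_large = ols_t_value(*toy_workload(100))
print(f"4 samples:   beta={beta_small:.3f}, t-value={t_small:.1f}")
print(f"400 samples: beta={beta_large:.3f}, t-value={t_large:.1f}")
```

In this toy setup the estimate stays the same while the t-value grows as more samples with the same noise level are added.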
How to interpret t-value
T-value reflects the variance of the benchmark.
- T-value is calculated based on the whole workload data. In contrast, R2 and RMSE are calculated based on the median of workload data.
- T-value is calculated per parameter, while R2 and RMSE are calculated per benchmark.
- T-value is inversely proportional to the relative error of the parameter.
  - By definition, `t-value * (width of r% confidence interval / estimated value)` only depends on `r` (for a fixed number of degrees of freedom).
  - We can calculate confidence intervals based on the t-value and the estimated parameter value.
  - Larger t-value is better.
    - If we increase `bench-num` or `nsamples`, the t-value becomes larger too.
    - T-value might be negative if the parameter is estimated to be negative in OLS. In that case, accuracy can be rated by its absolute value. (Alternatively, a negative estimate might be counterintuitive and indicate that the benchmark needs fixing.)
  - We can compare t-values between different benchmarks/parameters.
    - In my experiments on my laptop with `bench-num=300` and `nsamples=1000`, I could find some issues (e.g. #4791 (closed), #4835 (closed)) for the parameters whose t-value is smaller than 2000.
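As an illustration of recovering a confidence interval from a t-value, here is a minimal sketch. It assumes the textbook definition (t-value = estimate / standard error) and approximates the Student t critical value by the normal quantile, which is only reasonable when the number of observations is large. The function name `confidence_interval` and the example numbers are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, t_value, level=0.95):
    """Return (low, high) of the `level` confidence interval.

    Assumes t_value = estimate / standard_error and uses the normal
    approximation for the critical value (large-sample assumption).
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must be strictly between 0 and 1")
    if t_value == 0.0:
        raise ValueError("a zero t-value gives no usable standard error")
    std_err = abs(estimate / t_value)
    # Two-sided critical value, e.g. about 1.96 for level = 0.95.
    critical = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = critical * std_err
    return estimate - half_width, estimate + half_width


# Hypothetical parameter: estimated at 10.0 with a t-value of 2000.
low, high = confidence_interval(10.0, 2000.0)
relative_half_width = (high - low) / 2.0 / 10.0
print(f"95% CI: [{low:.4f}, {high:.4f}]")
print(f"relative half-width: {relative_half_width:.6f}")
```

Under these assumptions the relative half-width equals the critical value divided by the absolute t-value, so a t-value of 2000 corresponds to roughly a 0.1% relative half-width at the 95% level.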
Manually testing the MR
$ ./octez-snoop infer parameters for model interpreter on data ./workloads/ using lasso --lasso-positive --dump-csv result.csv --report report.tex --full-plot-verbosity
`report.tex` can be compiled with `xelatex report.tex`.
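To check the dumped CSV quickly, a small script along the following lines might help. It is only a sketch: it assumes `result.csv` has a header row, that t-values live in columns named `T-value-<parameter-name>` as described above, and that each data row is one inferred solution. The helper name `low_t_values`, the sample rows, and the default threshold of 2000 (taken from the laptop experiment above) are illustrative, not part of Snoop.

```python
import csv
import io


def low_t_values(csv_file, threshold=2000.0, prefix="T-value-"):
    """Return (row_number, parameter_name, t_value) for weak parameters.

    A parameter is flagged when the absolute value of its t-value is
    below `threshold`. Empty or non-numeric cells are skipped.
    """
    flagged = []
    for row_number, row in enumerate(csv.DictReader(csv_file), start=1):
        for column, cell in row.items():
            if column is None or not column.startswith(prefix):
                continue
            try:
                t_value = float(cell)
            except (TypeError, ValueError):
                continue
            if abs(t_value) < threshold:
                flagged.append((row_number, column[len(prefix):], t_value))
    return flagged


# Hypothetical CSV content; the real column set depends on the model.
sample = io.StringIO(
    "name,T-value-const,T-value-coeff\n"
    "bench_a,5000.0,1500.0\n"
    "bench_b,-120.5,\n"
)
flagged = low_t_values(sample)
for row_number, parameter, t_value in flagged:
    print(f"row {row_number}: {parameter} has t-value {t_value}")
```

For a real run, the `io.StringIO` sample would be replaced by `open("result.csv", newline="")`.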
Checklist
- Document the interface of any function added or modified (see the coding guidelines)
- Document any change to the user interface, including configuration parameters (see node configuration)
- Provide automatic testing (see the testing guide).
- For new features and bug fixes, add an item in the appropriate changelog (`docs/protocols/alpha.rst` for the protocol and the environment, `CHANGES.rst` at the root of the repository for everything else).
- Select suitable reviewers using the `Reviewers` field below.
- Select as `Assignee` the next person who should take action on that MR
Closes #4130 (closed)
Edited by satos