Multivariate validation > Build Bayesian distribution for each product
New Validation
As data users would like to see the statistical distributions, we should provide a joint Bayesian probability distribution for multiple variables from the HESTIA database.
Taking the HESTIA data as the 'observations' and starting from a prior distribution, we should be able to calculate the posterior distribution for each product, globally. Climate data, such as temperature and precipitation, should be added as additional variables.
- If I specify a country name and a product name from the HESTIA glossary:
  - The (univariate) posterior distribution of a few common variables (e.g. yield, fertiliser use, pesticide use, irrigation water use) should be returned quantitatively (and with a visual output upon request).
  - The bivariate distribution between any two of the above-mentioned variables should be returned.
  - The multivariate distribution between yield and other variables should be returned.
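As a rough illustration of the prior-to-posterior step, here is a minimal sketch for a single variable such as yield. It assumes a normal prior on the mean and a known observation variance (the conjugate normal-normal case); this issue does not specify the model, and the function name `normal_posterior` and its parameters are hypothetical.

```python
def normal_posterior(observations, prior_mean, prior_var, obs_var):
    """Posterior of an unknown mean under a normal prior and normal likelihood.

    Assumes the observation variance `obs_var` is known. Returns the
    posterior mean and posterior variance of the mean.
    """
    if not observations:
        raise ValueError("at least one observation is required")
    if prior_var <= 0 or obs_var <= 0:
        raise ValueError("variances must be positive")

    n = len(observations)
    sample_mean = sum(observations) / n

    # Precisions (inverse variances) add up in the conjugate update.
    posterior_precision = 1.0 / prior_var + n / obs_var
    posterior_var = 1.0 / posterior_precision

    # Posterior mean is the precision-weighted average of prior and data.
    posterior_mean = posterior_var * (
        prior_mean / prior_var + n * sample_mean / obs_var
    )
    return posterior_mean, posterior_var
```

With more observations the posterior variance shrinks and the posterior mean moves from the prior towards the sample mean, which is the behaviour we would want when a product has many HESTIA records for a given country.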
Validation
The validation will be based on the likelihood returned by the above distributions. By default, we should accept anything within 2 standard deviations of the mean, i.e. roughly 95% of the samples if the variable is approximately normally distributed, and issue a warning message if the candidate is outside of that 95% range. Hence, we need to:
- Firstly, run univariate validations and issue a warning if a candidate is outside of the 95% range.
- Secondly, run bivariate validations and issue a warning if a candidate is outside of the 95% range.
- Thirdly, run multivariate validations and issue a warning if a candidate is outside of the 95% range.
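The first two checks could look something like the sketch below. It assumes the posterior is summarised by a mean and standard deviation (univariate) or by a mean vector and a 2x2 covariance matrix (bivariate), and that the distribution is approximately normal. The function names are hypothetical. The bivariate check uses the squared Mahalanobis distance compared against the chi-square quantile with 2 degrees of freedom, which is one common way to define a 95% region; it is not necessarily the method that will be adopted here.

```python
import math


def univariate_warning(value, mean, std, n_std=2.0):
    """Return True when `value` is more than `n_std` standard deviations from `mean`."""
    if std <= 0:
        raise ValueError("std must be positive")
    return abs(value - mean) > n_std * std


def bivariate_warning(point, mean, cov, coverage=0.95):
    """Return True when `point` falls outside the `coverage` ellipse.

    `cov` is a 2x2 covariance matrix given as ((a, b), (c, d)).
    Assumes a bivariate normal distribution.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")

    dx = point[0] - mean[0]
    dy = point[1] - mean[1]

    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    d2 = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

    # With 2 degrees of freedom the chi-square CDF is 1 - exp(-x / 2),
    # so the quantile for a given coverage is -2 * ln(1 - coverage).
    threshold = -2.0 * math.log(1.0 - coverage)
    return d2 > threshold
```

The reason for running the bivariate check after the univariate one is that a candidate can be plausible on each variable separately yet implausible jointly. For example, with two strongly positively correlated variables, a value that is high on one and low on the other can pass both univariate checks and still fall outside the joint 95% region.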
Validation Conditions
Validation Level
- Warning: this might be an error, but we will still allow the upload to be validated.
- Error: this is an error and must be fixed to validate the upload.
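To make the difference between the two levels concrete, here is a small sketch of how they might affect the overall outcome. The message structure (a dict with `level` and `message` keys) and the function name `upload_is_valid` are assumptions for illustration only, not the actual HESTIA validation schema.

```python
def upload_is_valid(messages):
    """Return True when no message has level 'error'.

    Warnings are reported to the user but do not block validation.
    `messages` is a list of dicts with hypothetical keys 'level' and 'message'.
    """
    return all(m["level"] != "error" for m in messages)
```

Under this sketch, a candidate flagged as outside the 95% range would produce a `warning` message and the upload would still validate.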
Example
Input Data
Output error/warning message
Additional Notes
Edited by Qingling Wu