Estimates calculation
This part of nFIESTA implements the estimators described by Adolt et al. (2018).
The selection and preparation of data for a particular estimate are done by a set of SQL functions.
These are also used to prepare auxiliary data and to parametrise the working model whenever a modified direct estimator is specified in the estimates configuration.
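Purely for orientation, such a preparation step can be thought of as one SQL function call per configured estimate. The function and parameter names below (fn_prepare_estimate_data, estimate_conf_id) are invented for the illustration and do not correspond to nFIESTA's actual API:

```sql
-- Hypothetical wrapper preparing all data for one configured estimate;
-- the function name and its parameter are placeholders, not nFIESTA's real API.
SELECT fn_prepare_estimate_data(estimate_conf_id := 1);
```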
At a certain stage, the SQL functions hand over to a PL/Python function which performs the matrix inversion needed to obtain the \boldsymbol{\tilde{G}_{\beta_{t+}}} matrix.
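As an illustration of what such a hand-over can look like, the following is a minimal PL/Python sketch of a matrix inversion using numpy; the function name invert_matrix and the array-based signature are assumptions made for the example, not nFIESTA's actual interface:

```sql
-- Minimal sketch: invert a square matrix inside PostgreSQL via PL/Python.
-- Assumes the plpython3u language is installed and numpy is available to its
-- Python interpreter; invert_matrix is a hypothetical name.
CREATE OR REPLACE FUNCTION invert_matrix(mat double precision[])
RETURNS double precision[]
AS $$
import numpy as np
a = np.array(mat, dtype=float)    # PL/Python passes the 2-D array as nested lists
return np.linalg.inv(a).tolist()  # a nested list maps back to a PostgreSQL array
$$ LANGUAGE plpython3u;
```

A call such as SELECT invert_matrix(ARRAY[[4.0, 7.0], [2.0, 6.0]]); then returns the inverse as a 2 by 2 array.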
Once all data for the estimation have been prepared, other SQL functions calculate the desired estimates of totals or ratios (depending on what has been configured), while another PostgreSQL extension, called htc, is invoked to evaluate the respective variance estimates according to the Horvitz-Thompson theorem for infinite populations (Cordy 1993). This extension also evaluates the pairwise inclusion densities following the specification given in the technical report by Adolt et al. (2018, p. 6).
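For reference, the variance estimator in question is the continuous-population analogue of the classical Horvitz-Thompson result. With \pi(x_i) denoting the inclusion density at sample point x_i and \pi(x_i, x_j) the pairwise inclusion density, the estimator of the variance of the total estimate \hat{t}_y reads:

\hat{V}\left(\hat{t}_y\right) = \sum_{i \in s} \frac{y(x_i)^2}{\pi(x_i)^2} + \sum_{i \in s} \sum_{j \in s,\, j \neq i} \frac{y(x_i)\, y(x_j)}{\pi(x_i)\, \pi(x_j)} \cdot \frac{\pi(x_i, x_j) - \pi(x_i)\, \pi(x_j)}{\pi(x_i, x_j)}

The notation here follows Cordy (1993) and may differ from the symbols used in the nFIESTA technical report.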
For the sake of calculation speed, the core part of the htc extension has been written in C, but an SQL API (Application Programming Interface) makes the integration with PostgreSQL possible. The htc extension is installed automatically during the standardised installation of nFIESTA, so the user does not have to interact with it directly.
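Although no direct interaction is needed, the presence and version of the extension can be checked through PostgreSQL's standard catalog; the query below relies only on the stock pg_extension catalog, nothing nFIESTA-specific:

```sql
-- Verify that the htc extension is installed and show its version.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'htc';
```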
Thanks to the database (SQL) nature of the nFIESTA implementation, many target parameters can be evaluated in a single run - for a large number of estimation cells and, if requested, also for many alternative parametrisation regions and working models. The calculations can be distributed over several processor cores or even physical machines, so the results are obtained in a fraction of the time that would otherwise be needed. At the moment, however, the distribution of the calculation load over several cores or machines has to be steered manually (see the sketch below).
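One straightforward way of steering the split by hand is to give every PostgreSQL connection a disjoint subset of the configured estimates, e.g. by taking the estimate identifiers modulo the number of connections. The table and column names below (t_estimate_conf, id) are hypothetical placeholders, not nFIESTA's actual schema:

```sql
-- Connection 1 of 3: pick every third configured estimate and process it.
-- The other two connections run the same query with id % 3 = 1 and id % 3 = 2.
SELECT id
FROM t_estimate_conf   -- hypothetical configuration table
WHERE id % 3 = 0
ORDER BY id;
```

Each connection then runs the estimation only over its own subset, which is essentially what the three parallel connections in the case study below did.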
The nFIESTA performance was tested within the T2.3.1 case study (Lanz et al. 2018a). With a PostgreSQL (version 11) and nFIESTA installation on a VMware virtual machine (64-bit Windows 7, 2x Intel Xeon 2.00 GHz, i.e. 4 cores in total, 8 GB RAM, 127 GB SSD disk), 36990 estimates, generated by combinations of:
- seven target parameters (total biomass, total coniferous biomass, total broadleaved biomass, total area of the biomass domain, and mean total, mean coniferous and mean broadleaved biomass per hectare of the biomass domain)
- the 50 by 50 km and 100 by 100 km estimation cells (INSPIRE grid) covering (or intersecting) the whole territories of the Czech Republic, France, Germany and Switzerland, plus one separate cell corresponding to the whole territory of the Czech Republic (to test the htc extension performance for larger estimation cells)
- the 100 by 100 km parametrisation regions, plus one separate parametrisation region corresponding to the whole territory of the Czech Republic
- six alternative working models (defined in terms of variables generated from the Copernicus Forest Type and Tree Cover Density maps)
were calculated using three parallel PostgreSQL connections (three cores of one physical machine) in less than six hours.
Out of this time, three hours were spent on the working model parametrisation (calculation of the \boldsymbol{\tilde{G}_{\beta_{t+}}} matrices), one hour on the computation of the total estimates, and one hour and twenty minutes on the computation of the ratios (including the calculation of the respective variances by the htc extension).