Commit ea570b61 authored by Stephan Artmann

Updated README.

parent f945f2f5
![](demo/map_img.png)
![](video/map_img.png)
## Introduction
......@@ -47,18 +47,24 @@ A household with e.g. a PiN score of 6 in the Wash sector would have a PiN index
## Data Cleaning
tbd
To include the additional data from the WorldPop Project website, place the .tif files in
```data/external/literacy/```, ```data/external/population/``` and ```data/external/poverty/```.
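Before running the cleanup, it can help to confirm that the rasters are where the code expects them. The following is a minimal sketch, assuming it is run from the repository root; the helper name `missing_worldpop_dirs` is hypothetical and not part of the repository:

```python
from pathlib import Path

# Directories named in this README; everything else here is illustrative.
EXPECTED_DIRS = (
    "data/external/literacy",
    "data/external/population",
    "data/external/poverty",
)


def missing_worldpop_dirs(root="."):
    """Return the expected directories that contain no .tif file."""
    missing = []
    for rel in EXPECTED_DIRS:
        folder = Path(root) / rel
        # A directory counts as missing if it does not exist or holds no .tif
        if not (folder.is_dir() and any(folder.glob("*.tif"))):
            missing.append(rel)
    return missing
```

Calling `missing_worldpop_dirs()` from the root directory returns an empty list once all three folders contain at least one .tif file.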
To generate a 'clean' data table with demographic variables, PiNs and other variables used in our analysis, run the following code from the root directory:
```python
import src.cleanup.cleanup as clp
clean = clp.clean_table() # generates a new cleaned-up table class instance (WARNING: uses pre-computed data, see note below!)
clean.generate_PiN() # calculates PiN from scratch (only necessary if the PiN data have changed)
clean.table # machine-readable table
clean.human # human-readable table
clean.translations # table matching human-readable strings to machine-readable strings
clean.one_hot # one-hot encoded table
```
**Note:** This code assumes that the xls tables have not changed and loads pre-computed cleaned tables. If you work with new or modified xls tables, change the second line to
```clean = clp.clean_table(reload_tables=True)```
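The `reload_tables` flag follows a common caching pattern: reuse a pre-computed result unless a rebuild is requested. The sketch below illustrates that general pattern only; it is not the repository's implementation, and `load_or_rebuild`, the JSON cache file and the `rebuild` callback are all hypothetical.

```python
import json
from pathlib import Path


def load_or_rebuild(cache_path, rebuild, reload_tables=False):
    """Return cached data unless reload_tables is set or no cache exists.

    Hypothetical helper: `rebuild` stands in for the expensive step of
    re-reading and cleaning the source tables.
    """
    cache = Path(cache_path)
    if cache.exists() and not reload_tables:
        # Fast path: reuse the pre-computed result.
        return json.loads(cache.read_text())
    # Slow path: recompute and refresh the cache for the next run.
    data = rebuild()
    cache.write_text(json.dumps(data))
    return data
```

The trade-off is the usual one for caches: the default is fast but can silently return stale data, which is why the flag has to be set whenever the source tables change.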
## Predictions
The folders containing the scripts of the logistic regression and the random forest are located in ```src/```.
Each model has its own script.
The commands required to train/predict a model can be found in the first lines of the corresponding model script.
The input features as well as the training/testing mode can be adjusted in the main function of the scripts.
tbd
## Visualisations
......@@ -97,4 +103,4 @@ The input features as well as the training/testing mode can be adjusted in the m
_**Note**: We have specified a 'mode' flag, which lets you choose whether you would like to show the predictions and PiN index or the absolute PiN scores directly.
We have taken this approach because it is not possible to create a general heatmap for two very different scales.
Thus, in mode 'prediction' the map will show you the ground truth and predictions of the percentage of people in need in the respective sector,
whereas mode 'pin_visualisation' will show you only the ground truth of absolute PiN values._
\ No newline at end of file
whereas mode 'pin_visualisation' will show you only the ground truth of absolute PiN values._
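
The effect of the 'mode' flag can be sketched as a simple dispatch. The two mode names are taken from the note; the function `layers_for_mode` and the layer labels are hypothetical and only illustrate the idea.

```python
def layers_for_mode(mode):
    """Return the map layers shown for a given mode (illustrative labels)."""
    if mode == "prediction":
        # Percentages share one scale, so ground truth and predictions
        # can be drawn on the same heatmap.
        return ["ground truth (% in need)", "prediction (% in need)"]
    if mode == "pin_visualisation":
        # Absolute PiN values use a different scale and are shown alone.
        return ["ground truth (absolute PiN)"]
    raise ValueError(f"unknown mode: {mode!r}")
```

Rejecting unknown modes with an error, rather than falling back to a default, avoids drawing a heatmap on the wrong scale by accident.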