Commit 6ff9da57 authored by ViktoriaO1

Merge branch 'modified_README' into 'master'

Modified readme

See merge request analytics-club/hack4good/fs19/team-2!19
parents dea3fef8 15de0b60
Welcome to the GitLab Repo for the Pilot edition of Hack4Good this FS19.
The repo will be made public and licensed under your name with an open-source
license after the end of the program, so that other NGOs and the humanitarian
sector may benefit from your great work! That being said, enjoy the ride and use
your skills to make this world a better place!
## Introduction
Your Hack4Good team (:
**Welcome to the GitLab repo of Team Green!**
**Useful Links:**
* [H4G-Pilot Edition Drive]( (for scheduling calls and asking questions to our NGO partners)
* [Impact Initiatives website](
This repo was created in the context of the Hack4Good competition, which paired an NGO, IMPACT, with ETH students to gain more insights into their recently collected, cross-sectional dataset.
In 2018, IMPACT asked households in Nigeria to answer a questionnaire detailing their living situation and needs in different aspects of life: the Multi-Sector Needs Assessment (MSNA).
In this work, we outline the efforts of Team Green to analyze the questionnaire data. These efforts were two-fold.
First, to help identify households in need based on (easily obtainable) demographic data rather than expensive and time-consuming questionnaires.
Second, to provide a framework for the interactive visualization of the given data on a map.
**Workshop Dates:**
| Event | Date | Location |
| ------ | ------ | ------ |
| Teamwork time in the SPH (optional) | 24.4.19 (Wed), 17:00 - 19:00 | SPH |
| Workshop 1: Agile workshop for Data Science by external experts | 26.4.19 (Fri), 16:30 - 20:30 | HG E.42 |
| Teamwork time in the SPH (optional) | 29.4.19 (Mon), 17:00 - 19:00 | SPH |
| Workshop 2: Feedback workshop | 8.5.19 (Wed), 17:00 - 19:00 | SPH |
| Workshop 3: Pitching workshop | 15.5.19 (Wed), 17:00 - 19:00 | SPH |
| Final event: Final presentation & Workshop 4: Reflection | 20.5.19 (Mon), 17:30 - 21:30 | SPH |
**Folder Structure**
We have already created a folder structure that should help you get started right away. It should be seen as a guideline and will help us
navigate your code more easily. All present code is exemplary, and you don't have to use any of it. Feel free to delete the existing notebooks as well as the code in src.
├── <- The top-level README for developers using this project
├── environment.yml <- Python environment
├── data
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
├── misc <- miscellaneous
├── notebooks <- Jupyter notebooks. Every developer has their own folder for exploratory
│ ├── name notebooks. Usually every model has its own notebook where models are
│ │ └── exploration.ipynb tested and optimized. (The present notebooks can be deleted as they only serve as inspiration)
│ └── model
│ └── model_exploration.ipynb <- different optimized models can be compared here if preferred
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
├── results
│   ├── outputs
│   └── models <- Trained and serialized models, model predictions, or model summaries
│ (if present)
├── scores <- Cross validation scores are saved here. (Automatically generated)
│ └── model_name <- every model has its own folder.
├── src <- Source code of this project. All final code comes here (Notebooks are thought for exploration)
│ ├── <- Makes src a Python module
│ ├── <- main file, that can be called.
│ │
│ │
│ └── utils <- Scripts to create exploratory and results oriented visualizations
│ └── / functions to evaluate models
│ └── There is an exemplary implementation of these functions in the sample notebook; they should be seen
as a help if you wish to use them. You can completely ignore or delete both files.
**How to use a python environment**
The purpose of virtual environments is to ensure that every developer has an identical Python installation, so that conflicts
due to different versions are minimized.
Open a console and move to the folder where your environment file is stored.
* create a python env based on a list of packages from environment.yml
```conda env create -f environment.yml -n env_your_proj```
* update a python env based on a list of packages from environment.yml
```conda env update -f environment.yml -n env_your_proj```
* activate the env
```activate env_your_proj```
* in case of an issue clean all the cache in conda
```conda clean -a -y```
* delete the env to recreate it when too many changes are done
```conda env remove -n env_your_proj```
We hope you will be able to build on our experiences and continue with the prediction and visualisation efforts :)
All the best,
Your Team Green:
_Nico Messikommer_,
_Francesco Saltarelli_,
_Stephan Artmann_,
_Viktoria de La Rochefoucauld_
**Git Branching**
The main reason to work with branches is to ensure one branch (master) will always work.
If you add new code, a new branch should be created. Once the new code works, the created branch can be merged into the master branch.
The following is a brief overview of the required commands:
* First, create a new branch locally in the folder where the ".git" folder is located.
```git checkout -b name_branch```
* After you are finished and your code is running, you can commit and push to the repo:
```git add -A```
## Useful Links
* [Impact Initiatives website](
## Terminology Clarification
In the following, we refer to both the PiN score and the PiN index. These two terms do not mean the same thing.
The PiN score (on a scale of 0 - 10 per sector) is the absolute PiN value calculated according to the MSNA report provided by IMPACT.
In order to make our predictions, however, we had to set a threshold that separates households that are 'in need' from those that are not.
This threshold lies at a PiN score of 4. Above it, a household is considered to be 'in need'.
The PiN index therefore only tells us whether or not a household is in need.
For example:
A household with a PiN score of 6 in the WASH sector would have a PiN index of 1 for this sector.
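The thresholding described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function name `pin_index` and the constant `PIN_THRESHOLD` are hypothetical, the sector scores are made up, and the sketch assumes that only scores strictly above 4 count as 'in need'.

```python
# Hypothetical helper; names are illustrative and not taken from the repo.
PIN_THRESHOLD = 4  # threshold stated in the text


def pin_index(pin_score: float, threshold: float = PIN_THRESHOLD) -> int:
    """Return 1 if the household is 'in need' in a sector, else 0.

    Assumption: a score strictly above the threshold means 'in need'.
    """
    if not 0 <= pin_score <= 10:
        raise ValueError("PiN score must lie between 0 and 10")
    return 1 if pin_score > threshold else 0


# One household with a PiN score per sector (made-up values).
scores = {"wash": 6, "health": 3}
indices = {sector: pin_index(score) for sector, score in scores.items()}
print(indices)
```

If a score of exactly 4 should also count as 'in need', the comparison would have to become `>=`; the text above does not settle that case.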
## Data Cleaning
## Predictions
## Visualisations
If you want to run the visualisations, we recommend creating a new environment from scratch, separate from the one used for the predictions,
as the dependencies (mainly the geographical libraries such as GeoPandas, Fiona, Shapely etc.) are quite
delicate and, on our machines, only worked when they were all installed together via conda.
So please create the environment in the following way:
* navigate to the 'visualisations' folder
* create a python env based on the list of packages from environment.yml within the folder
```conda env create -f environment.yml -n GeoEnv```
* activate the env
```activate GeoEnv```
```git commit -m "Description of addition" ```
```git push -u origin name_branch```
* The new branch can now be merged on the gitlab page of the repo by creating a new "Merge Request".
#### Now choose which version you would like to run
You have the choice to run either the version that is suitable for your favourite **code editor**, or the one that you can
launch directly from your **jupyter notebook** workspace.
Both versions require the same files:
* You can now delete the created branch on the gitlab page and locally.
\ No newline at end of file
1. The geometry file of the wards in Nigeria (wards_geometry.pickle)
_This is a processed version that you can generate yourself using the commented out section of code at the beginning of both files_
2. The predictions file (PiN_test_pred.csv), which contains the PiN score, the PiN index and the prediction results.
_Although we provide this file by default, you can also generate it yourself using the prediction scripts._
If you launch the **.py script**, you will generate a map as an HTML file, which you can interact with and also send to your peers.
If you launch the **.ipynb notebook**, you will generate a map inside the Jupyter notebook environment.
_**Note**: We have specified a 'mode' flag, which lets you choose whether you would like to show the predictions and PiN index or the absolute PiN scores directly.
We have taken this approach because it is not possible to create a general heatmap for two very different scales.
Thus, in mode 'prediction' the map will show you the ground truth and predictions of the percentage of people in need in the respective sector,
whereas mode 'pin_visualisation' will show you only the ground truth of absolute PiN values._
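To make the two modes concrete, here is a small Python sketch of how a per-ward value could be derived in each mode. It is an illustration under stated assumptions rather than the code in the visualisation scripts: the function `ward_value`, the record keys `pin_score` and `pin_index`, and the use of the mean for 'pin_visualisation' are all hypothetical. Only the two mode names come from the configuration file.

```python
from statistics import mean

# The two mode names listed in the configuration file.
VALID_MODES = ("pin_visualisation", "prediction")


def ward_value(households, mode):
    """Aggregate the household records of one ward into a single map value.

    households: list of dicts with the hypothetical keys 'pin_score' (0-10)
                and 'pin_index' (0 or 1).
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if not households:
        raise ValueError("ward has no households")
    if mode == "prediction":
        # Percentage of households in need (0-100 scale).
        return 100 * mean(h["pin_index"] for h in households)
    # Assumed aggregation: average absolute PiN score (0-10 scale).
    return mean(h["pin_score"] for h in households)


# Two made-up households in one ward.
ward = [
    {"pin_score": 6, "pin_index": 1},
    {"pin_score": 2, "pin_index": 0},
]
print(ward_value(ward, "prediction"))
print(ward_value(ward, "pin_visualisation"))
```

The two return values live on different scales (0-100 versus 0-10), which is why a single heatmap colour scale cannot serve both modes. The same percentage calculation could be applied to predicted indices to obtain the prediction layer.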
\ No newline at end of file
@@ -6,20 +6,20 @@
# ----------------------- RAW DATA ------------------------- #
initial_sample: '../../Data/raw/reach_nga_msna_initial_sample.xlsx'
raw_dataset: '../../Data/raw/reach_nga_msna_clean_dataset_final.xlsx'
initial_sample: '../../data/raw/reach_nga_msna_initial_sample.xlsx'
raw_dataset: '../../data/raw/reach_nga_msna_clean_dataset_final.xlsx'
# ------------------- GEOGRAPHICAL DATA ---------------------- #
nigeria_map_path_states : '../../Data/GeoFiles/nga_admbnda_adm1_osgof_20190417.shp'
nigeria_map_path_LGA: '../../Data/GeoFiles/nga_admbnda_adm2_osgof_20190417.shp'
nigeria_map_path_wards : '../../Data/GeoFiles/nga_admbnda_adm3_osgof_eha_20190417.shp'
nigeria_map : '../../Data/NGA_adm/NGA_adm1.shp'
nigeria_map_path_states : '../../data/GeoFiles/nga_admbnda_adm1_osgof_20190417.shp'
nigeria_map_path_LGA: '../../data/GeoFiles/nga_admbnda_adm2_osgof_20190417.shp'
nigeria_map_path_wards : '../../data/GeoFiles/nga_admbnda_adm3_osgof_eha_20190417.shp'
nigeria_map : '../../data/NGA_adm/NGA_adm1.shp'
# --------------------- PROCESSED DATA ------------------------ #
human: '../../Data/processed/human.pickle'
wards_lonlat : '../../Data/processed/wards_lonlat.pickle'
wards_geometry: '../../Data/processed/wards_geometry.pickle'
predictions: '../../Data/processed/PiN_test_pred.csv'
human: '../../data/processed/human.pickle'
wards_lonlat : '../../data/processed/wards_lonlat.pickle'
wards_geometry: '../../data/processed/wards_geometry.pickle'
predictions: '../../data/processed/PiN_test_pred.csv'
# -------------------- VISUALISATION MODE ---------------------- #
# possible modes: 'pin_visualisation' and 'prediction'