Commit 53ab5ea5 authored by Nathan 🚴

initial grouped commit

parent 3e344391
.ipynb_checkpoints
FARS_data/data/
python_trainings/flight_data
python_trainings/*.csv
python_trainings/*.xlsx
Once you've installed Anaconda Python, I recommend setting up a virtual environment (see the bullet on virtual environments).
Here's what that will look like in your terminal:
~/some/location $ conda create --name <name_of_your_environment> python=3
This creates a separate Python installation that keeps your base Anaconda distribution isolated from the packages you install for this project. It's a great way to avoid the "well, it works on my machine" problem.
Once you've created your virtual environment (venv), activate it by typing:
source activate <name_of_your_environment>
(On conda 4.4 and later, conda activate <name_of_your_environment> is the preferred form of this command.)
This makes that environment's Python, and the packages installed in it, the ones your terminal uses. A brand-new environment probably won't have some of the packages you'll need, such as pandas, so I've included a requirements.txt file listing the packages we used for our trainings.
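If you're ever unsure whether the activated environment is the one Python is actually using, you can ask Python directly. This is a minimal sketch: sys.executable and sys.prefix are standard, while reading CONDA_DEFAULT_ENV assumes conda's activate scripts have set that variable (it is usually absent when no conda environment is active). The function name describe_environment is hypothetical.

```python
import os
import sys


def describe_environment():
    """Return a small dict describing the interpreter running this code."""
    return {
        # Full path of the Python binary; in an activated env it sits under the env's folder.
        "executable": sys.executable,
        # Root directory of the running installation (the env's folder when activated).
        "prefix": sys.prefix,
        # Assumption: set by conda's activate scripts; None if the variable is missing.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

If the printed executable path doesn't include your environment's name, the activation didn't take effect in that terminal.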
Type this in your terminal (from the directory containing requirements.txt, with your venv activated):
pip install -r requirements.txt
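To double-check that the install actually produced the pinned versions, a small script can compare requirements.txt against what's installed. This is a sketch under a few assumptions: it needs Python 3.8+ for importlib.metadata, it only understands exact name==version pins (which is all this requirements.txt uses), and parse_pins and find_mismatches are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins, lookup=version):
    """Return {name: (pinned, installed)} for each package whose installed version differs.

    installed is None when the package isn't installed at all. lookup defaults to
    importlib.metadata.version but can be swapped out (e.g. for testing).
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Usage, from the folder with requirements.txt: read the file, then call find_mismatches(parse_pins(text)); an empty dict means every pin is satisfied.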
Once pip finishes, you should be ready to roll. To run Jupyter in your browser, type:
jupyter notebook
It should open a window in your browser, or at least print a link you can paste into one.
To run your script.py file, type:
python script.py
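This executes everything under the if __name__ == '__main__': line. That block runs only when the file is executed directly; if you import script.py (from a notebook, say), it is skipped.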
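Python sets __name__ to "__main__" only for the file you run directly; an imported file gets its module name instead. A minimal sketch of the layout that takes advantage of this, where main() is a hypothetical entry-point name and format_as_money is the helper from script.py:

```python
def format_as_money(x):
    """Reusable helper: importing this module does not trigger main()."""
    return "${:,.0f}".format(x)


def main():
    # Everything the script should do when run directly goes here.
    print(format_as_money(1234567))
    return 0


if __name__ == "__main__":
    # Runs for `python script.py`, skipped for `import script`.
    main()
```

With this layout, `from script import format_as_money` in a notebook gives you the helper without reading or writing any files.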
numpy==1.12.1
pandas==0.20.1
seaborn==0.7.1
matplotlib==2.0.2
xlrd==1.0.0
fuzzywuzzy==0.15.0
jupyter==1.0.0
\ No newline at end of file
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Reads in an Excel file and writes out a pivoted analysis as a CSV file.
See https://www.python-boilerplate.com/py3+executable/ for how to make alterations!
"""
# numpy and sys were imported here but never used, so only pandas is needed.
import pandas as pd
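

def make_catagorical_vars(x):
    """Bucket a quarterly total into a size label."""
    if x < 205000:
        return "SMALL"
    elif x <= 250000:
        return "MEDIUM"
    elif x <= 311000:
        return "BIG"
    else:
        return "HUGE"


def format_as_money(x):
    """Format a number as whole dollars with thousands separators, e.g. $1,234."""
    return "${:,.0f}".format(x)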


if __name__ == "__main__":
    # Read in our data
    df = pd.read_excel('excel_data.xlsx')
    # For inspection
    print(df.head())
    # Run whatever calculations
    df['Total'] = df['Jan'] + df['Feb'] + df['Mar']
    df['Sizes'] = df['Total'].apply(make_catagorical_vars)
    grouped_df = df[['Total', 'Sizes']].groupby('Sizes').sum()
    print(grouped_df)
    # Melt and merge
    long_df = pd.melt(df, id_vars=['account'], value_vars=['Jan', 'Feb', 'Mar'])
    # Merge with our original df (we don't use it after this, but it's cool to see how it works)
    merged_df = pd.merge(df, long_df, on='account')
    # Reshape our long format into wide again
    temp_df = pd.pivot_table(long_df, index=['account'], columns='variable')
    temp_df.columns = temp_df.columns.droplevel().rename(None)
    temp_df.reset_index(inplace=True)
    # Format as money: only the month columns, so the account column is left untouched
    formatted_df = temp_df.copy()
    for month in ['Jan', 'Feb', 'Mar']:
        formatted_df[month] = formatted_df[month].map(format_as_money)
    # write out analyzed file
    formatted_df.to_csv('formatted.csv', index=False)
    print("we did it!")
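make_catagorical_vars runs once per row through .apply. If the spreadsheet grows large, one vectorized alternative is numpy.select, which, like the if/elif chain, returns the choice for the first condition that is true. This is a swapped-in technique, not what the script does; size_labels is a hypothetical name, the thresholds are copied from the script, and the sketch assumes Total is a numeric pandas Series.

```python
import numpy as np
import pandas as pd


def make_catagorical_vars(x):
    # Copy of the script's row-wise helper, kept here for comparison.
    if x < 205000:
        return "SMALL"
    elif x <= 250000:
        return "MEDIUM"
    elif x <= 311000:
        return "BIG"
    else:
        return "HUGE"


def size_labels(totals):
    """Vectorized labels: same thresholds, same first-match-wins order as the if/elif chain."""
    conditions = [totals < 205000, totals <= 250000, totals <= 311000]
    choices = ["SMALL", "MEDIUM", "BIG"]
    labels = np.select(conditions, choices, default="HUGE")
    # Keep the original index so the result lines up with the source DataFrame.
    return pd.Series(labels, index=totals.index)
```

In the script this would read df['Sizes'] = size_labels(df['Total']). pd.cut can't reproduce these exact boundaries, because the first cut-off is strict (205000 is MEDIUM) while the others include the boundary, and pd.cut closes every bin on the same side. Both versions label a missing Total as HUGE, since every comparison with NaN is False.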
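The melt / pivot_table round trip in the script is easier to follow on a tiny frame. The data below is hypothetical and only mimics the script's account and month columns. Passing values="value" to pivot_table keeps the columns flat, which is an alternative to the droplevel() step the script uses.

```python
import pandas as pd

# Hypothetical toy data shaped like the script's input: account plus one column per month.
wide = pd.DataFrame({
    "account": [101, 102],
    "Jan": [100.0, 200.0],
    "Feb": [110.0, 210.0],
    "Mar": [120.0, 220.0],
})

# Wide -> long: one row per (account, month) pair, in columns account / variable / value.
long_df = pd.melt(wide, id_vars=["account"], value_vars=["Jan", "Feb", "Mar"])

# Long -> wide again. Naming values= gives flat month columns, so no droplevel() is needed.
back = long_df.pivot_table(index="account", columns="variable", values="value")
back.columns.name = None
# pivot_table sorts the month columns alphabetically; restore calendar order.
back = back.reset_index()[["account", "Jan", "Feb", "Mar"]]
```

Note that pivot_table aggregates with the mean by default; that's harmless here because each (account, month) pair occurs once, but duplicated rows would be averaged silently.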