Acquire, process, and store play-by-play data for the RAPM regression

Two main options here:

A) Buy it from bigdataball. With a .edu address, it comes out to about $71 for 13 years of data. The main value-add is that it is already formatted for the RAPM regression, whereas data we scrape ourselves would not be. However, if we want to use current-season data, we'd need to subscribe or keep buying.

B) Scrape it ourselves from ESPN / bball-ref

  • After scraping, we'd need to develop an algorithm that parses the play-by-play data to determine which five players from each team are on the floor at any given time. (The people behind bigdataball developed such an algorithm, which is why their dataset includes lineups.)

  • This is likely to be much more work up front (though developing the algorithm seems interesting), but once it's built we could run it on the current season's data.
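Option B's lineup-tracking step, and the regression that both options ultimately feed, can be sketched together as below. This is a minimal illustration under loud assumptions: the event dictionaries (`t`, `type`, `team`, `out`, `in`, `points`), the function names `build_stints` and `rapm`, and the `ridge_lambda` default are all hypothetical, not ESPN's or bball-ref's actual format. It also assumes the starting lineups are known and every substitution is logged; real play-by-play feeds often omit substitutions made between periods, which is the gap a real lineup algorithm has to fill. Rows are weighted by stint length in seconds as a stand-in for possessions (RAPM is usually reported per 100 possessions), and the home-court intercept is left out.

```python
import numpy as np

# HYPOTHETICAL event format (an assumption, not ESPN's or bball-ref's schema):
#   {"t": seconds_elapsed, "type": "score", "team": "home" | "away", "points": int}
#   {"t": seconds_elapsed, "type": "sub", "team": "home" | "away", "out": name, "in": name}


def build_stints(starters, events, game_end):
    """Split one game into stints: stretches where all ten players on the floor stay the same."""
    lineup = {side: set(players) for side, players in starters.items()}
    stints, start, margin = [], 0, 0  # margin = home minus away points in the current stint

    def snapshot(end):
        return {"home": frozenset(lineup["home"]), "away": frozenset(lineup["away"]),
                "margin": margin, "seconds": end - start}

    for ev in sorted(events, key=lambda e: e["t"]):  # stable sort keeps same-time order
        if ev["type"] == "score":
            margin += ev["points"] if ev["team"] == "home" else -ev["points"]
        elif ev["type"] == "sub":
            if ev["t"] > start:  # subs sharing a timestamp count as one lineup change
                stints.append(snapshot(ev["t"]))
                start, margin = ev["t"], 0
            lineup[ev["team"]].remove(ev["out"])
            lineup[ev["team"]].add(ev["in"])
    if game_end > start:
        stints.append(snapshot(game_end))
    return stints


def rapm(stints, ridge_lambda=100.0):
    """Weighted ridge regression of home margin per 48 minutes on +1/-1 player indicators."""
    players = sorted({p for s in stints for p in s["home"] | s["away"]})
    col = {p: i for i, p in enumerate(players)}
    X = np.zeros((len(stints), len(players)))
    y = np.zeros(len(stints))
    w = np.zeros(len(stints))
    for r, s in enumerate(stints):
        for p in s["home"]:
            X[r, col[p]] = 1.0
        for p in s["away"]:
            X[r, col[p]] = -1.0
        y[r] = s["margin"] / s["seconds"] * 2880  # 2880 s = 48 minutes
        w[r] = s["seconds"]  # stand-in for possessions
    XtW = X.T * w  # equals X^T W for the diagonal weight matrix W = diag(w)
    beta = np.linalg.solve(XtW @ X + ridge_lambda * np.eye(len(players)), XtW @ y)
    return dict(zip(players, beta))


# Toy usage: one home substitution splits a two-minute game into two stints.
starters = {"home": ["H1", "H2", "H3", "H4", "H5"], "away": ["A1", "A2", "A3", "A4", "A5"]}
events = [
    {"t": 30, "type": "score", "team": "home", "points": 2},
    {"t": 60, "type": "sub", "team": "home", "out": "H5", "in": "H6"},
    {"t": 90, "type": "score", "team": "away", "points": 3},
]
stints = build_stints(starters, events, game_end=120)
ratings = rapm(stints)
```

Each stint becomes one regression row (+1 for the home five, -1 for the away five, home margin as the target). The ridge penalty pulls ratings for lightly used players toward zero, which is what the "regularized" in RAPM refers to; in the toy game, H5 ends up positive and H6 negative because each only played in one stint.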

Edited by Simon Zou