# ludoj #

Scraping data about board games from the web.

Markus Shepherd committed
## Scraped websites ##

* [BoardGameGeek](https://boardgamegeek.com/) (`bgg`)
* [luding.org](http://luding.org/) (`luding`)
* [spielen.de](http://gesellschaftsspiele.spielen.de/) (`spielen`)

## Run scrapers ##

Requires Python 3. Make sure your (virtual) environment is up-to-date:

```bash
pip install -Ur requirements.txt
```

Run a spider like so:

```bash
scrapy crawl <spider> -o 'feeds/%(name)s/%(time)s/%(class)s.csv'
```

where `<spider>` is one of the IDs above.
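The path given to `-o` is a template with printf-style named placeholders. As a rough sketch of how such a template expands, the snippet below applies Python's `%` formatting to it. The values are hypothetical: `bgg` is one of the spider IDs above, while the timestamp format and the `GameItem` class name are assumptions made for illustration only, not something this README specifies.

```python
# Illustrative only: expand the feed path template with Python's %-formatting.
template = "feeds/%(name)s/%(time)s/%(class)s.csv"

# Hypothetical values; the real ones are filled in when the crawl runs.
params = {
    "name": "bgg",                  # spider ID from the list above
    "time": "2018-01-01T12-00-00",  # assumed timestamp format
    "class": "GameItem",            # assumed item class name
}

feed_path = template % params
print(feed_path)
```

With these assumed values, each run would write to its own timestamped directory below `feeds/bgg/`, so earlier results are not overwritten.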

You can run `scrapy check` to perform contract tests for all spiders, or
`scrapy check <spider>` to test one particular spider. If a test fails,
the website has most likely changed and the spider needs updating.