feature: Use Prefect

This merge request handles the following:

#22 Use Prefect

  • install Prefect via pyinfra
  • make a dev version with Docker / Docker Compose
  • modify scrap/*.py to make them work with Prefect (see the sketch below)
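
As a rough illustration, here is a minimal sketch of how one of the scrap/*.py scripts could be wrapped as a Prefect flow, assuming Prefect 2.x; the names scrape_sosh, store_offers and the sample data are hypothetical, not the actual code:

```python
# Hypothetical sketch: wrapping an existing scraper as a Prefect flow (Prefect 2.x API).
from prefect import flow, task


@task(retries=2, retry_delay_seconds=60)
def scrape_sosh() -> list[dict]:
    # Placeholder for the existing logic in scrap/sosh.py.
    return [{"offer": "Sosh Fibre", "price_eur": 19.99}]


@task
def store_offers(offers: list[dict]) -> None:
    # Placeholder: insert the rows into ClickHouse.
    print(f"storing {len(offers)} offers")


@flow(name="scrape-isp-offers")
def scrape_isp_offers() -> None:
    offers = scrape_sosh()
    store_offers(offers)


if __name__ == "__main__":
    scrape_isp_offers()
```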

Prefect deployment should not live in the project itself; it should be handled separately, or we could add a CI job for it if we really want to keep it alongside the project.

The Dockerfile and docker-compose.yml are for local dev only -> new issue: "Fix: make Dockerfile viable for production"

For now the usage of Prefect is pretty basic; at some point we should add monitoring and alerting, and perhaps multiple flows for the various data sources (as requested in #17 and #18).

More generally, this project could use a bit more documentation (for something this size, an update of the README and a few docstrings should do the trick).

#19 Feature: Create Docker compose file to run clickhouse locally

  • Create a Docker Compose file to run ClickHouse locally. This issue is resolved as a side effect of #22.

#20 Set xp() function as a global function (estimated: 2h):

  • create tools.py
  • it should handle Playwright exceptions, sanitization, and data validation
  • def xp(page, xpath) -> str
  • def sanitize_price(str) -> int?
  • def sanitize_bandwidth(str) -> str
  • integration and unit testing with mock data (see the sketch below).
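
A rough sketch of what tools.py could look like, assuming Playwright's sync API; the exact sanitization rules (prices in cents, bandwidth normalized to Mb/s) are assumptions for illustration, not decisions made in this MR:

```python
# Hypothetical sketch of tools.py; sanitization rules are assumptions.
import re

from playwright.sync_api import Page, TimeoutError as PlaywrightTimeoutError


def xp(page: Page, xpath: str) -> str:
    """Return the text of the first node matching xpath, or '' on failure."""
    try:
        return page.locator(xpath).first.inner_text(timeout=5_000).strip()
    except PlaywrightTimeoutError:
        return ""


def sanitize_price(raw: str) -> int | None:
    """Parse a price like '19,99 €/mois' into cents; None if nothing parsable."""
    match = re.search(r"(\d+)(?:[.,](\d{1,2}))?", raw.replace("\u00a0", " "))
    if not match:
        return None
    euros, cents = match.group(1), (match.group(2) or "0").ljust(2, "0")
    return int(euros) * 100 + int(cents)


def sanitize_bandwidth(raw: str) -> str:
    """Normalize a bandwidth label like '1 Gb/s' or '500 Mb/s' to Mb/s."""
    match = re.search(r"(\d+(?:[.,]\d+)?)\s*([GM])b", raw, flags=re.IGNORECASE)
    if not match:
        return raw.strip()
    value = float(match.group(1).replace(",", "."))
    if match.group(2).upper() == "G":
        value *= 1000
    return f"{int(value)} Mb/s"
```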

#16 Add bouygues

  • create bouygues.py
  • this should be fairly similar to sosh.py and orange.py
  • at this point we should maybe consider refactoring the way we scrape a bit, if we want to keep adding more competitors to the list (see the sketch below)
  • We might also want to start looking at libs like woob and/or scrapy
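To give an idea of the refactor direction mentioned above, here is a hypothetical sketch where each competitor file only declares its URL and XPaths, and a shared helper does the actual scraping; the URL, the XPaths, and the names scrape_offer and OFFER_XPATHS are all made up for illustration, and xp refers to the tools.py helper sketched earlier:

```python
# Hypothetical refactor sketch: one generic scraper, per-competitor configuration.
from playwright.sync_api import sync_playwright

from tools import xp  # the helper sketched above (hypothetical module)

# bouygues.py would only need to declare its own URL and XPaths.
URL = "https://www.bouyguestelecom.fr/offres-internet"  # hypothetical URL
OFFER_XPATHS = {
    "price": "//span[contains(@class, 'price')]",       # hypothetical XPath
    "bandwidth": "//span[contains(@class, 'debit')]",   # hypothetical XPath
}


def scrape_offer(url: str, xpaths: dict[str, str]) -> dict[str, str]:
    """Generic scraper that sosh.py, orange.py, bouygues.py, ... could share."""
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        result = {field: xp(page, xpath) for field, xpath in xpaths.items()}
        browser.close()
    return result
```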
