Refactor the initial codebase
So far, I have worked on the Crawler prioritising development speed over code maintainability and readability, especially in the older sections of the codebase. This was a bad idea, but it helped me release the site before the beginning of Summer 2019.
I believe it's time to refactor the existing Python code so that I can more easily add new features and find bugs. Specifically, I'm planning to:
- rewrite parts of the code following a more declarative / less imperative approach while also simplifying some hard-to-understand functions (DataAnalyzer.py, I'm looking at you! (seriously, what does [e.load_name(connection) or e.save_picture(session) for e in element[1]] even mean?));
- add multiple execution modes that can be passed via command line;
- rewrite the Crawler class from zero, perhaps by splitting it into two or more files;
- make the boto3 dependency optional;
- remove all useless or redundant log messages (DbManager.py alone produced around 500 MiB of logs in just a few months);
- improve or fix some other things.
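For reference, here is a rough sketch of what the list comprehension quoted above appears to do, written as an explicit loop. The Element class is a hypothetical stand-in for whatever element[1] really contains, and it assumes load_name returns something falsy when the name could not be loaded; the real return values in DataAnalyzer.py may differ.

```python
class Element:
    """Hypothetical stand-in for the objects stored in element[1]."""

    def __init__(self, name_found):
        self.name_found = name_found
        self.calls = []  # records which methods were invoked, for illustration

    def load_name(self, connection):
        self.calls.append("load_name")
        # Assumption: a truthy result means the name was loaded successfully.
        return self.name_found

    def save_picture(self, session):
        self.calls.append("save_picture")


def process_elements(elements, connection, session):
    """Explicit equivalent of the side-effect-only list comprehension."""
    for e in elements:
        # `a or b` evaluates b only when a is falsy, so save_picture
        # runs only if load_name returned a falsy value.
        if not e.load_name(connection):
            e.save_picture(session)


found = Element(name_found=True)
missing = Element(name_found=False)
process_elements([found, missing], connection=None, session=None)
```

The original expression also builds a list of return values and throws it away, so the loop form loses nothing and makes the conditional call visible.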
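The execution modes could be selected with argparse from the standard library. This is only a sketch: the mode names (crawl, analyze, full) and the handler functions are made up for illustration and are not taken from the current codebase.

```python
import argparse

# Hypothetical mode names; the real set of modes is still to be decided.
MODES = ("crawl", "analyze", "full")


def build_parser():
    parser = argparse.ArgumentParser(description="Crawler entry point (sketch)")
    parser.add_argument(
        "--mode",
        choices=MODES,
        default="full",
        help="which part of the pipeline to run",
    )
    return parser


def run(argv=None):
    """Parse argv (sys.argv[1:] when None) and dispatch to the chosen mode."""
    args = build_parser().parse_args(argv)
    # Placeholder handlers: each one just reports which mode was selected.
    handlers = {
        "crawl": lambda: "crawl",
        "analyze": lambda: "analyze",
        "full": lambda: "full",
    }
    return handlers[args.mode]()
```

Calling run(["--mode", "crawl"]) would select the crawl handler, while run([]) falls back to the default full mode. An unknown mode makes argparse print a usage message and exit.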
Edited by ZeroCrystal