Netflix CSV to TMDB
Description
In comes NetflixViewingHistory.csv and out comes a JSON file with every show's TMDB and IMDB links.
JSON Output Structure
- key: "alphabetical_ordered_titles"
  - key: list position
    - key: 0
      - value: full title
    - key: 1
      - value: date watched
- key: "grouped_entries"
  - key: show name
    - key: "TMDB_ID"
      - value: ID from TMDB
    - key: "episodes"
      - key: list position
        - value: episode name
- key: "single_entries"
  - key: title
    - key: "TMDB_ID"
      - value: ID from TMDB
- key: "error_entries"
  - key: list position
    - value: full name of entry
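To make the structure above concrete, here is a minimal sketch of one possible output document built in Python. Every title, date, and ID is a made-up placeholder, and storing each "list position" as a plain list index is an assumption about how the positions are represented.

```python
import json

# Hypothetical sample data: every title, date, and ID below is a placeholder.
output = {
    "alphabetical_ordered_titles": [
        # Position 0 is the full title, position 1 is the date watched.
        ["Example Movie", "1/2/20"],
        ["Example Show: Season 1: Pilot", "1/1/20"],
    ],
    "grouped_entries": {
        # Keyed by show name.
        "Example Show": {"TMDB_ID": 111, "episodes": ["Season 1: Pilot"]},
    },
    "single_entries": {
        # Keyed by title.
        "Example Movie": {"TMDB_ID": 222},
    },
    # Entries that could not be matched, stored by full name.
    "error_entries": ["Unmatched Entry"],
}

serialized = json.dumps(output, indent=2)
print(serialized)
```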
Important Links
- https://developers.themoviedb.org/3/getting-started/introduction
- https://search-engine-parser.readthedocs.io/en/latest/
TODO
- Switch to using sqlite and sqlalchemy rather than JSON
- Return simple analytics
  - Most episodes of a series watched
  - List of most-watched episodes/movies
  - Get total number of single_entries
  - Get total number of grouped_entries
- Getting TMDB ID
  - Find other ways to search for the content rather than centralized search engines
  - Research alternative APIs: https://rapidapi.com/collection/netflix-api-alternatives
  - Choose a bunch of example cases, put them through the search algorithm, and document what happens
- Once TMDB ID is known
  - Total estimated time spent watching each show; needs the TMDB or IMDB API to get the lengths of shows/movies
  - Total time watched
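The simple analytics items could be computed straight from the JSON output. The following is a rough sketch, not existing project code: `simple_analytics` is a hypothetical helper, and the sample data is made up and only mirrors the "grouped_entries" and "single_entries" keys described under JSON Output Structure.

```python
def simple_analytics(data):
    """Summarize a parsed output document (a dict shaped like the JSON output)."""
    grouped = data.get("grouped_entries", {})
    single = data.get("single_entries", {})
    # Number of episodes recorded for each show.
    episode_counts = {
        show: len(info.get("episodes", [])) for show, info in grouped.items()
    }
    # Shows ordered from most to fewest episodes watched.
    ranked = sorted(episode_counts.items(), key=lambda item: item[1], reverse=True)
    return {
        "total_single_entries": len(single),
        "total_grouped_entries": len(grouped),
        "shows_by_episodes_watched": ranked,
    }


# Made-up sample data, for illustration only.
sample = {
    "grouped_entries": {
        "Show A": {"TMDB_ID": 1, "episodes": ["Episode 1"]},
        "Show B": {"TMDB_ID": 2, "episodes": ["Episode 1", "Episode 2", "Episode 3"]},
    },
    "single_entries": {"Movie A": {"TMDB_ID": 3}},
}
report = simple_analytics(sample)
print(report)
```

The time-based items are left out because they depend on runtimes that are not stored in the JSON file.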
Edge Conditions
- Haikyu!!, first season is not labeled as a season and just the episode names show up
- Spider Man, specifically has to look for show
- Pokemon The Series
- The Killing, specifically has to search for show
- X-Men, specifically has to search for show
- Big Mouth, there are results for a movie of the same name
- Japan Sinks: 2020: Season 1: Resurrection, has a colon inside the show name
- Avatar: The Last Airbender, can be confused with the Avatar movie from James Cameron
- 21 Jump Street, returns the show
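Several of these cases come down to how a title is split into show and episode. Below is a minimal sketch of one way to do that, assuming entries follow a "Show: Season N: Episode" layout like the Japan Sinks example; `split_title` and `SEASON_PATTERN` are hypothetical names, not part of the existing modules.

```python
import re

# Assumed layout: "<show>: Season <number>: <episode>". The show part is matched
# lazily, so it may itself contain colons (e.g. "Japan Sinks: 2020").
SEASON_PATTERN = re.compile(r"^(?P<show>.+?): (?P<season>Season \d+): (?P<episode>.+)$")


def split_title(full_title):
    """Return show/season/episode parts, or None when no season marker is found."""
    match = SEASON_PATTERN.match(full_title)
    if match is None:
        # Movies, and shows whose first season is not labeled, end up here.
        return None
    return match.groupdict()


print(split_title("Japan Sinks: 2020: Season 1: Resurrection"))
print(split_title("21 Jump Street"))
```

Titles that return `None` still need separate handling, which is where cases like Haikyu!! would have to be resolved some other way.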
Design Decisions
Should the algorithm be pure, querying only the TMDB database, or take the easy route of going through search engines? Working out exactly how many shows and movies an individual has watched can easily take 1,000 searches. Google has already banned the test script, but DuckDuckGo and Bing can be used alongside Google. There is also the fact that Google Instant exists. Search engines it is.
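With the search engine route, the TMDB ID has to be pulled out of the result links. Here is a minimal sketch of that step, assuming result links look like `https://www.themoviedb.org/tv/<id>-<slug>` or `https://www.themoviedb.org/movie/<id>-<slug>`; `first_tmdb_match` is a hypothetical helper and the links in the example are placeholders.

```python
import re

# Matches TMDB movie and TV links and captures the numeric ID.
TMDB_LINK = re.compile(r"themoviedb\.org/(?P<kind>movie|tv)/(?P<tmdb_id>\d+)")


def first_tmdb_match(links, wanted_kind=None):
    """Return (kind, tmdb_id) for the first TMDB link, optionally of one kind only."""
    for link in links:
        match = TMDB_LINK.search(link)
        if match and (wanted_kind is None or match.group("kind") == wanted_kind):
            return match.group("kind"), int(match.group("tmdb_id"))
    return None


# Placeholder links standing in for search engine results.
links = [
    "https://example.com/some-article",
    "https://www.themoviedb.org/movie/123-placeholder-title",
    "https://www.themoviedb.org/tv/456-placeholder-title",
]
print(first_tmdb_match(links))
print(first_tmdb_match(links, wanted_kind="tv"))
```

The `wanted_kind` argument is one way to handle titles where a movie and a show share a name and the show is the one that was watched.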
Errors
ENGINE FAILURE: Google
Traceback (most recent call last):
  File "main.py", line 14, in <module>
    search_result = search_to_tmdb(entry + " " + search_term)
  File "/home/dentropy/Projects/NetflixCSVtoTMDB/modules/use_search_engine.py", line 8, in search_to_tmdb
    gresults = gsearch.search(*search_args)
  File "/home/dentropy/.local/lib/python3.8/site-packages/search_engine_parser/core/base.py", line 266, in search
    return self.get_results(soup, **kwargs)
  File "/home/dentropy/.local/lib/python3.8/site-packages/search_engine_parser/core/base.py", line 234, in get_results
    raise NoResultsOrTrafficError(
search_engine_parser.core.exceptions.NoResultsOrTrafficError: The result parsing was unsuccessful. It is either your query could not be found or it was flagged as unusual traffic
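One way to survive a failure like this is to fall back to the next engine instead of crashing. The sketch below is library-agnostic and is not the current behavior of `search_to_tmdb`: `search_with_fallback` is a hypothetical helper, and the two engine functions are stand-ins for real search calls.

```python
def search_with_fallback(query, engines):
    """Try each (name, search_function) pair in order and return the first success."""
    failures = []
    for name, search in engines:
        try:
            return name, search(query)
        except Exception as error:  # e.g. NoResultsOrTrafficError from search_engine_parser
            failures.append((name, error))
    raise RuntimeError(f"all engines failed for {query!r}: {failures}")


# Stand-in engines for illustration; real ones would wrap actual search calls.
def blocked_engine(query):
    raise RuntimeError("flagged as unusual traffic")


def working_engine(query):
    return [f"https://example.com/search?q={query}"]


engine_name, results = search_with_fallback(
    "Big Mouth show",
    [("blocked", blocked_engine), ("working", working_engine)],
)
print(engine_name, results)
```

Catching the broad `Exception` keeps the sketch independent of any one library; in practice it may be better to catch only the specific traffic error so that real bugs still surface.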