namazu

    Netflix CSV to TMDB

    Description

    In comes NetflixViewingHistory.csv and out comes a JSON file with the TMDB and IMDB links for every show.

    JSON Output Structure

    • key: "alphabetical_ordered_titles"

      • key: list position
        • key: 0
          • value: full title
        • key: 1
          • value: date watched
    • key: "grouped_entries"

      • key: show name
        • key: "TMDB_ID"
          • value: ID from TMDB
        • key: "episodes"
          • key: list position
            • value: episode name
    • key: "single_entries"

      • key: title
        • key: "TMDB_ID"
          • value: ID from TMDB
    • key: "error_entries"

      • key: list position
        • value: full name of entry
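
A minimal sketch of what a file with this structure might look like, built as a Python dictionary and round-tripped through `json`. The titles, dates, and IDs are made-up placeholders, and representing each "list position" as a JSON array is an assumption based on the outline above.

```python
import json

# Hypothetical sample data: titles, dates, and IDs are placeholders,
# not real TMDB lookups.
example_output = {
    # Each entry is a [full title, date watched] pair.
    "alphabetical_ordered_titles": [
        ["Example Movie", "1/2/21"],
        ["Example Show: Season 1: Pilot", "1/1/21"],
    ],
    # Shows with episodes, keyed by show name.
    "grouped_entries": {
        "Example Show": {
            "TMDB_ID": 12345,
            "episodes": ["Season 1: Pilot"],
        },
    },
    # Titles without episodes (for example movies), keyed by title.
    "single_entries": {
        "Example Movie": {"TMDB_ID": 67890},
    },
    # Entries that could not be matched.
    "error_entries": ["Unmatched Title"],
}

# Round-trip through JSON to confirm the structure serializes cleanly.
serialized = json.dumps(example_output, indent=2)
restored = json.loads(serialized)
```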

    Important Links

    TODO

    • Switch to using sqlite and sqlalchemy rather than JSON
    • Return simple analytics
      • Most episodes of series watched
      • List of most-watched episodes/movies
      • Get total number of single_entries
      • Get total number of grouped_entries
    • Getting TMDB ID
    • Once TMDB ID is known
      • Total estimated time spent watching each show; needs the TMDB or IMDB API to get the lengths of shows / movies
      • Total time watched
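
The simpler analytics in this list could be computed straight from the JSON output. The following is only a sketch: the function name `simple_analytics` and the sample data are hypothetical, and it assumes the output structure described under "JSON Output Structure".

```python
def simple_analytics(data):
    """Return basic counts from the parsed JSON output (a dict)."""
    grouped = data.get("grouped_entries", {})
    single = data.get("single_entries", {})

    # Number of watched episodes per show.
    episode_counts = {
        show: len(info.get("episodes", [])) for show, info in grouped.items()
    }
    # Shows ordered from most to fewest episodes watched.
    ranked = sorted(episode_counts.items(), key=lambda item: item[1], reverse=True)

    return {
        "total_single_entries": len(single),
        "total_grouped_entries": len(grouped),
        "shows_by_episode_count": ranked,
        "most_watched_show": ranked[0][0] if ranked else None,
    }


# Hypothetical sample data for illustration only.
sample = {
    "grouped_entries": {
        "Show A": {"TMDB_ID": 1, "episodes": ["E1", "E2", "E3"]},
        "Show B": {"TMDB_ID": 2, "episodes": ["E1"]},
    },
    "single_entries": {"Movie A": {"TMDB_ID": 3}},
}
stats = simple_analytics(sample)
```

Time-based analytics are left out here because they depend on runtimes that are not in the JSON yet.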

    Edge Conditions

    • Haikyu!!, first season is not labeled as a season and just the episode names show up
    • Spider Man, specifically has to look for show
    • Pokemon The Series
    • The Killing, specifically has to search for show
    • X-Men, specifically has to search for show
    • Big Mouth, there are results for a movie of the same name
    • Japan Sinks: 2020: Season 1: Resurrection, has a colon in the show name
    • Avatar: The Last Airbender, can be confused with James Cameron's Avatar movie
    • 21 Jump Street, returns the show
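
Some of these cases come down to how a history title is split into show, season, and episode. Below is one possible heuristic, not necessarily what the project does: split on the first ": Season N: " marker rather than on the first colon, so a colon inside the show name survives. The function name `split_title` and the assumption that seasons are always labeled "Season N" are hypothetical.

```python
import re

# Assumption: a season is marked as ": Season <number>: " in the title.
SEASON_MARKER = re.compile(r": (Season \d+): ")


def split_title(full_title):
    """Split a title into (show, season, episode).

    Returns (full_title, None, None) when no season marker is found.
    """
    match = SEASON_MARKER.search(full_title)
    if match is None:
        return full_title, None, None
    show = full_title[:match.start()]
    season = match.group(1)
    episode = full_title[match.end():]
    return show, season, episode


parts = split_title("Japan Sinks: 2020: Season 1: Resurrection")
```

Titles with no season label, such as the first season of Haikyu!!, come back unsplit with this heuristic and would still need separate handling.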

    Design Decisions

    Should the algorithm be pure, querying only the TMDB database, or take the easy route of going through search engines? It can easily take 1000 searches to work out exactly how many shows and movies an individual has watched. Google already banned the test script, but DuckDuckGo and Bing can be used as well as Google. There is also the fact that Google Instant exists. Search engines it is.

    Errors

    ENGINE FAILURE: Google
    
    Traceback (most recent call last):
      File "main.py", line 14, in <module>
        search_result = search_to_tmdb(entry + " " + search_term)
      File "/home/dentropy/Projects/NetflixCSVtoTMDB/modules/use_search_engine.py", line 8, in search_to_tmdb
        gresults = gsearch.search(*search_args)
      File "/home/dentropy/.local/lib/python3.8/site-packages/search_engine_parser/core/base.py", line 266, in search
        return self.get_results(soup, **kwargs)
      File "/home/dentropy/.local/lib/python3.8/site-packages/search_engine_parser/core/base.py", line 234, in get_results
        raise NoResultsOrTrafficError(
    search_engine_parser.core.exceptions.NoResultsOrTrafficError: The result parsing was unsuccessful. It is either your query could not be found or it was flagged as unusual traffic
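
One way to cope with a failure like this is to catch the exception and move on to the next search engine. The sketch below is generic and does not import `search_engine_parser`: the name `search_with_fallback` and its parameters are hypothetical, each engine is passed in as a plain callable, and in the real script `recoverable` would presumably be the `NoResultsOrTrafficError` shown in the traceback.

```python
def search_with_fallback(query, engines, recoverable=(Exception,)):
    """Try each (name, search_callable) pair in order.

    Returns (engine_name, results, failures). engine_name and results are
    None when every engine failed or returned nothing.
    """
    failures = []
    for name, search in engines:
        try:
            results = search(query)
        except recoverable as error:
            # Mirror the log line above and try the next engine.
            print(f"ENGINE FAILURE: {name}")
            failures.append((name, error))
            continue
        if results:
            return name, results, failures
    return None, None, failures


# Stand-in engines for illustration; real ones would wrap a search library.
def blocked_engine(query):
    raise RuntimeError("flagged as unusual traffic")


def working_engine(query):
    return ["result for " + query]


engine_name, found, failed = search_with_fallback(
    "Big Mouth tmdb",
    [("Google", blocked_engine), ("DuckDuckGo", working_engine)],
    recoverable=(RuntimeError,),
)
```

Entries for which every engine fails could then be recorded under "error_entries".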