
Scraping scheduler, testing, DB and cache integration

This is the final PR for the server side.

To test, go to server/Fake News, copy this folder, and perform the following steps:

  1. install and activate a virtualenv

  2. install all required libraries:

    pip install -r requirements.txt

  3. also install the required NLTK data; open a Python terminal (or use the one-shot script after this list):

     1. import nltk
     2. nltk.download('punkt')
     3. nltk.download('wordnet')
     4. nltk.download('stopwords')
  4. install PostgreSQL for the DB and cache

    sudo apt install postgresql postgresql-contrib

  5. create the DB

    sudo -u postgres createdb fakeNewsDB

    if there is an error because the user/role does not exist, create a user first:

     sudo -u postgres createuser "usernameOfUrPC"
  6. to install ChromeDriver for news web scraping (if not installed already)

    sudo apt-get install chromium-chromedriver

    if there is a path error, add the driver path explicitly to the Selenium line (see the fuller sketch after this list):

    # add the ChromeDriver path to the Selenium call
    driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver")

  7. to create all tables and handle migrations (do this once; a sketch of the manager behind these commands follows this list):

    python API_manager.py db init
    
    python API_manager.py db migrate
    
    python API_manager.py db upgrade
    
    python API_manager.py runserver
    
  8. to test whether data is committed to the DB properly or not

    1. log in to the DB:

      sudo -u postgres psql

    2. connect to the DB and list all tables present:

      \c fakeNewsDB
      \dt

    3. to open a relation/table:

      SELECT * FROM "table_name";
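
The interactive downloads in step 3 can also be run as a tiny script; this is just a convenience wrapper around the same nltk.download calls:

    # download the NLTK data used by the project in one go
    import nltk

    for pkg in ("punkt", "wordnet", "stopwords"):
        nltk.download(pkg)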
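
For step 6, a slightly fuller Selenium sketch is shown below; it follows the Selenium 3 style where the driver path is passed positionally (Selenium 4 moved this into a Service object), and the URL is only a placeholder:

    # point Selenium at the chromium-chromedriver binary installed in step 6
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # scrape without opening a browser window
    driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver", options=options)

    driver.get("https://example.com")   # placeholder URL, not a project endpoint
    print(driver.title)
    driver.quit()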
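
The db init/migrate/upgrade and runserver commands in step 7 imply a Flask-Script manager wired to Flask-Migrate. A minimal sketch of that wiring is below; the connection string and module layout are assumptions rather than the repository's exact code, and the pattern targets the older Flask-Script/Flask-Migrate API that still ships MigrateCommand:

    # API_manager.py-style entry point (illustrative only)
    from flask import Flask
    from flask_sqlalchemy import SQLAlchemy
    from flask_script import Manager
    from flask_migrate import Migrate, MigrateCommand

    app = Flask(__name__)
    # assumed connection string; adjust user/password for your local Postgres
    app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://localhost/fakeNewsDB"

    db = SQLAlchemy(app)
    migrate = Migrate(app, db)          # models must be imported so migrations can see them

    manager = Manager(app)              # Manager also provides the default runserver command
    manager.add_command("db", MigrateCommand)  # exposes db init / migrate / upgrade

    if __name__ == "__main__":
        manager.run()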

Tables after testing:

DB_testing

API request structs for:

  1. Facebook Request fb_req

  2. Twitter Request twitter_req

  3. web news Request webNews_req
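
For orientation only, a request table of this kind could be declared roughly as follows; the column names here are purely illustrative assumptions, and the real definitions live in the repository's models:

    # hypothetical shape of the fb_req table (columns are illustrative, not the actual schema)
    from flask_sqlalchemy import SQLAlchemy

    db = SQLAlchemy()

    class FacebookRequest(db.Model):
        __tablename__ = "fb_req"
        id = db.Column(db.Integer, primary_key=True)
        post_url = db.Column(db.Text)          # hypothetical column
        prediction = db.Column(db.String(32))  # hypothetical column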

