Projects with this topic
Postgres DB + Crawlers and Scrapers for Apache HOP
ModularGrid-based scraper for some Eurorack resellers
Messor is the data collection module responsible for scraping news articles from various sources.
Key features: multi-source scraping (URLs, local files, shared files); parallel processing using ThreadPoolExecutor; session-based tracking of scraping statistics; duplicate detection and filtering; language validation.
Technical implementation: built on newspaper3k and BeautifulSoup for article extraction; modular architecture with clear separation of concerns; FastAPI-based REST API for integration; YAML-based configuration system.
Data flow: retrieves news outlet sources from configuration, validates outlet URLs, builds newspaper objects for each source, processes articles in parallel, saves articles to staging storage, and optionally moves files to cloud storage (DigitalOcean Spaces).
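The parallel-processing and duplicate-filtering steps described for Messor could look roughly like the sketch below. It is not taken from the project: `SessionStats`, `fake_fetch`, and `scrape_all` are hypothetical names, and the fetch function is a stand-in that makes no network calls, where the real module would use newspaper3k and BeautifulSoup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SessionStats:
    """Hypothetical per-session counters; not Messor's actual class."""
    processed: int = 0
    duplicates: int = 0
    failed: int = 0


def fake_fetch(url: str) -> str:
    """Stand-in for real article extraction; returns placeholder text."""
    if "broken" in url:
        raise ValueError(f"cannot fetch {url}")
    return f"article body for {url}"


def scrape_all(urls, fetch=fake_fetch, max_workers=4):
    """Fetch unique URLs in parallel and collect session statistics."""
    stats = SessionStats()
    unique = list(dict.fromkeys(urls))  # drop duplicates, keep first-seen order
    stats.duplicates = len(urls) - len(unique)

    def safe_fetch(url):
        # Catch per-URL errors so one bad source does not stop the batch.
        try:
            return url, fetch(url)
        except Exception:
            return url, None

    articles = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        for url, text in pool.map(safe_fetch, unique):
            if text is None:
                stats.failed += 1
            else:
                articles[url] = text
                stats.processed += 1
    return articles, stats
```

Calling `scrape_all(list_of_urls)` returns a dictionary of article text keyed by URL plus the counters. Threads suit this kind of work because fetching is I/O-bound.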
Since avherald.com does not have a RESTful API, RSS feed, or any other way to get data without visiting the website, this Python 3 script extracts the data for you by scraping the website.
A bash script that automatically downloads the latest news videos from Nordic public broadcasters for convenient viewing on a kitchen TV or similar setup.
Tool to fetch and list favorite publications from a rule34.xxx page in a CSV file
This project is an API built on Django and Django REST Framework that returns a list of news items. Each news item includes a title, text, tags, and a source. News can be filtered by tags, by keywords that must be present, and by keywords that must be excluded. The project includes the database model design and unit tests to verify correct behavior.
Zoomit.ir news scraper using Scrapy
Client application for Belarusian dictionary websites
This is a simple Python email scraper that finds all email addresses on the provided websites and writes them to a file. You can also specify the search depth, the time-out, and much more. See the program itself for the full manual.
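As a rough illustration of the core step, the sketch below pulls email-like strings out of page text with a regular expression. It is an assumption about how such a scraper might work, not the project's code: `EMAIL_RE` and `extract_emails` are hypothetical names, the pattern is deliberately simple and will miss some valid addresses, and the depth and time-out options are not covered.

```python
import re

# Simplified pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return unique email-like strings in first-seen order."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))
```

The input would be the HTML or text of each downloaded page, and the returned list could then be written to a file one address per line.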
Aggregator and frontend for news.ycombinator.com
ccvertor - a hobby project
Bug fixes for the abandoned Python Wikipedia project, adding a warning to the user when the Wikipedia suggestion engine is corrupting the titles of valid Wikipedia articles. Required for the examples in Natural Language Processing in Action, 2nd Edition by Maria Dyshel and Hobson Lane (and a community of more than 30 contributing authors and editors).
A versatile tool for making HTTP requests and scraping web content
Spremioloon.Com data scraper
This repo combines several data science tools. Web-scraped data is cleaned and then used to analyze the property market in Malta with prediction models, visualisations, and statistical analysis.
There are also visualisations of chess data from the 1980s to 2021, as well as Twitter data, which is stored in the Neo4j NoSQL DBMS.
No data is included in the Git repository, only the results. Code with the data can be found at: https://drive.google.com/file/d/15EQnRtsngDsFDD_A7g4N1fwCuXI0f_Xi/view?usp=sharing
An assignment tracker for LEB2 for more manageable
This project obtains the price of the dollar in Venezuela by scraping the website of the Banco Central de Venezuela, http://www.bcv.org.ve/