README for html2rss
html2rss is a Python 3 script that scrapes the homepage of a news site and generates an RSS 2.0.1 feed.
The script is currently configured to scrape two specific news sites, AIChE ChEnected and EASME (the European Commission's Executive Agency for Small and Medium-sized Enterprises), but it should be easy to adapt to other news sites.
For more information, see the blog post "RSS feed for AIChE ChEnected".
Homepage: https://gitlab.com/simevo/html2rss.
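Adapting the script to another news site likely comes down to changing how headlines are pulled out of that site's homepage markup. The sketch below illustrates the idea using only the standard library. It assumes, purely for illustration, that each headline is a link inside an h2 element; the class name HeadlineParser is hypothetical, and the real html2rss.py may use a different library and different selectors.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect (href, text) pairs for links found inside <h2> elements.

    Hypothetical helper: the h2/a layout is an assumption about the
    target page, not something taken from html2rss.py.
    """

    def __init__(self):
        super().__init__()
        self.in_h2 = False      # True while inside an <h2> element
        self.href = None        # href of the link currently being read
        self.text = []          # text fragments of the current link
        self.headlines = []     # collected (href, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_h2 = True
        elif tag == "a" and self.in_h2:
            self.href = dict(attrs).get("href")
            self.text = []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            self.headlines.append((self.href, "".join(self.text).strip()))
            self.href = None
        elif tag == "h2":
            self.in_h2 = False
```

Feeding the parser a page's HTML with feed() fills headlines with one pair per article link; links outside h2 elements are ignored.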
Usage
Launch the script from the command line, passing as its argument the identifier of the news site you want to scrape (currently either chenected or easme).
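As a rough sketch of how a site identifier could be mapped to per-site settings, the example below uses argparse with a fixed set of choices. The SITES dictionary, its fields, and the parse_site function are hypothetical names introduced for illustration, and the EASME URL is a placeholder; the actual script may handle its argument differently.

```python
import argparse

# Hypothetical registry of supported sites. Field names are assumptions,
# and the EASME URL is a placeholder rather than the real address.
SITES = {
    "chenected": {
        "title": "AIChE ChEnected",
        "url": "http://www.aiche.org/chenected",
    },
    "easme": {
        "title": "EASME",
        "url": "https://example.invalid/easme",
    },
}


def parse_site(argv):
    """Return the settings for the site identifier given in argv."""
    parser = argparse.ArgumentParser(
        description="Scrape a news site and print an RSS feed."
    )
    # Restricting choices makes argparse reject unknown identifiers
    # with a usage message instead of failing later.
    parser.add_argument("site", choices=sorted(SITES))
    args = parser.parse_args(argv)
    return SITES[args.site]
```

With this layout, supporting a new site would mean adding one more entry to the registry.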
Sample invocation:
./html2rss.py chenected
Output:
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>AIChE ChEnected</title>
<link>http://libpf.com/chenected2.rss</link>
<description>Hourly scrapped for you from http://www.aiche.org/chenected</description>
<language>en</language>
<lastBuildDate>Fri, 18 Jun 2021 00:00:00 -0000</lastBuildDate>
<item>
<title>Juneteenth and the IDEAL Path Forward</title>
<link>http://www.aiche.org/chenected/2021/06/juneteenth-and-ideal-path-forward</link>
<description>AIChE observes African American Emancipation Day — Juneteenth — and encourages everyone in our community to commemorate the day through reflection, self-assessment, and learning.</description>
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Societal Impact Operating Council (SIOC)</dc:creator>
<pubDate>Fri, 18 Jun 2021 00:00:00 -0000</pubDate>
</item>
<item>
<title>Alan Bahl: Featured LGBTQ+ ChemE Professional</title>
<link>http://www.aiche.org/chenected/2021/06/alan-bahl-featured-lgbtq-cheme-professional</link>
<description>Meet Alan Bahl who discusses his experience as an LGBTQ+ ChemE professional working in EHS for the MBCC Group, a chemical manufacturer.</description>
<dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">John Vasko</dc:creator>
<pubDate>Thu, 17 Jun 2021 00:00:00 -0000</pubDate>
</item>
...
</channel>
</rss>
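A feed with the structure shown above can be assembled with Python's standard library. The following is a minimal sketch, not the script's actual implementation: build_feed and its parameters are hypothetical, and ElementTree declares the dc namespace once on the root element rather than on each dc:creator element as in the sample output.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import format_datetime

DC_NS = "http://purl.org/dc/elements/1.1/"


def build_feed(title, link, description, built, items):
    """Return an RSS 2.0 document as a string (hypothetical helper).

    built is a datetime for lastBuildDate; items is a list of dicts with
    the keys title, link, description, creator and date (a datetime).
    """
    ET.register_namespace("dc", DC_NS)  # serialize the namespace as dc:
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    ET.SubElement(channel, "language").text = "en"
    # Naive datetimes are formatted with the -0000 zone, as in the sample.
    ET.SubElement(channel, "lastBuildDate").text = format_datetime(built)
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "description").text = item["description"]
        ET.SubElement(node, "{%s}creator" % DC_NS).text = item["creator"]
        ET.SubElement(node, "pubDate").text = format_datetime(item["date"])
    return ET.tostring(rss, encoding="unicode")
```

ElementTree escapes special characters in titles and descriptions, which is safer than building the XML through string concatenation.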
License
(C) Copyright 2015-2021 Paolo Greppi simevo.com - All rights reserved.
This program may be used under the terms of the GNU General Public License version 3.0 as published by the Free Software Foundation and appearing in the file LICENSE.txt included in this repository.