_posts/2010-10-15-repository-vapourware.html · 55890a1718 · F-Droid / Website

import all news posts as HTML jekyll posts (closes #19) · eb928047

Hans-Christoph Steiner authored Jan 23, 2017

The posts are imported as HTML because the original source is HTML. The
image locations are not moved: the <img> tags are left as is, so the images
are still loaded from their WordPress locations.
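
To see which images an imported post still loads from a remote host, something like the following could be used. This is only a sketch: the class name ImgSrcCollector, the function remote_images, and the sample markup are made up for illustration and are not part of this commit.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def remote_images(html):
    """Return the img sources in html that are absolute URLs with a host."""
    collector = ImgSrcCollector()
    collector.feed(html)
    return [src for src in collector.sources if urlparse(src).netloc]


# made-up post body, not taken from the imported posts
sample = ('<div class="post-entry">'
          '<img src="https://example.org/wp-content/uploads/a.png">'
          '<img src="local.png"></div>')
print(remote_images(sample))
```

Running it over the files in the output directory would give a list of images to move if the WordPress site ever goes away.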

Since only @CiaranG has access to the WordPress database, I didn't use any
of the import methods, since they all require direct database access.
Instead, I used a little bag of tricks:

* wget --span-hosts --recursive --page-requisites --html-extension \
  --convert-links --include-directories=/posts,/news-and-reviews \
  https://f-droid.org/news-and-reviews/
* and this Python script:

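import glob
import os
import bs4

os.makedirs('output', exist_ok=True)
for f in glob.glob('posts/*/index.html'):
    print('parsing', f)
    outputname = os.path.basename(os.path.dirname(f)) + '.html'
    with open(f) as fp:
        # name the parser explicitly so the result does not depend on
        # which parsers happen to be installed
        soup = bs4.BeautifulSoup(fp, 'html.parser')

    # the post date becomes the filename prefix, so skip pages without one
    date = soup.find('time', {'class': 'updated'})
    if not date:
        print('skipping, no date found:', f)
        continue
    filedate = date['datetime'].split('T')[0]

    # always emit a complete front matter block, even when the title or
    # the author is missing
    body = '---\nlayout: post\n'
    title = soup.find('title')
    if title:
        body += 'title: "' + title.text.replace(' – F-Droid', '') + '"\n'
    author = soup.find('a', {'class': 'url'})
    if author:
        body += 'author: "' + author.text + '"\n'
    body += '---\n\n'

    post_entry = soup.find('div', {'class': 'post-entry'})
    if post_entry:
        body += str(post_entry)

    with open(os.path.join('output', filedate + '-' + outputname), 'w') as fp:
        fp.write(body)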
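
The output filename is built from the post's directory name and the date part of the <time> element's datetime attribute. A minimal sketch of just that step, where the function name jekyll_filename, the input path, and the timestamp are assumptions chosen for illustration:

```python
import os


def jekyll_filename(index_path, datetime_attr):
    """Build a Jekyll-style YYYY-MM-DD-slug.html name (hypothetical helper)."""
    # the slug is the directory that contains index.html
    slug = os.path.basename(os.path.dirname(index_path))
    # keep only the date part of an ISO 8601 timestamp
    filedate = datetime_attr.split('T')[0]
    return filedate + '-' + slug + '.html'


# illustrative values, not read from the real wget mirror
print(jekyll_filename('posts/repository-vapourware/index.html',
                      '2010-10-15T12:00:00+00:00'))
# prints: 2010-10-15-repository-vapourware.html
```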
