
import all news posts as HTML jekyll posts (closes #19 ) · eb928047

Hans-Christoph Steiner authored Jan 23, 2017

The posts are imported as HTML since the original source is in HTML. This
does not move the images: it leaves the <img> tags as is, so the posts still
load their images from the WordPress locations.

Since only @CiaranG has access to the WordPress database, I didn't use any
of the import methods, which all require direct database access. Instead, I
used a little bag of tricks:

* wget --span-hosts --recursive --page-requisites --html-extension \
  --convert-links --include-directories=/posts,/news-and-reviews \
  https://f-droid.org/news-and-reviews/
* and this python script:

import glob
import os
import bs4

for f in glob.glob('posts/*/index.html'):
    print('parsing', f)
    outputname = os.path.basename(os.path.dirname(f)) + '.html'
    body = '---\nlayout: post\n'
    with open(f) as fp:
        # name the parser explicitly so bs4 does not guess (and warn)
        soup = bs4.BeautifulSoup(fp, 'html.parser')

        title = soup.find('title')
        if title:
            # strip the site suffix, then close the quoted YAML value
            body += 'title: "' + title.text.replace(' – F-Droid', '') + '"\n'

        author...
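
Since the script above is cut off, here is a minimal, self-contained sketch of the title step only. It is not the script from this commit: it swaps bs4 for the standard library's html.parser so it runs without extra packages, and the names TitleParser and front_matter are hypothetical. It also assumes the front matter ends right after the title, which skips whatever the elided author handling adds.

```python
from html.parser import HTMLParser

# Site suffix that the original script strips from each page title.
SITE_SUFFIX = ' – F-Droid'


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == 'title' and not self._done:
            self._in_title = True
            self.title = ''

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == 'title' and self._in_title:
            self._in_title = False
            self._done = True


def front_matter(html):
    """Build Jekyll front matter for one downloaded post (assumed layout)."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    lines = ['---', 'layout: post']
    if parser.title is not None:
        title = parser.title.replace(SITE_SUFFIX, '').strip()
        # Escape backslashes and double quotes so the YAML string stays valid.
        title = title.replace('\\', '\\\\').replace('"', '\\"')
        lines.append('title: "' + title + '"')
    # Jekyll requires a closing '---' line to end the front matter.
    lines.append('---')
    return '\n'.join(lines) + '\n'


if __name__ == '__main__':
    page = '<html><head><title>Example post' + SITE_SUFFIX + '</title></head></html>'
    print(front_matter(page), end='')
```

Escaping the title matters because the value is written inside double quotes: a post title that itself contains a double quote would otherwise produce front matter that Jekyll cannot parse.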
