Hans-Christoph Steiner authored
This is done in HTML, since the original source is HTML. The images are not moved: the <img> tags are left as is, so images are still loaded from their WordPress locations.

Since only @CiaranG has access to the WordPress database, I didn't use any of the import methods; they all require direct database access. Instead, I used a little bag of tricks:

* wget --span-hosts --recursive --page-requisites --html-extension \
      --convert-links --include-directories=/posts,/news-and-reviews \
      https://f-droid.org/news-and-reviews/

* and this Python script:

    import glob
    import os

    import bs4

    for f in glob.glob('posts/*/index.html'):
        print('parsing', f)
        # the post's directory name becomes the output file's base name
        outputname = os.path.basename(os.path.dirname(f)) + '.html'
        body = '---\nlayout: post\n'
        with open(f) as fp:
            soup = bs4.BeautifulSoup(fp)
        title = soup.find('title')
        if title:
            body += 'title: "' + title.text.replace(' – F-Droid', '')
        # attrs must be a dict; the original passed the set {'class', 'url'}
        author = soup.find('a', {'class': 'url'})
        if author:
            # also closes the title's quote and the front matter
            body += '"\nauthor: "' + author.text + '"\n---\n\n'
        post_entry = soup.find('div', {'class': 'post-entry'})
        if post_entry:
            body += str(post_entry)
        date = soup.find('time', {'class': 'updated'})
        if date:
            # keep only the YYYY-MM-DD part of the ISO timestamp
            filedate = date['datetime'].split('T')[0]
            with open(os.path.join('output', filedate + '-' + outputname), 'w') as fp:
                fp.write(body)
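
The script above writes the title's closing quote and the closing `---` only when an author link is found. As an illustration only, here is a minimal sketch of how the front matter and output filename could be assembled so that the header is always terminated. The function name `build_post` and its parameters are hypothetical and were not part of the original import; they stand in for the values the script extracts with BeautifulSoup. Only the standard library is used.

```python
def build_post(slug, title, author, date_iso, entry_html):
    """Return (filename, contents) for one post.

    Hypothetical helper: the arguments stand in for the values the
    import script pulls out of each downloaded page.
    """
    lines = ['---', 'layout: post']
    if title:
        # Drop the site suffix, then escape backslashes and double quotes
        # so the double-quoted YAML string stays valid.
        clean = title.replace(' – F-Droid', '')
        clean = clean.replace('\\', '\\\\').replace('"', '\\"')
        lines.append('title: "' + clean + '"')
    if author:
        lines.append('author: "' + author + '"')
    # Always close the front matter, even when title or author is missing.
    lines.append('---')
    contents = '\n'.join(lines) + '\n\n' + (entry_html or '')
    # Same naming scheme as the script: date part of the timestamp,
    # a hyphen, then the post's directory name.
    filename = date_iso.split('T')[0] + '-' + slug + '.html'
    return filename, contents


if __name__ == '__main__':
    name, text = build_post('hello-world', 'Hello – F-Droid', 'Jane Doe',
                            '2017-03-01T10:00:00+00:00', '<p>Hi</p>')
    print(name)
    print(text)
```

Keeping this step free of parsing and file I/O makes it easy to check on its own before running it over the whole wget mirror.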
eb928047