Commit 637e2bde authored by Danny Robson

Initial import

== git2mkdocs
Some scripts that convert a Dokuwiki data directory into mkdocs format.
=== Requirements
* pandoc>=2.2
=== Running
The `` script translates one file from Dokuwiki syntax into GitHub Flavoured Markdown (pandoc's `gfm` format), then applies a series of fixups to correct links and formatting.
A typical invocation will be of the form:
`find /foo/dokuwiki/data -name "*.txt" -exec ./ '{}' \;`
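Before running the conversion over a whole data directory, it may help to preview which files the `find` expression selects. The following is a minimal sketch, not part of the original scripts: `list_pages` is a hypothetical helper name, and the only assumption is that page sources end in `.txt`, as in the invocation above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print every file that
# 'find <dir> -name "*.txt"' would hand to the conversion script, sorted
# so the listing is stable between runs.
list_pages() {
    find "$1" -type f -name "*.txt" -print | sort
}

# Only run when a directory is given on the command line.
if [ "$#" -gt 0 ]; then
    list_pages "$1"
fi
```

Run against the data directory, this prints one path per line, which can be checked before swapping the listing for the real `-exec` call.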
pandoc -f dokuwiki -t gfm -o "$DST" "$SRC"
# We had a tendency to use image links in Dokuwiki where we really meant an internal link (i.e., '{{ foo }}' vs '[foo]').
# This is most prominent where we link to a PDF.
# So we look for image links of the form '![foo](bar)' and strip off the leading '!', converting them to internal links.
# It's probably fractionally easier to do this after we've converted to Markdown, given it's a fairly simple regex.
sed -i -r 's/!(\[.*\]\(.*\.pdf\))/\1/' "${DST}"
# Convert all link targets to lowercase. There's a mismatch between link names in the Dokuwiki source and the case of the backing files.
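# Extended regexes have no non-greedy '.*?', so bracket expressions keep each match inside a single link; 'g' handles several links per line.
sed -i -r 's/(\[[^]]*\]\()([^)]*)(\))/\1\L\2\E\3/g' "${DST}"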
# Remove ampersands from internal wiki links. They _appear_ to have been stripped somewhere before Dokuwiki stored the files.
sed -i -r 's/(\[.*\]\(.*)\&(.*\))/\1\2/' "${DST}"
# Convert double spaces in link targets to single spaces. This collapses double spaces that arise when we remove an ampersand from "foo & bar" (as it becomes "foo bar").
# It's a bit of a hack, but it seems to work for our data.
sed -i -r 's/(\[.*\]\(.*)%20(%20.*\))/\1\2/' "${DST}"
# Convert spaces in link targets to underscores. This matches the naming of the backing files from Dokuwiki.
sed -i -r -e :a -e 's/(\[.*\]\(.*)%20(.*\))/\1_\2/; ta' "${DST}"
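
To see what the fixups do end to end, the sketch below chains the five `sed` expressions from the script as a stdin-to-stdout filter and feeds them one made-up link. The function name `apply_fixups` and the sample line are illustrative assumptions, not part of the original script, and GNU sed is assumed (for `-r` and `\L`).

```shell
#!/usr/bin/env bash
# Illustrative only: the same five substitutions as the script, applied as a
# pipeline instead of editing "${DST}" in place. Assumes GNU sed.
apply_fixups() {
    sed -r 's/!(\[.*\]\(.*\.pdf\))/\1/' |
    sed -r 's/(\[.*?\]\()(.*)(\))/\1\L\2\3/' |
    sed -r 's/(\[.*\]\(.*)\&(.*\))/\1\2/' |
    sed -r 's/(\[.*\]\(.*)%20(%20.*\))/\1\2/' |
    sed -r -e :a -e 's/(\[.*\]\(.*)%20(.*\))/\1_\2/; ta'
}

# Made-up sample: an image-style link to a PDF with mixed case, an ampersand,
# and encoded spaces in the target.
printf '%s\n' '![Q&A Notes](Wiki:Q%20&%20A%20Notes.pdf)' | apply_fixups
# prints: [Q&A Notes](wiki:q_a_notes.pdf)
```

The link text keeps its ampersand and capitalisation; only the target is rewritten. The sample has one link on the line, which is the case the patterns handle most predictably.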