#+TITLE: A blog in pure Org/Lisp
#+SUBTITLE: A pamphlet for hackable website systems
#+DATE: <2018-08-13 Mon>

* The importance of blogging

Blogs (or personal websites) are an essential piece of the Internet as a
means of sharing knowledge openly.  In particular, blogs really shine at sharing:

- articles [[][(before landing in access-restricting journals)]],
- projects,
- literature references and all other sources of knowledge.  It's a good place to
  keep track of [[][web links of articles and videos]].

A [[][web feed]] (for instance RSS or Atom) is important to free the visitors from
manually checking for updates: let the updates go to them.

* Nothing but Org

The World Wide Web was devised to use HTML, which is rather painful to write
directly.  I don't want to go through that: it's too heavy a burden.  Many web
writers, including me until recently, use the [[][Markdown]] format.

Nonetheless, for a long time I've been wanting to write blog posts in the [[][Org]]
format.  I believe that Org is a much superior markup format for reasons that
are already well laid down by [[][Karl Voit]].  I can't help but highlight a few more
points where Org really shines:

- It has excellent math support (see my [[../homogeneous/][article on homogeneous coordinates]] for
  an example).  For an HTML output, several backends are supported including
  [[][MathJax]].  It's smart enough not to include MathJax when there is no math.  To
  top it all, there is no extra or weird syntax: it's simply raw TeX / LaTeX.

- It supports file hierarchies and updates inter-file links dynamically.  It
  also detects broken links on export.

- It has excellent support for multiple export formats, including LaTeX and PDFs.

* Version control system

A significant downside of printed books and documents is that they can't be
updated.  That is to say, unless you acquire a new edition, should there be any.
The Internet comes with the advantage that content can be updated
worldwide, in a fraction of an instant.

Updating is important: originals are hardly ever typo-free; speculations might
turn out to be wrong; phrasing could prove ambiguous.  And most importantly,
readers' feedback can significantly help improve the argumentation and needs to
be taken into account.

The general trend around blogging seems to go in the other direction: articles
are often published and left as-is, never to be edited.

As such, many blog articles are struck by the inescapable flail of time and
technological advancement: they run out of fashion and lose much of their
original value.

But there is a motivation behind this immobility: the ability to edit removes
the guarantee that readers can access the article in its original
form.  Content could be lost in the process.  External references become
meaningless if the content has been removed or changed from the source they
refer to.

Thankfully there is a solution to this problem: version control systems.  They
keep all versions available to the world and make editing fully transparent.

I keep the source of my website in a public [[][Git]] repository.

I cannot stress enough the importance of [[../vcs/][keeping your projects under version
control]] in a /publicly readable repository/:

- It allows not only you but also all visitors to keep track of /all/ changes.
  This gives a /guarantee of transparency/ to your readers.

- It makes it trivial for anyone to /clone/ the repository locally: the website
  can be read offline in the Org format!

* Publishing requirements

[[][Worg has a list of blogging systems]] that work with the Org format.  Most of them
did not cut it for me, however, because I think a website needs to meet some
important requirements:

- Full control over the URL of the published posts. :: This is a golden rule of
     the web: should I change the publishing system, I want to be able to stick
     to the same URLs or else all external references would be broken.  This is
     a big no-no and in my opinion it makes most blogging systems unacceptable.

- Top-notch Org support. :: I believe generators like Jekyll and Nikola only
     have partial Org support.

- Simple publishing pipeline. :: I want the generation process to be as simple
     as possible.  This is important for maintenance.  Should I someday switch
     host, I want to be sure that I can set up the same pipeline.

- Full control over the publishing system. :: I want maximum control over the
     generation process.  I don't want to be restricted by a non-Turing-complete
     configuration file or a dumb programming language.

- Ease of use. :: The process as a whole must be as immediate and friction-less
                  as possible, or else I take the risk of feeling too lazy to
                  publish new posts and update the content.

- Hackability. :: Last but not least, and this probably supersedes all other
                  requirements: /The system must be hackable/.  Lisp-based
                  systems are prime contenders in that area.

* Org-publish

This narrows down the possibilities to just one, if I'm not mistaken: Emacs with
Org-publish.

- The [[][configuration]] happens in Lisp which gives me maximum control.

- Org-support is obviously optimal.

- The pipeline is as simple as it gets:
  #+BEGIN_SRC sh
  emacs --quick --script publish.el --funcall=ambrevar/publish
  #+END_SRC

Org-publish comes with [[][lots of options]], including sitemap generation (here [[../][my
post list]] with anti-chronological sorting).  It supports code highlighting
through the =htmlize= package.
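
As an illustration only, here is a minimal sketch of what a =publish.el= could
contain.  It is not my actual configuration: the =source/= and =public/=
directory names, the project name and the ~my/publish~ function are assumptions
made up for the example.

#+BEGIN_SRC elisp
(require 'ox-publish)
(require 'ox-html)

;; Hypothetical layout: Org sources in ./source/, generated site in ./public/.
(setq org-publish-project-alist
      (list
       (list "site-org"
             :base-directory (expand-file-name "source/")
             :base-extension "org"
             :recursive t
             :publishing-directory (expand-file-name "public/")
             :publishing-function #'org-html-publish-to-html
             ;; Generate a post list, most recent first.
             :auto-sitemap t
             :sitemap-title "Articles"
             :sitemap-sort-files 'anti-chronologically)))

(defun my/publish ()
  "Publish every project listed in `org-publish-project-alist'."
  (org-publish-all))
#+END_SRC

Since the whole configuration is a Lisp value, any entry can be computed rather
than hard-coded.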

** Webfeeds

One thing it lacked for me however was the generation of web feeds (RSS or
Atom).  I looked at the existing possibilities in Emacs Lisp but I could not
find anything satisfying.  There is =ox-rss= in Org-contrib, but it only works
over a single Org file, which does not suit my needs of one file per blog post.
So I went ahead and implemented [[][my own generator]].
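
To give an idea of what such a generator has to produce, here is a minimal
sketch.  It is not the generator linked above: ~my/atom-entry~ and
~my/atom-feed~ are hypothetical names, the entry list is assumed to be
non-empty and sorted most recent first, and the output leaves out elements
such as =<author>= that a fully valid Atom feed requires.

#+BEGIN_SRC elisp
(require 'xml)

(defun my/atom-entry (title url date)
  "Return an Atom <entry> string for one post.
TITLE is the post title, URL its permanent address and DATE a
string in %Y-%m-%d format."
  (format (concat "<entry><title>%s</title><link href=\"%s\"/>"
                  "<id>%s</id><updated>%sT00:00:00Z</updated></entry>")
          (xml-escape-string title)
          (xml-escape-string url)
          ;; The permanent URL doubles as the unique entry identifier.
          (xml-escape-string url)
          date))

(defun my/atom-feed (feed-title feed-url entries)
  "Return an Atom feed string titled FEED-TITLE for the site at FEED-URL.
ENTRIES is a non-empty list of (TITLE URL DATE) lists, most recent first."
  (concat "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
          "<feed xmlns=\"http://www.w3.org/2005/Atom\">\n"
          (format "<title>%s</title>\n<id>%s</id>\n<updated>%sT00:00:00Z</updated>\n"
                  (xml-escape-string feed-title)
                  (xml-escape-string feed-url)
                  ;; The feed was last updated when its newest entry was.
                  (nth 2 (car entries)))
          (mapconcat (lambda (entry) (apply #'my/atom-entry entry))
                     entries "\n")
          "\n</feed>\n"))
#+END_SRC

The one-file-per-post layout then only requires collecting a title, a URL and a
date from each Org file before calling ~my/atom-feed~ once.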

** History of changes (dates and logs)

Org-publish comes with a timestamp system that proves handy to avoid building
unchanged files twice.  It's not so useful though to retrieve the date of last
modification because a file may be rebuilt for external reasons (e.g. change in
the publishing script).

Since I use a version control system (here Git), it is most natural to rely on
it to keep track of the creation date and last modification date of each article.

Org-publish does not provide direct support for Git, but thanks to Lisp this
feature is only a simple hack away:

#+BEGIN_SRC elisp
(defun ambrevar/git-creation-date (file)
  "Return the first commit date of FILE.
Format is %Y-%m-%d."
  (with-temp-buffer
    ;; Oldest commit first: the first line is the creation date.
    (call-process "git" nil t nil "log" "--reverse" "--date=short" "--pretty=format:%cd" file)
    (goto-char (point-min))
    (buffer-substring-no-properties (line-beginning-position) (line-end-position))))

(defun ambrevar/git-last-update-date (file)
  "Return the last commit date of FILE.
Format is %Y-%m-%d."
  (with-output-to-string
    (with-current-buffer standard-output
      ;; "-1" limits the log to the most recent commit.
      (call-process "git" nil t nil "log" "-1" "--date=short" "--pretty=format:%cd" file))))
#+END_SRC

Then only ~org-html-format-spec~ is left to hack so that the ~%d~ and ~%C~
specifiers (used by ~org-html-postamble-format~) rely on Git instead.

See [[][my publishing script]] for the full implementation.
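
For illustration, one possible way to do this without redefining the function
is an =:around= advice.  This is only a sketch and not necessarily what the
publishing script does: the ~my/~ names are hypothetical, and it assumes that
~format-spec~ picks the first matching entry of the specification alist, so
that prepended entries shadow the original ones.

#+BEGIN_SRC elisp
(require 'ox-html)

(defun my/format-spec-with-dates (spec created updated)
  "Return SPEC with its ?d and ?C entries shadowed by CREATED and UPDATED.
SPEC is an alist of (CHARACTER . STRING) as returned by
`org-html-format-spec'.  The new entries are prepended, so an
`assq' lookup finds them before the original ones."
  (append (list (cons ?d created) (cons ?C updated)) spec))

(defun my/org-html-format-spec-git (orig-fun info)
  "Around advice for `org-html-format-spec' taking dates from Git.
ORIG-FUN is the original function and INFO the export plist."
  (let ((spec (funcall orig-fun info))
        (file (plist-get info :input-file)))
    (if file
        (my/format-spec-with-dates spec
                                   (ambrevar/git-creation-date file)
                                   (ambrevar/git-last-update-date file))
      ;; No input file (e.g. exporting an unsaved buffer): keep the defaults.
      spec)))

(advice-add 'org-html-format-spec :around #'my/org-html-format-spec-git)
#+END_SRC

The advice can be removed with ~advice-remove~, which makes it easy to compare
the output with and without the Git dates.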

* Personal domain and HTTPS

I previously stressed the importance of keeping URLs permanent.  This
means that we should not rely on the domain offered by a hosting platform such
as [[][GitLab Pages]], since changing host implies changing domain, thus invalidating
all former post URLs.  Acquiring a domain is a necessary step.

This might turn off those looking for the cheapest option, but in fact getting
a domain name comes close to zero cost if you do not limit yourself to a
subset of popular options.  For a personal blog, the domain name and the
top-level domain should not matter much and can easily be adjusted to bring the
costs to a minimum.

There are many registrars to choose from.  One of the biggest, GoDaddy, has [[][a
debatable reputation]].  I've opted for

With a custom domain, we also need a certificate for HTTPS.  This used to come
at a price but is now free and straightforward with [[][Let's Encrypt]].  Here is a
[[][tutorial for GitLab pages]].  (Note that the command-line tool is called [[][certbot]].)

* Permanent URLs and folder organization pitfalls

[[][Chris Wellons]] has some interesting insights about the architecture of a blog.

[[][URLs are forever]], and as such a key requirement of every website is to ensure
that all its URLs remain permanent.  Thus the folder organization of the blog
has to be thought out beforehand.

- Keep the URLs human-readable and easy to remember. ::  Make them short and

- Avoid dates in URLs. :: This is a very frequent mistake with blogs.  There
     is usually no good reason to encode the date in the URL of a post: it only
     makes it harder to remember and more prone to change when moving platform.

- Avoid hierarchies. :: Hierarchies usually don't help with the above points;
     put everything under the same folder instead.  Even if some pages belong to
     different "categories" (for instance "articles" and "projects"), this is
     only a matter of presentation on the sitemap (or the welcome page).  It
     should not influence the URLs.  When the category is left out, there is one
     thing less to remember: whether the page =foo= was an article or a project.

- Place =index.html= files in dedicated folders. :: If the page extension does
     not matter (e.g. between =.html= and =.htm=), you can easily spare
     visitors any guessing by storing your =foo= article in
     =foo/index.html=.  Thus browsing =https://domain.tld/foo/= will
     automatically retrieve the right document.  It's easier and shorter than

- Don't rename files. :: Think twice before naming a file: while you can later
     tweak some virtual mapping between the URL and a renamed file, it's better
     to stick to the initial names to keep the file-URL association as
     straightforward as possible.
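
The rules above boil down to a simple mapping from a post name to its location
and URL.  Here is a minimal sketch of that mapping; the function names and the
=public= directory are hypothetical and only serve the illustration.

#+BEGIN_SRC elisp
(defun my/post-output-path (source public-dir)
  "Return the HTML output path of the Org file SOURCE under PUBLIC-DIR.
Only the base name of SOURCE is kept: the source hierarchy and
the file extension do not leak into the result."
  (concat (file-name-as-directory public-dir)
          (file-name-as-directory (file-name-base source))
          "index.html"))

(defun my/post-url (source domain)
  "Return the permanent URL of the Org file SOURCE on DOMAIN.
The URL has no date, no category and no file extension."
  (concat "https://" domain "/" (file-name-base source) "/"))
#+END_SRC

For instance, ~(my/post-output-path "articles/foo.org" "public")~ returns
="public/foo/index.html"= and ~(my/post-url "articles/foo.org" "domain.tld")~
returns ="https://domain.tld/foo/"=.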

* Other publishing systems

- [[][Frog]] is a blog generator written in [[][Racket]].  While it may be one of the best
  of its kind, it sadly does not support the Org format as of this writing.
  Some blogs generated with Frog:

- [[][Haunt]] is a blog generator written in [[][Guile]].  It seems to be very complete and
  extensible, but sadly it does not support the Org format as of this writing.
  Some blogs generated with Haunt:

* Other Org-blogs

- [[][Also in pure Org/Lisp]]?
- [[][Also in pure Org/Lisp]].
- [[][Also in pure Org/Lisp]].
- [[][Also in pure Org/Lisp]].
- [[][Generated with Jekyll]].
- [[][Generated with Jekyll]].
- [[][Generated with Jekyll]].
- [[][Generated with Hugo]].
- [[][Generated with Nikola]].