Commit d2b04547 authored by Martin Isaksson

Add post: a Gitlab CI pipeline for LaTeX

parent 412e25c2
Subproject commit 550a97a5bfa3dc70fca8bd8408dcd813a33cd836
Subproject commit 8fde18cce95f31e56d3a2576a6534d87f43b1add
---
title: "How to annoy your co-authors: a Gitlab CI pipeline for LaTeX"
layout: post
path: /images/blog.jpg
hidden: false
published: true
showrevisions: true
tags:
- reproducibility
- academia
- git
- gitlab
- continuous integration
---
**Writing a LaTeX document often involves many manual steps, resulting in a patchwork document that is neither _exercisable_ nor _complete_. This makes it impossible to reproduce the document from code and data. In this post we will create a pipeline for compiling a LaTeX document that works both locally and using GitLab CI. This is part of a series to create the perfect open science <code>git</code> repository.**
{% include toc %}
## Introduction
When writing a document in LaTeX I like to use `git` for version control even if I am working alone on a project. This allows me to track my progress, have a backup, and make sure the document is completely _reproducible_ from raw data. The principle of
_Reproducible Research_ {% cite buckheit1995wavelab claerbout1992electronic
Artifact18:online %} is to make data and computer code available for others to
analyze and criticize.
A good open source repository is exercisable and complete {% cite Monperrus2018 Artifact18:online %}. This means that it must be possible to fully reproduce the document, down to the last pixel, by running a single script in the repository.
In this post we will take a look at the practicalities of writing a document in LaTeX using a Docker image as our build environment to ensure that we pass these requirements.
This post is part of a series and follows [Publication ready figures]({% post_url 2019-09-29-publication_ready_figures %}). For more on the requirements for open source repositories, see [Reproducibility aspects of the Swedish COVID&ndash;19 estimate report]({% post_url 2020-04-25-fohm-seir-stockholm %}).
{% include figure.html url="/assets/gitlab-pipeline-simple.svg" description="Our GitLab CI pipeline consists of two build stages and one test stage. These stages run in separate Docker containers." colclass="col-md-10" %}
## The GitLab CI pipeline
GitLab can run a pipeline for each commit using [GitLab CI/CD](https://docs.gitlab.com/ee/ci/). For this project we have defined three stages: the first (`figures`) creates the plots in Python, the second (`build`) compiles the LaTeX document, and the third (`test`) runs unit tests.
{% include figure.html url="/assets/gitlab-pipeline.png" description="Our GitLab CI pipeline consists of a figures stage, a build stage and a test stage." colclass="col-md-8" %}
The complete script [`.gitlab-ci.yml`](https://gitlab.com/martisak/latex-pipeline/-/blob/master/.gitlab-ci.yml) can be found in the GitLab repository.
### Compiling figures
Our first pipeline stage will compile figures according to [Publication ready figures]({% post_url 2019-09-29-publication_ready_figures %}). For this we use the official `python:3.8` Docker image. Any job artifacts created in this step will be carried over to the next stage.
{% highlight yaml linenos %}
figures:
  image: python:3.8
  stage: figures
  script:
    - make -C figures
  artifacts:
    # keep every generated file for the later stages
    untracked: true
    expire_in: 1 week
{% endhighlight %}
The figures are placed in the `figures` [subdirectory](https://gitlab.com/martisak/latex-pipeline/-/tree/master/figures) and are built using a [`Makefile`](https://gitlab.com/martisak/latex-pipeline/-/blob/master/figures/Makefile).
We put this step in its own stage because we assume that generating figures can take a very long time, for example if a machine learning model is trained in this step. This also keeps it separate when running locally, so that we don't have to regenerate the figures every time we want to compile the LaTeX document.
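Make decides whether to rebuild a file by comparing timestamps, which is what lets us skip the expensive figure step when nothing has changed. A minimal shell sketch of that rule follows. The helper name `needs_rebuild` and the file names in the usage line are hypothetical and not part of the repository.

```shell
# Hypothetical helper: succeed (exit 0) when TARGET is missing or older than
# any of the listed SOURCES, which is roughly the rule Make applies.
needs_rebuild() {
    target="$1"
    shift
    # A missing target always has to be built.
    [ -e "$target" ] || return 0
    for src in "$@"; do
        # -nt ("newer than") is supported by bash, dash and most other shells.
        if [ "$src" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `needs_rebuild figures/plot.pdf figures/plot.py && make -C figures` would only regenerate the figures when the plotting script is newer than its output.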
### Compiling the LaTeX document
The second stage in the pipeline will compile the actual LaTeX document. Here, we need a Docker image that has LaTeX and all needed packages installed. The Docker image we use is `blang/latex:ctanfull`, which is based on [TeX Live](https://www.tug.org/texlive/) 2017.
The job artifact of interest is of course the compiled PDF document, but we include all untracked files so that log files and other generated files are kept as well.
{% highlight yaml linenos %}
compile:
  image: blang/latex:ctanfull
  stage: build
  script:
    - make pdf
  dependencies:
    - figures   # fetch the artifacts of the figures job
  artifacts:
    untracked: true
    expire_in: 1 week
    when: on_success
{% endhighlight %}
### Running unit tests
The final stage of the pipeline will run unit tests on the created PDF file. This is useful, for example, to make sure that the number of pages is as expected, that the fonts are embedded properly, and that the metadata is set correctly. We will cover these tests in detail in a later post. For now it is enough to say that they are written in Ruby, so we will use an appropriate Docker image.
{% highlight yaml linenos %}
test:   # job name is an assumption
  image: ruby:2.7.1
  stage: test
  dependencies:
    - compile
  script:
    - bundle install
    - make check
  when: on_success
{% endhighlight %}
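The actual tests are written in Ruby and are left for a later post. To give a feeling for the page count check mentioned above, here is a small shell sketch that parses the output of `pdfinfo` from poppler-utils. It is not the test suite from the repository: the function names are hypothetical, and we assume `pdfinfo` is installed wherever the check is run.

```shell
# Read `pdfinfo` output on stdin and print the number of pages.
page_count() {
    awk '/^Pages:/ { print $2 }'
}

# Read `pdfinfo` output on stdin and succeed only if the page count
# is present and does not exceed the limit given as the first argument.
within_page_limit() {
    pages="$(page_count)"
    [ -n "$pages" ] && [ "$pages" -le "$1" ]
}
```

With a compiled `main.pdf` and an assumed limit of eight pages, this could be used as `pdfinfo main.pdf | within_page_limit 8`.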
## LaTeX build system
We will use `latexmk` to build our LaTeX document. Other build systems for LaTeX, such as `rubber` and `latexrun`, can also be used, but `latexmk` has the advantage of being robust and already installed in the Docker image we are using.
## GNU Make
We will use GNU Make to trigger the `latexmk` build, either locally or in the GitLab runner. The entry points are slightly different in the two cases.
We assume that `make figures` is very time-consuming, so we would like to avoid running it on every build.
The command that we run from the command line to compile the LaTeX document is `make`. This will first start a Docker container, mount the working directory as `/data`, and run `make pdf` inside the container. Since the working directory is mounted, the PDF file will remain after the Docker container has been shut down and removed.
The complete script [`Makefile`](https://gitlab.com/martisak/latex-pipeline/-/blob/master/Makefile) can be seen in the GitLab repository.
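The container wrapper described above can be sketched as a small shell function. This is only an illustration of the idea, not the content of the `Makefile` in the repository: the function name `run_in_container` and the `DOCKER` override are assumptions made for this example.

```shell
# Hypothetical helper: run a make target inside a throwaway container.
# The current directory is mounted as /data, so build results survive
# after the container is removed (--rm).
# DOCKER can be overridden, e.g. DOCKER=echo, to preview the command.
run_in_container() {
    image="$1"
    shift
    "${DOCKER:-docker}" run --rm -w /data/ -v "$(pwd)":/data "$image" make "$@"
}
```

Calling `run_in_container blang/latex:ctanfull pdf` would then correspond to compiling the document, and `run_in_container python:3.8 -C /data/figures` to building the figures.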
### Generating figures
The figures reside in the subdirectory `figures`, which contains a `Makefile`.
These can be manually built with
{% highlight bash linenos %}
docker run --rm -w /data/ -v`pwd`:/data python:3.8 make -C /data/figures
{% endhighlight %}
### Compiling the document
Our document can be compiled using `latexmk` inside a Docker container with
{% highlight bash linenos %}
docker run --rm -w /data/ -v`pwd`:/data blang/latex:ctanfull make pdf
{% endhighlight %}
The document will be compiled inside the container using
{% highlight bash linenos %}
latexmk -bibtex -pdf -pdflatex="pdflatex -interaction=nonstopmode" main.tex
{% endhighlight %}
### Running unit tests
The test cases, written in Ruby, can be run either locally with
{% highlight bash linenos %}
rspec spec/pdf_spec.rb
{% endhighlight %}
which is the same as `make check` or in a Docker container with
{% highlight bash linenos %}
docker run --rm -w /data/ -v`pwd`:/data ruby:2.7.1 sh -c "bundle update --bundler && make check"
{% endhighlight %}
This is the same as running `make check_docker`.
## LaTeX development environment
When writing a paper we would of course like to see the results of our changes in near real time, and not have to commit our changes to `git` in order to compile the document.
We can tweak the `render` `make` target a bit so that `latexmk` runs with the `-pvc` flag {% cite Wienke2018 %}. This puts `latexmk` into preview mode, where it keeps running and recompiles the document whenever a source file changes.
{% highlight bash linenos %}
make clean render LATEXMK_OPTIONS_EXTRA=-pvc
{% endhighlight %}
This means we can run this command once and just edit our document in our favorite text editor.
{% include figure.html url="/assets/latex-dev-env.png" description="LaTeX development environment using Sublime&nbsp;Text&nbsp;3." colclass="col-md-8" %}
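How an extra option such as `-pvc` could reach `latexmk` is sketched below as a shell function. The real logic lives in the `Makefile` in the repository. Here we only assume that `LATEXMK_OPTIONS_EXTRA` is appended to the `latexmk` command shown earlier. The function name `build_pdf` and the `LATEXMK` override are hypothetical.

```shell
# Hypothetical helper: compile main.tex with latexmk, appending any extra
# options from LATEXMK_OPTIONS_EXTRA (for example -pvc).
# LATEXMK can be overridden, e.g. LATEXMK=echo, to preview the command.
build_pdf() {
    # LATEXMK_OPTIONS_EXTRA is left unquoted on purpose so that several
    # options separated by spaces are passed as separate arguments.
    ${LATEXMK:-latexmk} -bibtex -pdf \
        -pdflatex="pdflatex -interaction=nonstopmode" \
        ${LATEXMK_OPTIONS_EXTRA:-} main.tex
}
```

Running `LATEXMK_OPTIONS_EXTRA=-pvc build_pdf` would start the continuous preview, while leaving the variable unset gives a single build.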
## Related work
A common way of writing LaTeX documents together with others is to use [Overleaf](https://www.overleaf.com/). Editing can be done by all authors in real time and the compilation of the document is very fast. However, it doesn't allow us to run arbitrary code or run test cases on our document. Furthermore, the version control is hidden from us. Overleaf has a few ways of letting us share the work. In my work, some of the content is proprietary and can be sensitive until the document is reviewed. This means I am not able to use cloud solutions to write my documents.
Many authors have looked into using GitLab CI for building LaTeX documents, for example {% cite Manik2019 Luhr2018 Khan2018 Ergus2016 %}. In this post we extend this work and make a complete pipeline that can also be run locally. Our pipeline consists of three stages, `figures`, `build` and `test`, each responsible for a separate part of the build process.
## Conclusions
We have constructed a simple pipeline for compiling LaTeX documents in a Docker container. This fulfills the requirements that our repository shall be complete and exercisable {% cite Monperrus2018 Artifact18:online %}.
In upcoming posts we will further look into defining test cases for documents, complicating the build with Pandoc and other tricks to annoy your co-authors.
<svg id="Lager_1" data-name="Lager 1" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 567 283.41"><defs><style>.cls-1{fill:#f6f6f6;}.cls-2,.cls-5,.cls-8{font-size:10px;}.cls-2,.cls-7,.cls-8{fill:#1d1d1b;}.cls-2,.cls-5{font-family:Helvetica;}.cls-3{font-family:UbuntuMono-Regular, Ubuntu Mono;}.cls-4{fill:#36a9e1;}.cls-5{fill:#fff;}.cls-10,.cls-6{fill:none;stroke-miterlimit:10;}.cls-6{stroke:#1d1d1b;}.cls-8{font-family:Helvetica-Bold, Helvetica;font-weight:700;}.cls-9{letter-spacing:-0.07em;}.cls-10{stroke:#0b9a39;stroke-width:3px;}</style></defs><rect class="cls-1" x="378" y="170.6" width="189" height="81" rx="2.83"/><rect class="cls-1" x="108" y="170.6" width="189" height="81" rx="2.83"/><text class="cls-2" transform="translate(117 242.6)">blang/latex:ctanfull</text><rect class="cls-1" x="108" y="17.6" width="189" height="81" rx="2.83"/><text class="cls-2" transform="translate(117 35.6)">python:3.8</text><text class="cls-2" transform="translate(0 198.04)">Local entrypoint<tspan class="cls-3"><tspan x="0" y="12">make render</tspan></tspan></text><text class="cls-2" transform="translate(0 75.58)">Local entrypoint<tspan class="cls-3"><tspan x="0" y="12">make figures</tspan></tspan></text><text class="cls-2" transform="translate(405 116.6)">Local entrypoint<tspan class="cls-3"><tspan x="0" y="12">make check_docker</tspan></tspan></text><rect class="cls-4" x="135" y="188.6" width="72" height="36" rx="2.83"/><text class="cls-5" transform="translate(153.77 208.49)">latexmk</text><rect class="cls-4" x="405" y="187.65" width="63" height="36" rx="2.83"/><text class="cls-2" transform="translate(323.94 278.6)">Artifacts</text><text class="cls-2" transform="translate(179.42 131.16)">Artifacts</text><text class="cls-2" transform="translate(224.94 199.81)">Artifacts</text><text class="cls-5" transform="translate(424.27 207.54)">rspec</text><text x="-9" y="-9.4"/><path class="cls-6" 
d="M357.17,234a2.85,2.85,0,0,0,2.83-2.83V205.56a7.88,7.88,0,0,0-2-4.83l-.48-.47a7.91,7.91,0,0,0-4.84-2H335.83a2.84,2.84,0,0,0-2.83,2.83v30.08a2.85,2.85,0,0,0,2.83,2.83Z" transform="translate(-9 -9.4)"/><polyline class="cls-6" points="351 196.06 343.71 196.06 343.71 188.86"/><line class="cls-6" x1="72" y1="206.6" x2="126.14" y2="206.6"/><polygon class="cls-7" points="124.16 209.5 135 206.6 124.16 203.69 124.16 209.5"/><text class="cls-2" transform="translate(0 44.6)">Remote entrypoint<tspan class="cls-3"><tspan x="0" y="12">git push</tspan></tspan></text><rect class="cls-4" x="135" y="44.6" width="72" height="36" rx="2.83"/><line class="cls-6" x1="72" y1="62.6" x2="126.14" y2="62.6"/><polygon class="cls-7" points="124.16 65.5 135 62.6 124.16 59.69 124.16 65.5"/><text class="cls-5" transform="translate(155.99 64.49)">python</text><line class="cls-6" x1="360" y1="206.6" x2="396.14" y2="206.81"/><polygon class="cls-7" points="394.14 209.7 405 206.86 394.18 203.89 394.14 209.7"/><line class="cls-6" x1="468" y1="206.6" x2="513.14" y2="206.6"/><polygon class="cls-7" points="511.16 209.5 522 206.6 511.16 203.69 511.16 209.5"/><line class="cls-6" x1="207" y1="206.6" x2="306.14" y2="206.6"/><polygon class="cls-7" points="304.16 209.5 315 206.6 304.16 203.69 304.16 209.5"/><text class="cls-2" transform="translate(387 242.6)">ruby:2.7.1</text><text class="cls-8" transform="translate(235.89 8.6)">Build figures</text><text class="cls-8" transform="translate(496.62 161.6)"><tspan class="cls-9">T</tspan><tspan x="5.37" y="0">est document</tspan></text><text class="cls-8" transform="translate(221.45 161.6)">Build document</text><line class="cls-6" x1="337.5" y1="233.6" x2="337.5" y2="251.74"/><polygon class="cls-7" points="334.6 249.76 337.5 260.6 340.4 249.76 334.6 249.76"/><line class="cls-6" x1="171" y1="80.6" x2="171" y2="179.74"/><polygon class="cls-7" points="168.09 177.76 171 188.6 173.91 177.76 168.09 177.76"/><line class="cls-6" x1="432" y1="134.6" x2="432" 
y2="179.74"/><polygon class="cls-7" points="429.1 177.76 432 188.6 434.9 177.76 429.1 177.76"/><polyline class="cls-10" points="531 206.6 540 215.6 549 188.6"/></svg>
\ No newline at end of file