Update Gitlab CI Post: structure, cache and badge

Update post with new structure, new sections about caching and adding a badge.
parent c6c06a55
Pipeline #158565937 passed with stage
in 3 minutes and 2 seconds
......@@ -842,6 +842,15 @@ url = {https://sayantangkhan.github.io/latex-gitlab-ci.html},
urldate = {2020-05-11},
year = {2018}
}
@misc{vipinajayakumar2020,
author = {Ajayakumar, Vipin},
title = {Continuous Integration of LaTeX projects with GitLab Pages},
url = {https://www.vipinajayakumar.com/continuous-integration-of-latex-projects-with-gitlab-pages.html},
urldate = {2020-06-20},
year = {2020}
}
@article{Hoonlor2013,
address = {New York, NY, USA},
author = {Hoonlor, Apirak and Szymanski, Boleslaw K and Zaki, Mohammed J},
......@@ -921,3 +930,12 @@ pages = {87--90},
title = {{Jupyter Notebooks -- a publishing format for reproducible computational workflows}},
year = {2016}
}
@misc{Gabor2020,
author = {Gabor, Bernat},
mendeley-groups = {python-packages},
title = {virtualenv},
url = {https://virtualenv.pypa.io/},
year = {2020}
}
......@@ -27,19 +27,101 @@ description: Learn how to compile LaTeX documents in a reproducible way.
When writing a document in LaTeX I like to use `git` for version control even if I am working alone on a project. This allows me to track my progress, have a backup, and make sure the document is completely _reproducible_ from raw data. The principle of
_Reproducible Research_ {% cite buckheit1995wavelab claerbout1992electronic
Artifact18:online %} is to make data and computer code available for others to
analyze and criticize.
A good open source repository is exercisable and complete {% cite Monperrus2018 Artifact18:online %}. This means that it must be possible to fully reproduce the document, down to the last pixel, by running a single script in the repository.
In this post we will take a look at the practicalities of writing a reproducible document in LaTeX, using Docker images as our build environment and a GitLab CI pipeline to ensure that we meet these requirements.
This post is part of a series and follows [Publication ready figures]({% post_url 2019-09-29-publication_ready_figures %}). For more on the requirements on open source repositories, see [Reproducibility aspects of the Swedish COVID–19 estimate report]({% post_url 2020-04-25-fohm-seir-stockholm %}).
{% include figure.html url="/assets/images/gitlab-pipeline-simple.svg" description="Our GitLab CI pipeline consists of two build stages and one test stage. These stages run in separate Docker containers." colclass="col-md-10" %}
### Our contributions
- We define three phases of document compilation: compiling the figures, compiling the main document, and testing the compiled document against a set of known requirements.
- We construct a local compilation pipeline based on `latexmk`, `make` and Docker {% cite Merkel2014 %}.
- We construct a GitLab CI pipeline that automatically compiles the document when we push new code to the remote repository.
## What we will need
In this post we will use `git` and target the GitLab CI pipeline framework, so you will need a repository on [GitLab](https://www.gitlab.com).
I recommend adding a `.gitignore` based on the [GitHub TeX .gitignore template](https://github.com/github/gitignore/blob/master/TeX.gitignore).
## The local build system
We will use `latexmk` to build our LaTeX document. Other build systems for LaTeX, such as `rubber` and `latexrun`, can also be used, but `latexmk` has the advantage that it is robust and already installed in the Docker image we are using.
We will use [GNU make](https://www.gnu.org/software/make/) to trigger the `latexmk` build locally, or in the GitLab runner. The entry points will be slightly different in these cases.
Here we assume that running `make figures` is very time consuming, so we would like to avoid running it every time we compile the document.
The command that we run from the command line to compile the LaTeX document is `make`. This will first start a Docker container, mount the working directory as `/data` and run `make pdf` inside the container. Since the working directory is mounted, the PDF file will remain after the Docker container has been shut down and removed.
The complete script [`Makefile`](https://gitlab.com/martisak/latex-pipeline/-/blob/master/Makefile) can be seen in the [GitLab repository](https://gitlab.com/martisak/latex-pipeline/).
### Generating figures
The figures reside in the subdirectory `figures`, which contains its own `Makefile`.
These can be manually built with
{% highlight bash linenos %}
docker run --rm -w /data/ -v`pwd`:/data python:3.8 make -C /data/figures
{% endhighlight %}
Each figure is generated from raw data and plotted using a Python script. Each script generates a figure in TikZ format with the same base name as the script, but with the extension `.tex`.
### Compiling the document
Our document can be compiled using `latexmk` inside a Docker container with
{% highlight bash linenos %}
docker run --rm -w /data/ -v`pwd`:/data blang/latex:ctanfull make pdf
{% endhighlight %}
The document will be compiled inside the container using
{% highlight bash linenos %}
latexmk -bibtex -pdf -pdflatex="pdflatex -interaction=nonstopmode" main.tex
{% endhighlight %}
The container we are using is based on [a Docker image which has TeXLive 2017 installed](https://github.com/blang/latex-docker) on top of an Ubuntu base image.
### Running unit tests
The test cases, written in Ruby, can either be run locally with
{% highlight bash linenos %}
rspec spec/pdf_spec.rb
{% endhighlight %}
which is the same as `make check` or in a Docker container with
{% highlight bash linenos %}
docker run --rm -w /data/ -v`pwd`:/data ruby:2.7.1 sh -c "bundle update --bundler && make check"
{% endhighlight %}
This is the same as running `make check_docker`. For a more in-depth guide to LaTeX document unit testing see [How to beat publisher PDF checks with LaTeX document unit testing]({% post_url 2020-05-16-latex-test-cases %}).
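
The two build steps above can be chained in a small shell wrapper. This is only a sketch and not the project's `Makefile`: the images and `make` targets are taken from the commands above, while the function names `run_phase` and `build_all` and the `DRY_RUN` switch are made up for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run (or just print) the two local build phases in order.
# Images and make targets come from the commands above; the function
# names and the DRY_RUN switch are hypothetical.

# Run one phase in a throwaway container.
# $1 = directory to mount as /data, $2 = image, remaining args = command
run_phase() {
    workdir="$1"
    image="$2"
    shift 2
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default): only show what would be executed.
        echo "docker run --rm -w /data/ -v $workdir:/data $image $*"
    else
        docker run --rm -w /data/ -v "$workdir":/data "$image" "$@"
    fi
}

# Figures first, then the document; stop at the first failure.
build_all() {
    run_phase "$1" python:3.8 make -C /data/figures &&
        run_phase "$1" blang/latex:ctanfull make pdf
}
```

Calling `build_all "$PWD"` prints the two `docker run` commands, and `DRY_RUN=0 build_all "$PWD"` would actually execute them. The test phase could be added in the same way.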
### LaTeX development environment
When writing a paper we would of course like to see the results of our changes in near real time, and not have to commit our changes to `git` in order to compile the document.
We can tweak the `render` `make` target a bit so that `latexmk` will be run with the `-pvc` flag {% cite Wienke2018 %}. This puts `latexmk` into preview and continuously update mode.
{% highlight bash linenos %}
make clean render LATEXMK_OPTIONS_EXTRA=-pvc
{% endhighlight %}
This means we can run this command once and just edit our document in our favorite text editor.
{% include figure.html url="/assets/images/latex-dev-env.png" description="Development environment using Sublime Text 3." colclass="col-md-8" %}
## The GitLab CI pipeline
In GitLab we can run a pipeline for each commit using [GitLab CI/CD](https://docs.gitlab.com/ee/ci/). For this project we have defined three stages: the first stage, `figures`, creates the plots in Python; the second, `build`, compiles the LaTeX document; and the third, `test`, runs unit tests on the compiled PDF document.
{% include figure.html url="/assets/images/gitlab-pipeline.png" description="Our GitLab CI pipeline consists of a figures stage, a build stage and a test stage." colclass="col-md-8" %}
......@@ -64,6 +146,39 @@ The figures are placed in the `figures` [subdirectory](https://gitlab.com/martis
The reason for putting this step in a separate stage is that we assume generating figures can take a very long time, for example if a machine learning model is trained in this step. In this way we can also keep it separate when running locally, so that we don't have to regenerate the figures every time we want to compile the LaTeX document.
#### Speeding up the build with caching
The `figures` stage can take a very long time since we need to download and install packages every time the stage runs. To avoid this we can adapt the example from [Cache dependencies in GitLab CI/CD](https://docs.gitlab.com/ee/ci/caching/) so that the `figures` stage becomes
{% highlight yaml linenos %}
variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
cache:
  key: "$CI_JOB_STAGE-$CI_COMMIT_REF_SLUG"
  paths:
    - .cache/pip
    - venv/
figures:
  image: python:3.8
  stage: figures
  before_script:
    - python -V
    - pip install virtualenv
    - virtualenv venv
    - source venv/bin/activate
  script:
    - make -C figures
  artifacts:
    untracked: true
    expire_in: 1 week
{% endhighlight %}
We are using a `virtualenv` {% cite Gabor2020 %} so that the installed packages can be cached as well.
Care has to be taken with this - the cache can become too big for GitLab to handle.
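
One way to keep an eye on this is to report the size of the cached directories at the end of the job. The following is a minimal sketch: the helper names `dir_size_kb` and `cache_report` are hypothetical, and the limit is a number you choose yourself, not a documented GitLab value.

```shell
#!/bin/sh
# Sketch: report the size of cache directories against a chosen limit.
# The limit is an assumption of this example, not a GitLab setting.

# Print the size of a directory in kilobytes (0 if it does not exist).
dir_size_kb() {
    if [ -d "$1" ]; then
        du -sk "$1" | cut -f1
    else
        echo 0
    fi
}

# $1 = limit in kB, remaining args = directories to check
cache_report() {
    limit_kb="$1"
    shift
    for dir in "$@"; do
        size_kb="$(dir_size_kb "$dir")"
        if [ "$size_kb" -gt "$limit_kb" ]; then
            echo "WARNING: $dir uses $size_kb kB (limit $limit_kb kB)"
        else
            echo "OK: $dir uses $size_kb kB"
        fi
    done
}
```

For example, `cache_report 500000 .cache/pip venv` could be run as the last command of the job to print one line per cached directory in the job log.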
### Compiling the LaTeX document
The second stage in the pipeline compiles the actual LaTeX document. Here, we need to use a Docker image that has LaTeX and all needed packages installed. The Docker image we use is `blang/latex:ctanfull`, which uses [TeXLive](https://www.tug.org/texlive/) 2017.
......@@ -100,82 +215,32 @@ test:
when: on_success
{% endhighlight %}
### Adding a "Download PDF" button
Now that we have gone through all of this, we would like to share our final document with others. I like using a GitLab badge for this.
Since we named our document `main.pdf` and the job that compiles it is named `compile`, we can find our document at `https://gitlab.com/martisak/latex-pipeline/-/jobs/artifacts/master/raw/main.pdf?job=compile`.
Of course, we need a fancy image to go with it, and we can generate one using [shields.io](https://shields.io).
![Download PDF](https://img.shields.io/badge/Download-PDF-green)
You can add this badge either to your `README.md` or in your GitLab settings under General and Badges.
{% include figure.html url="/assets/images/download_pdf_badge.png" description="Add a badge to the GitLab repository" colclass="col-md-8" %}
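
The artifact link and the Markdown for the badge can also be assembled from their parts. This is a small sketch that assumes the project lives on gitlab.com; the function names `artifact_url` and `badge_markdown` are made up for illustration.

```shell
#!/bin/sh
# Sketch: build the artifact download URL and a README badge from parts.
# The function names are hypothetical; the URL layout follows the link
# shown above.

# $1 = project path, $2 = branch, $3 = artifact file, $4 = job name
artifact_url() {
    printf 'https://gitlab.com/%s/-/jobs/artifacts/%s/raw/%s?job=%s\n' \
        "$1" "$2" "$3" "$4"
}

# $1 = link target; prints a Markdown image wrapped in a link
badge_markdown() {
    printf '[![Download PDF](https://img.shields.io/badge/Download-PDF-green)](%s)\n' "$1"
}
```

Running `badge_markdown "$(artifact_url martisak/latex-pipeline master main.pdf compile)"` prints a line that can be pasted into `README.md`, so that clicking the badge downloads the PDF.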
## Related work
A common way of writing LaTeX documents together with others is to use [Overleaf](https://www.overleaf.com/). Editing can be done by all authors in real time and the compilation of the document is very fast. However, the online version doesn't allow us to run arbitrary code or run test cases on our document. Furthermore, the version control is hidden from us. Overleaf has a few ways of letting us share the work. In my work, some of the content is proprietary and can be sensitive until the document is reviewed. This means I am not able to use cloud solutions to write my documents. Overleaf does, however, provide [a Docker image](https://github.com/overleaf/overleaf) that can be deployed locally.
Many authors have looked into using GitLab CI for building LaTeX documents, for example {% cite Manik2019 Luhr2018 Khan2018 Ergus2016 %}. {% cite vipinajayakumar2020 %} wrote a very nice and complete guide, and used [GitLab Pages](https://docs.gitlab.com/ee/user/project/pages/) to deploy the compiled document.
In this post we extend this work and make a complete pipeline that can also be run locally. Our pipeline consists of three stages, `figures`, `build` and `test`, each responsible for a separate part of the build process.
## Conclusions
We have constructed a simple pipeline for compiling LaTeX documents in a Docker container. This fulfills the requirements that our repository shall be complete and exercisable {% cite Monperrus2018 Artifact18:online %}.
To quickly get started, you can [fork my repository on Gitlab](https://gitlab.com/martisak/latex-pipeline/-/forks/new).
In upcoming posts we will further look into [defining test cases for documents]({% post_url 2020-05-16-latex-test-cases %}), complicating the build with Pandoc and other tricks to annoy your co-authors.