---
title: Reproducibility aspects of the Swedish COVID-19 estimate report
layout: post
image:
  path: /images/blog.jpg
hidden: false
published: true
---
**Researchers have called for more transparency from [The Public Health Agency of Sweden](https://www.folkhalsomyndigheten.se/) regarding the COVID-19 estimates for Sweden. Recently, a report was released covering such estimates for the Stockholm region. Along with the report, the code used for these estimates was uploaded to Github, which makes it possible for others to review
accompanying
[code on Github](https://github.com/FohmAnalys/SEIR-model-Stockholm) (committed on the 23rd). The fact that the
code is made available is of course very positive; however, we will review
and evaluate this code from a reproducibility point of view. We will use the
requirements from {% cite Monperrus2018 %} and {% cite Leek2016 %} to evaluate commit [`bb616e9`](https://github.com/FohmAnalys/SEIR-model-Stockholm/tree/bb616e970d6cb3f8b01edfe735b1a418480c6af2) (2020-04-24) of the repository.
It is important to note that we will not review or critique this code from a
health perspective. There will not be a single exponential curve in this post.
## Evaluation
### The repository must be findable and downloadable
An important requirement on a good data science repository is that it is
findable and downloadable {% cite Monperrus2018 %}.
The report {% cite ThePublicHealthAgencyofSweden2020 %} itself does not
contain a link to the code, but we found a link on another page of the website of [The Public Health Agency of Sweden](https://www.folkhalsomyndigheten.se/smittskydd-beredskap/utbrott/aktuella-utbrott/covid-19/analys-och-prognoser/).
We didn't find the repository by Googling the name of the report, but
the repository includes the name and a link to the report.
### The repository must be under version control and include a license
The code has been released on [Github](http://www.github.com) with a [GPLv2](https://github.com/FohmAnalys/SEIR-model-Stockholm/blob/master/LICENSE) license. We note that the license seems to also cover the results stored in the repository. Results that are generated by this code [should not be covered](https://www.gnu.org/licenses/gpl-faq.en.html#WhatCaseIsOutputGPL) by this license, and the report states that all figures are copyrighted and that permission must be obtained from the copyright holder to publish them.
Best practices for licensing scientific code are a larger topic, and one that
could be debated, so we leave that as further work.
The first commit is [verified](https://help.github.com/en/github/authenticating-to-github/about-commit-signature-verification), while the following commits are not. Verified commits let others check that the content comes from a trusted source. The commits are from an individual without a [Github](http://www.github.com) account.
### The repository must be documented
There are no instructions on how to run this code in the `README.md` file. There is no inventory, but folders are self-explanatory with names such as `Scripts` and `Data`.
There are some instructions inside the script file, such as an instruction to set the absolute path to the project, which we did not need to do. Absolute paths should be avoided, since they will not exist on anyone else's machine.
The script file itself is sparsely commented and very long. Splitting it into smaller, commented parts would make it more readable.
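The following is one minimal way to avoid the absolute path. It assumes the script is started from the project root, for example with `Rscript` run from the repository folder. The helper `project_path()` is hypothetical and is not part of the repository.

```r
# Hypothetical helper: build paths relative to the project root instead of
# relying on a hard-coded absolute path. By default the root is the current
# working directory, i.e. we assume the script is started from the project root.
project_path <- function(..., root = getwd()) {
  path <- file.path(root, ...)
  if (!file.exists(path)) {
    stop("File not found: ", path,
         ". Is the working directory the project root?", call. = FALSE)
  }
  path
}

# Usage, assuming the repository layout described in this post:
# data_file <- project_path("Data", "Data_2020-04-10Ny.txt")
```

Failing early with a message that names the expected working directory is usually easier to act on than a later, less specific read error.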
### The repository must be exercisable
We can see that there is an R script in the `Scripts` folder, so we will attempt to run it as is.
Since there is no documentation, we don't know what the required environment
is. In the code itself, there is a note that `R 3.5.2` has been used and the loaded packages are listed in one place. We
don't have a full description of the session or environment, so we
must figure out ourselves which versions of the dependencies were used. Luckily, there are not many dependencies, and we can install them as follows.
{% highlight R linenos %}
install.packages(c("reshape2", "openxlsx", "RColorBrewer", "rootSolve","deSolve"))
{% endhighlight %}
Another alternative, a better one IMHO, is to use the
[`checkpoint` package](https://mran.microsoft.com/documents/rro/reproducibility).
It allows one to set a checkpoint in time, so that
another user will use the packages and versions available at that time. We add the following to the top of the script.
{% highlight R linenos %}
install.packages("checkpoint")
{% endhighlight %}
In any case, it is often a good idea to include a `Makefile` so that someone can run `make` to execute the script.
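We would not have needed to guess the dependency versions if the session had been recorded when the results were produced. The sketch below does this in base R. The helper names and the output file name are our own choices and do not come from the repository.

```r
# Write the R version, platform and attached package versions to a text file
# that can be committed next to the script. The file name is an arbitrary
# choice made for this example.
record_session <- function(file = "sessionInfo.txt") {
  writeLines(capture.output(utils::sessionInfo()), file)
  invisible(file)
}

# Look up the installed version of each named package.
package_versions <- function(pkgs) {
  vapply(pkgs, function(p) as.character(utils::packageVersion(p)),
         character(1))
}

# Usage, with the packages loaded by the script:
# record_session()
# package_versions(c("reshape2", "openxlsx", "RColorBrewer", "rootSolve", "deSolve"))
```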
### Input data lineage
The data used to produce the result is included in the repository. The file `./Data/Data_2020-04-10Ny.txt` contains values up to 2020-04-10.
While the repository does not include a codebook, the dataset is simple, and the cleaning and the analysis seem to be well explained in the report.
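If the input file name is hard-coded in the script, a rerun with newer data requires editing the code. One alternative is to pass the file as a command-line argument. The sketch below assumes the script is run with `Rscript` from the project root. `get_data_file()` is a hypothetical helper.

```r
# Hypothetical: read the input file name from the command line so the
# analysis can be rerun on newer data, falling back to the file that is
# included in the repository.
get_data_file <- function(args = commandArgs(trailingOnly = TRUE),
                          default = file.path("Data", "Data_2020-04-10Ny.txt")) {
  if (length(args) >= 1) args[[1]] else default
}

data_file <- get_data_file()
```

With this in place, a call such as `Rscript Scripts/<script>.R Data/<newer data file>.txt` would use the new file, where the angle-bracket names are placeholders. Running the script without arguments still reproduces the original analysis.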
In one place, there are some magic numbers. These numbers are present in the dataset and did not need to be entered manually. Furthermore, they are not used in the analysis, but they suggest that the analysis was also run for other regions in Sweden.
{% highlight R linenos %}
df_riket <- data.frame(ARegion = "Riket", Pop = 2385128+ 5855459+ 2078886)
{% endhighlight %}
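If a national total like this is needed, it can be computed from the regional rows of the data rather than typed in. The sketch below uses made-up numbers. The column names `ARegion` and `Pop` mirror the snippet above, but the regions and values are placeholders, not data from the report.

```r
# Made-up example data: one row per region. The column names ARegion and Pop
# mirror the snippet above; the region names and values are placeholders.
regions <- data.frame(ARegion = c("Region A", "Region B", "Region C"),
                      Pop = c(1000, 2000, 3000))

# Derive the national total from the data instead of typing it in by hand.
df_riket <- data.frame(ARegion = "Riket", Pop = sum(regions$Pop))
```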
### The repository must be complete
A repository is said to be complete if all numbers and figures from the paper can be re-computed from the code {% cite Monperrus2018 %}.
The script produces a long list of figures (12 of them) and tables, and while we have not checked the numbers in detail, they look reasonable. The report, on the other hand, contains more figures that are not generated by the code.
To be completely reproducible, the code must generate the report.
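Short of generating the whole report, a simple check is to compare the files the script writes against the figures the report uses. The sketch below does this in base R. The file names, the output folder and the placeholder plot are all hypothetical.

```r
# Hypothetical list of figure files that the report relies on; in practice
# this would list every figure and table in the report.
expected_outputs <- c("figure_01.pdf", "figure_02.pdf")

# Placeholder for the analysis script: writes a single figure.
make_figures <- function(out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  pdf(file.path(out_dir, "figure_01.pdf"))
  plot(1:10, type = "l", main = "Placeholder figure")
  dev.off()
  invisible(out_dir)
}

# Return the expected files that the script did not produce.
missing_outputs <- function(expected, out_dir) {
  expected[!file.exists(file.path(out_dir, expected))]
}

out_dir <- file.path(tempdir(), "output")
make_figures(out_dir)
missing_outputs(expected_outputs, out_dir)
# returns "figure_02.pdf", because only figure_01.pdf was written
```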
### The repository must be durable
The specific commit used for the report should be archived and referenced from within the report. [Zenodo](https://zenodo.org/) makes it extremely simple to archive a [Github](http://www.github.com) repository.
## Conclusion
It is very positive that this code was made available to the public to be
reviewed and critiqued by anyone. It had some issues, some of which have
already been corrected by others; at the time of writing, however, these
corrections have not been included in the original repository. We have suggested many improvements in terms of reproducibility, which could make it easier for other people to contribute improvements to the model and the analysis. These improvements range from better documentation to better handling of input data.
This is the first code published on Github by the account [FohmAnalys](https://github.com/FohmAnalys). I hope that the release of this code means that we can expect more openness and transparency in the future.
We, as scientists, have a lot to learn about open science and how to make code available, and I hope that this post could inspire you to share your own code.