Commit b6acd723 authored by Luke Johnston

Merge branch 'master' of gitlab.com:rostools/manifesto into contributing-file

# Conflicts:
#	DESCRIPTION
#	_build.R
parents e5ae9378 0d98cee1
image: rocker/tidyverse
stages:
- build
- deploy
pages:
stage: deploy
build:
stage: build
script:
- R -e "remotes::install_deps(dependencies = T)"
- Rscript _build.R
- R -e "remotes::install_deps(dependencies = TRUE)"
- R -e "tinytex::install_tinytex()"
- R -e "bookdown::render_book('index.Rmd', 'bookdown::gitbook')"
artifacts:
paths:
- public
# - R -e "bookdown::render_book('index.Rmd', 'bookdown::pdf_book')"
# To produce a code coverage report as a GitLab page see
# https://about.gitlab.com/2016/11/03/publish-code-coverage-report-with-gitlab-pages/
deploy:
image: node:latest
stage: deploy
before_script:
- npm i -g netlify-cli
script:
- netlify deploy --site $NETLIFY_SITE_ID --auth $NETLIFY_AUTH_TOKEN --prod
only:
- master
dependencies:
- build
......@@ -4,7 +4,9 @@ Title: Heavily Opinionated Manifesto on Reproducible and Open Science Projects.
Version: 0.0.1.9000
Imports:
bookdown,
emo (>= 0.0.0.9000)
emo (>= 0.0.0.9000),
tinytex,
kableExtra
Remotes:
rstudio/bookdown,
hadley/emo
......
......@@ -3,4 +3,3 @@
bookdown::render_book('index.Rmd', 'bookdown::gitbook')
rmarkdown::render('CONTRIBUTING.Rmd', 'rmarkdown::github_document')
# bookdown::render_book('index.Rmd', 'bookdown::pdf_book')
......@@ -11,5 +11,7 @@ bookdown::gitbook:
text: "Edit"
#download: ["pdf"]
sharing: no
#bookdown::pdf_book:
# latex_engine: xelatex
bookdown::pdf_book:
latex_engine: xelatex
includes:
in_header: preamble.tex
---
title: "A Generalized and Structured Analytical Workflow (GSAW) for Reproducible and Openly Scientific (ROS) Projects"
title: "A Generalized and Structured Analytical Workflow for Reproducible and Openly Scientific Projects"
author:
- "Luke Johnston"
- "Joel Östblom"
- "Ahmed Hasan"
date: "`r Sys.Date()`"
site: bookdown::bookdown_site
output: bookdown::gitbook
documentclass: book
#bibliography: [book.bib, packages.bib]
biblio-style: apalike
link-citations: yes
#github-repo:
description: "Heavily Opinionated Manifesto on Reproducible and Open Science Projects"
description: "A Heavily Opinionated Manifesto on Reproducible and Open Science Projects"
---
\mainmatter
> Note: This document is still in active development, so text may and will likely
change as we work toward completing it.
# Overview
```{r, include=FALSE, eval=FALSE}
......@@ -23,3 +27,57 @@ knitr::write_bib(c(
.packages(), 'bookdown', 'knitr', 'rmarkdown'
), 'packages.bib')
```
## Vision
Our dream is a future where the default for science is to be reproducible and
open. We hope that one day, researchers will conduct their scientific activities
following open scientific principles not as an active choice but because it is
the proper and *easier* way of doing science.
Our focus is on three branches: software, teaching, and support networks.
- For our software, we aim to automate what we can, simplify and streamline the
rest. We aspire to do for reproducible and open science what [devtools], an R
software package designed to make it easier to create other R packages, does
for package development.
- For our learning material, we aspire to be a "go to" reference for learning
the *exact steps and processes* for doing reproducible and open science, and for
knowing and staying updated on which tools and services to use.
- For our support network, we hope to link and connect with all the amazing
research groups throughout the world who are working hard to practice and
promote reproducible and open science. Through this connection, we hope to
provide others with real-world examples and role models for how science
should be done.
[devtools]: https://devtools.r-lib.org/
We are (mostly) biomedical researchers, so our main focus and expertise is in
biomedical science.
## Mission
To achieve our vision, our mission is to create a highly opinionated, practical,
and process-oriented ecosystem of software tools and accompanying documentation,
tutorials, and learning materials, supported by a network of practitioners
(researchers), that informs on how to conduct open and reproducible science.
Our mission is divided into multiple parts based on the organization's branches:
- Software: We aim to create an ecosystem of interconnected packages and software
tools, by either developing these tools or by linking existing ones. This ecosystem
of tools aims to reduce the burden on researchers in conducting open and reproducible
science through automation and programming. We hope to emulate the principles
and structure of the [tidyverse] ecosystem, as they are a great role model for
creating a functioning and solid ecosystem.
- Learning material: (still being developed.)
- Support network: (still being developed.)
[tidyverse]: https://www.tidyverse.org/
In all project branches, our aim is to design the software and learning material
to be comprehensive, carefully considered, and scientifically-informed by emphasizing
usability and simplicity.
Our current focus is on the data analysis and publication side of scientific
activities.
# Manifesto for ROS Projects {#ros-manifesto}
TODO: Need to incorporate the GSAW or other acronym throughout the manifesto
```{r, child="preamble-note.md"}
```
......@@ -28,7 +26,7 @@ more important in this changing scientific landscape.
These new demands should herald in better approaches to doing science, such as
greater training in computational aspects of research, data management, and
dissemination of findings. While there are small pockets of change adapting to
this new landscape, this however, is the exception and mainstream academic
this new landscape, these are the exceptions, and mainstream academic
science continues as it has for decades. Academia still (obsessively) rewards
publications as the currency for promotion, funding, and achievement. Since
following open and reproducible scientific practices is presently extremely
......@@ -37,7 +35,7 @@ is currently little incentive to do these practices as that would reduce the
effective number of publications produced in any given amount of time. Therefore,
until the current obsession with publication numbers declines, efforts to
simplify and make doing open and reproducible science (ROS) accessible and
(relatively easily) acheivable are a way to increase adherence and acceptance of
(relatively easily) achievable are a way to increase adherence and acceptance of
these practices. Even without the current incentive structure, simplifying the
process for doing these "ROS" practices would regardless be beneficial to
scientists given the high expectations placed on scientists already.
......@@ -47,7 +45,7 @@ scientists given the high expectations placed on scientists already.
There are many benefits to adopting ROS practices for research and scientific
activities. Publishing findings under an open access license increases exposure
to the public, both via media and direct download, and also increases the number
of scientist that may end up benefitting from the findings. Being open with the
of scientists who may end up benefiting from the findings. Being open with the
data and the analysis code increases the transparency and reproducibility of the
results and facilitates assessing the validity of any claims made in the
paper, improving the scientific rigor and strength of the study.
......@@ -83,7 +81,7 @@ There seems to be two main problems with this lack of integration and uptake of
doing ROS. One, there are not many opinionated workflow tools that try to
automate and simplify many aspects of ROS. Two, the documentation on many of
these ROS tools and services is often incomplete, not comprehensive enough, or
not effectively targetted to the end user who is likely completely unfamiliar
not effectively targeted to the end user who is likely completely unfamiliar
with many of the ROS terms and concepts. There are other reasons for
non-adherence to ROS practices, such as the aforementioned lack of incentive
structures. However, these are massive systemic problems that can only be
......@@ -108,8 +106,8 @@ Our philosophy is to encourage reproducible and open scientific practices by
automating and streamlining many aspects of a ROS project and by providing an
opinionated view on which tools, services, and workflows to use when doing
research. The goal is to reduce the burden on researchers and lower the barrier
to doing open and reproducible science by creating a Generalized and Structured
Analytical Workflow (GSAW).
to doing open and reproducible science by creating a generalized and structured
analytical workflow and approach to research projects.
For now, we are focusing on typical scientific activities such as creating
abstracts, slides, posters, and manuscripts. We aim to incorporate creating
......@@ -134,15 +132,16 @@ not-for-profit (or at least have a strong history of supporting open source and
open science activities)
- Should be actively developed and well-maintained
- Should have well-developed documentation, resources, and learning material
- The company, organization, or community responsible for the tools or services
should be ethical, have strong principles in favour of openness, and be a strong
advocate and supporter of fairness and equity
- (Optional) Preferably, the company, organization, or community responsible
for the tools or services should have strong principles, policies, and actions
in favour of openness and be a strong advocate and supporter of fairness and
equity
[open source]: https://opensource.org/osd
When a tool and/or service is mostly equal, consider that:
- The design focuses and emphasizes simplicity, useability, and accessibility
- The design focuses and emphasizes simplicity, usability, and accessibility
- It is already widely used and accepted within the ROS community
- Has a system to allow easy programmatic access (e.g. has a public [API])
......@@ -150,12 +149,12 @@ When a tool and/or service is mostly equal, consider that:
### Guiding principles on workflow and processes
Likewise, for the analysis and workflow (the GSAW) aspect of ROS, we follow
Likewise, for the analysis and workflow aspect of ROS, we follow
these guiding principles:
- Favour readability over concision
- Favour well-established infrastructures and approaches
- Be internally consistent in filenaming, code style syntax, and language
- Be internally consistent in file names, code style syntax, and language
- Consider and abide by privacy rules and laws (e.g. [GDPR] in Europe)
- Use and adhere to existing checklists (e.g. [STROBE] in epidemiology)
- Favour approaches that explicitly show steps taken from data to final
......@@ -191,7 +190,7 @@ being in the "Advanced" stage.
### Phases of a research project
To help navigate the recommendations and steps for a GSAW-ROS project, phases
To help navigate the recommendations and steps for a ROS project, phases
of a research project are split into:
- Project management throughout (specifically regarding files, folders)
......@@ -202,7 +201,28 @@ of a research project are split into:
- Dissemination
All current and future tools, services, and workflows incorporated into a
GSAW-ROS project template must be based on these guiding principles and
ROS project template must be based on these guiding principles and
considerations.
TODO: Include guiding principles for creating teaching material
TODO: Include guiding principles for creating teaching material?
## ROS principles as a checklist
Inspired by the [Software Sustainability Institute blog](https://www.software.ac.uk/blog/2018-05-22-sharing-reproducible-research-minimum-requirements-and-desirable-features),
a [blog post by Jonathan Peelle](https://thewinnower.com/papers/3706-a-manuscript-checklist-for-improving-science),
and the [Transparency Checklist](https://www.nature.com/articles/s41562-019-0772-6)
(DOI: 10.1038/s41562-019-0772-6).
<!-- "Read more" in the column will contain links to where to read more about that item -->
```{r}
options(knitr.kable.NA = "")
library(kableExtra)
checklist <-
read.csv("resources/checklist.csv",
stringsAsFactors = FALSE,
check.names = FALSE)
knitr::kable(checklist, align = "l") %>%
kable_styling(c("striped", "hover"))
```
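
As a rough sketch of how the checklist file could also be used outside the
rendered table, the snippet below parses a checklist in the same four-column
layout and tallies the completed items per section. It is written in Python,
one of the two languages recommended in this document. The function name
`summarize_checklist`, the inline sample rows, and the `Y`/`N` answers are
assumptions made for illustration and are not part of the project.

```python
import csv
import io

# Hypothetical sample in the same layout as resources/checklist.csv;
# the "Y"/"N" answers and the "Why not?" text are made up for illustration.
SAMPLE = """Item,Read more:,Done? (Y/N),Why not?
**Reproducibility**,,,
Files under version control,,Y,
Code documented and explained,,N,Still in progress
,,,
**Openness**,,,
Submit preprint,,Y,
"""


def summarize_checklist(text):
    """Return {section: (done, total)} for a checklist given as CSV text."""
    summary = {}
    section = None
    for row in csv.DictReader(io.StringIO(text)):
        item = (row["Item"] or "").strip()
        if not item:
            continue  # blank separator row between sections
        if item.startswith("**") and item.endswith("**"):
            # Bold rows such as **Openness** act as section headings.
            section = item.strip("*")
            summary[section] = (0, 0)
            continue
        if section is None:
            continue  # items listed before any heading are skipped
        done, total = summary[section]
        answer = (row["Done? (Y/N)"] or "").strip().upper()
        summary[section] = (done + (answer == "Y"), total + 1)
    return summary


if __name__ == "__main__":
    for name, (done, total) in summarize_checklist(SAMPLE).items():
        print(f"{name}: {done}/{total} done")
```

Treating the bold rows as section headings mirrors how the CSV groups items
under **Reproducibility** and **Openness**; a different layout would need a
different rule.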
[build]
publish = "public"
\usepackage[none]{hyphenat}
\frontmatter
\newcommand{\horrule}[1]{\rule{\linewidth}{#1}}
\title{
\normalfont
\horrule{1pt} \\[0.4cm]
\huge A Generalized and Structured Analytical Workflow for Reproducible and Openly Scientific Projects
\horrule{1pt} \\[0.5cm]
}
\author{Luke Johnston, Joel Östblom, Ahmed Hasan}
\date{2019}
# Specific recommendations {#recommendations}
```{r, child="preamble-note.md"}
......@@ -12,13 +11,13 @@ some comparisons between options.
Open science encompasses a vast number of diverse tools and services that is
continuously increasing. This encouraging growth indicates that open science is
actively evolving and that there is a rich network of people and organizations
devoted to improving current scientific practices. A downside to this plenty is
that is can act as a barrier for researchers who desire to work more in the
devoted to improving current scientific practices. A downside to this abundance is
that it can act as a barrier for researchers who desire to work more in the
open. The range of tool choices and the lack of guidance on what to use
particularly risk overwhelming and discouraging researchers seeking to open up
their workflow for the first time.
To provide a solution to this problem, the GSAW-ROS framework provides heavily
To provide a solution to this problem, the ROS framework provides heavily
opinionated recommendations on open tools, workflows, and services. Below is a
brief summary of the specific recommendations we make, followed by more detailed
explanations and comparisons between tools, services, and workflows.
......@@ -28,16 +27,27 @@ explanations and comparisons between tools, services, and workflows.
- **File management and version control**: [Git], combined with [GitHub] or [GitLab]
- **Statistical and/or programming language**: [R] or [Python]
- **For writing documents**: [Pandoc Markdown] (e.g. [R Markdown])
- **Analytic and writing platform**: [RStudio] (for R) or [JupyterLab] (for Python)
- **Analytic platform**: [RStudio] (for R) or [JupyterLab] (for Python)
- **Writing platform**: [RStudio]
- **Dissemination** for getting a DOI and for discoverability:
- **Code and other project files**: [Zenodo]
- **Preprint manuscripts**: [bioRxiv] or [PeerJ Preprints] or [OSF Preprints]
- **Preprint manuscripts**: [bioRxiv], [medRxiv], or [OSF Preprints]
- **Posters**: [figshare] or [PeerJ Preprints]
- **Slides**: ??? [figshare]?
- **All activities**: For R projects, preferably everything is done in [RStudio].
- **All activities**: For R projects, preferably everything is done in
[RStudio]. See the [workflow section](#workflow) below for more detail. For
Python projects the environment is a bit more complicated and we are still
thinking through how it would look.
<!--
For Python projects, most work can be done in [JupyterLab], however other tools
will also need to be used. See the [workflow section](#workflow) below for more
detail.
will also need to be used.
For Jupyter Notebook, it might make sense to always have Rmd as the backend and
then use RStudio as a Markdown and Git GUI. There is no other platform with as
much support for different publishing options through a GUI, so I think it will
be used for writing. For Git, there are GitKraken, nbdime, and the JupyterLab
Git extension as alternatives.
-->
[Git]: https://git-scm.com/
[GitHub]: https://github.com/
......@@ -50,7 +60,7 @@ detail.
[JupyterLab]: https://jupyterlab.readthedocs.io/en/stable/
[Zenodo]: https://zenodo.org/
[bioRxiv]: https://www.biorxiv.org/
[PeerJ Preprints]: https://peerj.com/preprints/
[medRxiv]: https://www.medrxiv.org/
[OSF Preprints]: https://osf.io/preprints/
[figshare]: https://figshare.com/
......@@ -151,8 +161,9 @@ then excludes it from being part of a ROS workflow.
There are many programming and statistical computing languages available,
both open source and proprietary. However, of them all we recommend using [R]
and [Python]. Both languages are open source, have active and (mostly) welcoming
communities, have very well developed packages and extensions for all types of
and [Python]. Both languages are open source, have active communities, are
working at being more welcoming and inclusive,
have very well developed packages and extensions for all types of
analyses projects, are well maintained and documented, are (mostly) readable,
are widely used in the scientific community, and are the two most widely used
languages in the world for data science. The R community in particular is very
......@@ -179,49 +190,52 @@ researcher's institution can't afford a license, the text of that document will
be inaccessible.
More commonly, if one finds a document written using an older version of the
software (e.g. `.doc` vs `.docx`), there is no guarantee it can be opened in the
new version of the software. Documents can only be opened by people who can
afford to purchase the products sold by the vendor. Opening the same document in
different versions of the same software or on different computers could render
different results (such as when opening a Windows PowerPoint presentation on a
Mac). Storing either data or manuscripts in such formats means that they can be
lost forever or could be inaccessible to certain groups of people. In contrast,
writing in an open, text-based source format means that the document can be
opened by anyone with access to a computer or mobile device.
There are several plain-text "[markup language]" formates, such as [LaTeX] or
[HTML]. However, there are major drawbacks to these "languages", including the
difficulty and effort required to learn them. Luckily, there is the
[Markdown] format which is simple to learn and to use. Since Markdown is just
plain text, changes can also be easily tracked using [Git] and collaboration can
happen on [GitHub] or [GitLab]. There are also promising online text editors
emerging which support Markdown with track changes to ease the transition for
those not wanting to learn GitHub, e.g. Authorea. Plus, when using
[Pandoc Markdown], the document can be converted to a large range of output
formats, including Word `.docx`, beautifully typeset [LaTeX] PDFs, or web friendly
[HTML] files. Markdown also has features typically required of scientific
writing such as citation and bibliography insertion (including plain text
formats such as [BibTeX]). Taken together Markdown is a simple, powerful plain
text format that ensures documents will stick around well into the future.
software (e.g. `.doc` vs `.docx`), there is no guarantee it can be opened in
the new version of the software. Opening the same document in different
versions of the same software or on different computers could render different
results (such as when opening a Windows PowerPoint presentation on a Mac).
Documents can only be opened by people who can afford to purchase the products
sold by the vendor. Storing either data or manuscripts in such formats means
that they can be lost forever or could be inaccessible to certain groups of
people. In contrast, writing in an open, text-based source format means that
the document can be opened by anyone with access to a computer or mobile
device.
Open, text-based formats are commonly referred to as plain text documents.
Although plain text itself cannot be formatted into headings, bold font, etc.,
the addition of text-based markup, such as `*` or `[bold]` surrounding a word,
enables text editors to display plain text as formatted documents. There are
several plain-text "[markup languages][markup language]", such as [LaTeX] or
[HTML], but many of these have verbose markup that makes them inefficient to
type and difficult to learn. [Markdown] is a markup language that was designed
from markup conventions used in email, so it is simple to learn and easy to
type. A flavor of Markdown called [Pandoc Markdown] specializes in scholarly
communication and supports features required in scientific writing such as
automatic figure referencing, in-text citations, and bibliography insertion
(including plain text formats such as [BibTeX]). [Pandoc Markdown] documents
can also be converted to a large range of output formats, including Word
`.docx`, beautifully typeset [LaTeX] PDFs, or web-friendly [HTML] files.
[R Markdown] is an extension of [Pandoc Markdown] that allows R and Python code
to be executed within and inserted into a document, increasing document-level
reproducibility.
Since Markdown is just plain text, changes can be easily tracked using [Git]
and collaboration can happen on [GitHub] or [GitLab]. There are also promising
online text editors emerging that support Markdown with tracked changes to ease
the transition for people used to conventional word processors, e.g. [Authorea]
and [Stencila]. Taken together, the Markdown format is an open plain text format
that is accessible and usable on all operating systems, has an active community
of users, is well maintained and documented (e.g. the [Pandoc Markdown] manual
or the [R Markdown Book]), can be converted into a wide range of document types
(see the [Pandoc Markdown] about page for examples), is designed for simplicity
and readability, and has flavors dedicated to scholarly communication.
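
To make the conversion step more concrete, here is a minimal sketch in Python
that builds (but does not run) the Pandoc command lines for turning one
Markdown source into several output formats. It relies only on Pandoc's
`--output` option, where Pandoc infers the output format from the file
extension; producing a PDF additionally needs a LaTeX engine to be installed.
The helper name `pandoc_commands` and the file name `manuscript.md` are
hypothetical and chosen for illustration.

```python
import shlex


def pandoc_commands(source, formats=("docx", "pdf", "html")):
    """Build one pandoc command (as an argument list) per output format."""
    # Strip the extension so "manuscript.md" becomes "manuscript".
    stem = source.rsplit(".", 1)[0]
    return [["pandoc", source, "--output", f"{stem}.{ext}"] for ext in formats]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run by hand
    # or passed to subprocess.run() on a machine with Pandoc installed.
    for cmd in pandoc_commands("manuscript.md"):
        print(shlex.join(cmd))
```

Keeping a single Markdown source and generating every output from it means
that only one file needs to be tracked and reviewed in version control.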
[HTML]: https://en.wikipedia.org/wiki/HTML
[LaTeX]: https://www.latex-project.org/
[markup language]: https://en.wikipedia.org/wiki/Markup_language
[Markdown]: https://en.wikipedia.org/wiki/Markdown
[BibTeX]: https://en.wikipedia.org/wiki/BibTeX
TODO: Determine if there is a "Python Markdown" available
The Markdown format is open source, is simple plain text (accessible and usable
on all operating systems and versions), has an active community of users, is
well maintained and documented (e.g. the [Pandoc Markdown] or the [R Markdown
Book]), can be converted in a wide range of document types (see the [Pandoc
Markdown] about page for examples), is designed for simplicity and readability.
For R based projects, use [R Markdown], an extension of Markdown that allows R
code to be executed within and inserted into a document, increasing
document-level reproducibility. [RStudio] offers a great environment to write R
Markdown.
[Authorea]: https://www.authorea.com/
[Stencila]: https://stenci.la/
[R Markdown Book]: https://bookdown.org/yihui/rmarkdown/
#### Dissemination phase
......
Item,Read more:,Done? (Y/N),Why not?
**Reproducibility**,,,
Indicate software dependencies,,,
Analysis code linked to open/simulated data,,,
Use standard project folder and file structure,,,
"Document (slides, poster, manuscript, etc) written reproducibly",,,
Provide explicit analysis steps taken,,,
Files under version control,,,
Code documented and explained,,,
Clearly label input and output data/results,,,
Follow code syntax style guide,,,
Run and pass spell checking,,,
Results reproduced on a clean computing environment,,,
Computing environment described,,,
List of all software tools used,,,
Data assertion checks (data passes expectations),,,
,,,
**Openness**,,,
"Make data publicly available, with DOI",,,
"Make data analysis (code) publicly available, with DOI",,,
Submit preprint,,,
"Pre-register study, with DOI",,,
"Pre-register analysis, with DOI",,,
License analysis code (MIT or similar),,,
License scientific content (CC-BY or similar),,,
Publish in open access journal,,,
"Make slides/poster publicly available, with DOI",,,
Manuscript contains all relevant URL/DOI,,,
Software and analysis methods used are open source,,,