# Data associated with a tutorial on reproducibility

## The tutorial
The tutorial that this data belongs to can be found at [https://reproducible.sschmeier.com/](https://reproducible.sschmeier.com/).

## How to use this repository
This repository provides a directory structure for developing a Snakemake workflow. It contains all the data that is needed (see below). A Snakemake workflow is based on a "Snakefile" that contains rules describing how to process certain types of data. **The aim of the tutorial is to develop a "Snakefile" to analyse the samples in the fastq directory.**

There are some examples of how a Snakefile can be developed in the "examples" directory.

## Data

### Samples
The data is from a transcriptomics experiment in yeast and has been downsampled heavily to facilitate quick analyses.
The original data can be found at the Sequence Read Archive (SRA) ([https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=212389](https://www.ncbi.nlm.nih.gov/sra?linkname=bioproject_sra_all&from_uid=212389)).

-  [https://www.ncbi.nlm.nih.gov/sra/SRX362638](https://www.ncbi.nlm.nih.gov/sra/SRX362638)
-  [https://www.ncbi.nlm.nih.gov/sra/SRX362639](https://www.ncbi.nlm.nih.gov/sra/SRX362639)
-  [https://www.ncbi.nlm.nih.gov/sra/SRX362640](https://www.ncbi.nlm.nih.gov/sra/SRX362640)
-  [https://www.ncbi.nlm.nih.gov/sra/SRX362641](https://www.ncbi.nlm.nih.gov/sra/SRX362641)
-  samples: [SRR941826, SRR941827, SRR941830, SRR941831]
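
As a rough illustration of how the sample IDs above can drive a workflow, the following Python sketch derives one input and one output path per sample. The `fastq/{sample}.fastq.gz` and `trimmed/{sample}.trimmed.fastq.gz` patterns are assumptions made for this example only; check the fastq directory for the actual file names. Inside a Snakefile, Snakemake's `expand()` function fills such patterns in a similar way.

```python
# Sample IDs as listed in this README.
SAMPLES = ["SRR941826", "SRR941827", "SRR941830", "SRR941831"]

# Hypothetical file name patterns; adjust them to the actual repository layout.
INPUT_PATTERN = "fastq/{sample}.fastq.gz"
OUTPUT_PATTERN = "trimmed/{sample}.trimmed.fastq.gz"


def target_files(samples, pattern):
    """Fill the {sample} placeholder of a pattern once per sample ID."""
    return [pattern.format(sample=s) for s in samples]


inputs = target_files(SAMPLES, INPUT_PATTERN)
outputs = target_files(SAMPLES, OUTPUT_PATTERN)

# Print the planned input -> output mapping for each sample.
for src, dst in zip(inputs, outputs):
    print(f"{src} -> {dst}")
```

Keeping the sample list in one place means a new sample only has to be added once, and all target files follow from it.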

### Genome

- [ftp://ftp.ensembl.org/pub/release-92/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz](ftp://ftp.ensembl.org/pub/release-92/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz)


### Genome Annotation

- [ftp://ftp.ensembl.org/pub/release-92/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.92.gtf.gz](ftp://ftp.ensembl.org/pub/release-92/gtf/saccharomyces_cerevisiae/Saccharomyces_cerevisiae.R64-1-1.92.gtf.gz)
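
If the genome and annotation files listed above need to be fetched again, a small Python helper along these lines could be used. This is only a sketch: the `ref` target directory and the function names are hypothetical, and `download()` assumes network access and that the Ensembl server still serves these paths over FTP.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse

# URLs as listed in this README (Ensembl release 92).
GENOME_URL = (
    "ftp://ftp.ensembl.org/pub/release-92/fasta/saccharomyces_cerevisiae/"
    "dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz"
)
ANNOTATION_URL = (
    "ftp://ftp.ensembl.org/pub/release-92/gtf/saccharomyces_cerevisiae/"
    "Saccharomyces_cerevisiae.R64-1-1.92.gtf.gz"
)


def local_path(url, directory="ref"):
    """Map a remote URL to a file path inside `directory` (hypothetical layout)."""
    filename = posixpath.basename(urlparse(url).path)
    return posixpath.join(directory, filename)


def download(url, directory="ref"):
    """Fetch `url` into `directory`; requires network access."""
    os.makedirs(directory, exist_ok=True)
    destination = local_path(url, directory)
    urllib.request.urlretrieve(url, destination)
    return destination


# Show where each file would be stored, without downloading anything.
for url in (GENOME_URL, ANNOTATION_URL):
    print(local_path(url))
```

Calling `download(GENOME_URL)` and `download(ANNOTATION_URL)` would then place both files in the chosen directory.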


## Running Snakemake

```bash
# create snakemake env
conda env create -n snakemake --file envs/snakemake.yaml

# activate env
conda activate snakemake

# dry run
snakemake -np --use-conda

# execute
snakemake -Tp --use-conda
```

## Examples

### Directory content

```bash
examples/
├── dag_v5.png    ## workflow diagram for v5
├── dag_v6.png    ## workflow diagram for v6
├── dag_v7.png    ## workflow diagram for v7
├── Snakefile_v2  ## workflow with one rule (trimming) and explicit naming of targets
├── Snakefile_v3  ## workflow with one rule (trimming) and target file detection based on samples 
├── Snakefile_v4  ## workflow with one rule (trimming) + logging and benchmarking of the rule
├── Snakefile_v5  ## workflow with one rule (trimming) + specific conda environment for rule
├── Snakefile_v6  ## workflow with three rules (trimming, genome indexing and read mapping)
├── Snakefile_v7  ## complete workflow, conda-based rule execution
├── Snakefile_v8  ## complete workflow, singularity container based rule execution
└── Snakefile_v9  ## workflow with one rule (trimming), reading samples from Google Cloud Storage
```

### Running examples

```bash
# use singularity containers
snakemake -Tp --use-singularity --snakefile examples/Snakefile_v8

# use singularity locally, but read samples from the GS bucket and write results to the bucket as well
snakemake -Tp --use-singularity --default-remote-provider GS --default-remote-prefix schmeier-reproduce-bucket --snakefile examples/Snakefile_v9
```

### NeSI

```bash
# using Singularity, only if set up on cluster
snakemake --use-singularity --singularity-args "--bind /scale_wlg_nobackup/filesets/nobackup/PROJECTNUMBER" -j 999 --cluster-config data/nesi/cluster-nesi-mahuika.yaml --cluster "sbatch -A {cluster.account} -p {cluster.partition} -n {cluster.ntasks} -t {cluster.time} --hint={cluster.hint} --output={cluster.output} --error={cluster.error} -c {cluster.cpus-per-task} --mem={cluster.mem}" -p --rerun-incomplete
```
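
The `{cluster.…}` placeholders in the `--cluster` string above are filled from the cluster configuration file for each submitted job. The Python sketch below imitates that substitution to show what the resulting `sbatch` call might look like. All values in `CLUSTER_DEFAULTS` are hypothetical; the real ones live in `data/nesi/cluster-nesi-mahuika.yaml`. The `render()` helper is an illustration and not Snakemake's own implementation.

```python
import re

# Hypothetical values; the real ones live in data/nesi/cluster-nesi-mahuika.yaml.
CLUSTER_DEFAULTS = {
    "account": "PROJECTNUMBER",
    "partition": "large",
    "ntasks": 1,
    "time": "00:30:00",
    "hint": "nomultithread",
    "output": "logs/slurm-%j.out",
    "error": "logs/slurm-%j.err",
    "cpus-per-task": 4,
    "mem": "8G",
}

# The --cluster template from the command above.
TEMPLATE = (
    "sbatch -A {cluster.account} -p {cluster.partition} -n {cluster.ntasks} "
    "-t {cluster.time} --hint={cluster.hint} --output={cluster.output} "
    "--error={cluster.error} -c {cluster.cpus-per-task} --mem={cluster.mem}"
)


def render(template, settings):
    """Replace every {cluster.KEY} placeholder with settings[KEY]."""
    return re.sub(
        r"\{cluster\.([^}]+)\}",
        lambda match: str(settings[match.group(1)]),
        template,
    )


rendered = render(TEMPLATE, CLUSTER_DEFAULTS)
print(rendered)
```

Printing the rendered command before a real run is a cheap way to spot a missing or misspelled key in the cluster configuration.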