Commit b44a76ae authored by Neville Sanjana

Minor edits to Readme.md

parent 57dceb8b
# Cas13design v0.2
This R software scores guide RNAs for the RNA-targeting CRISPR protein Cas13d to maximize target knockdown efficacy.
## Background
...
<br>
## Example: Cas13 guide RNAs to target the SARS-CoV-2 RNA genome
In the following section, I demonstrate how to predict guide RNA scores for custom target RNAs.
As an example, I score guide RNAs targeting the [coronavirus SARS-CoV-2 strain USA/NY1-PV08001/2020](https://nextstrain.org/ncov?c=location&f_division=New%20York&r=country).
This strain is a close relative of the strain responsible for the recent [coronavirus pandemic](https://www.who.int/dg/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---11-march-2020), bearing three nucleotide substitutions (G3243A, C25214T, G29027T) and two amino acid mutations (N: A252S, ORF1a: G993S).
The positive-sense RNA virus contains 10 genes (ORF1ab, S, ORF3a, E, M, ORF6, ORF7a, ORF8, N, ORF10).
To predict guide RNA scores for genes in the SARS-CoV-2 RNA genome, first change into the Cas13design directory.
The `data` directory contains the SARS-CoV-2 sequences, split into single-entry FASTA files.
```sh
# navigate to your Cas13design installation
cd ~/path/to/Cas13design
# find USA/NY1-PV08001/2020 coronavirus sequences
ls -ltr ./data/*fasta
```
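
To score your own custom targets, each sequence needs its own single-entry FASTA file. If your targets come as one multi-entry FASTA, a minimal sketch like the one below can split them using plain shell and awk. The function name `split_fasta` and the `<first header word>.fasta` naming scheme are assumptions for illustration; neither is part of Cas13design.

```sh
# Hypothetical helper (not part of Cas13design): split a multi-entry FASTA
# into single-entry files named after the first word of each header.
# Usage: split_fasta input.fasta output_dir
split_fasta() {
  in=$1
  outdir=$2
  mkdir -p "$outdir" || return 1
  awk -v outdir="$outdir" '
    /^>/ {
      if (out != "") close(out)           # finish the previous entry
      name = substr($1, 2)                # header without ">", up to the first space
      gsub(/[^A-Za-z0-9._-]/, "_", name)  # keep file names shell-safe
      out = outdir "/" name ".fasta"
    }
    out != "" { print > out }             # copy header and sequence lines
  ' "$in"
}
```

For example, `split_fasta my_targets.fasta ./data/custom` (hypothetical paths) would write one file per entry; characters other than letters, digits, `.`, `_` and `-` in the first header word are replaced by `_`.
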
...
### Predict guide RNA scores
Next, run the `RfxCas13d_GuideScoring.R` script with three required input arguments:
1. the target sequence as a single-entry FASTA file
2. the model input data
3. a boolean (true/false) indicating whether the predictions should be plotted along the input sequence
```sh
Rscript ./scripts/RfxCas13d_GuideScoring.R ./data/MN908947_NY1-PV08001.S.fasta ...
```
The first time you run the script, installing the required R packages may take several minutes.
Once Cas13design is fully installed, the run time scales with the length of the input FASTA sequence.
The software derives all required features from the target sequence, including base probabilities, RNA-RNA hybridization energies, RNA target site accessibility and guide RNA folding.
The minimum target sequence length is 30 nucleotides. The total run time for the provided ~1000 nt test.fa example is about 2 minutes.
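
Because the script expects a single-entry FASTA file of at least 30 nt, a quick check before launching a run can catch malformed input early. The sketch below is a hypothetical helper (`check_target` is not part of Cas13design) that counts FASTA entries and sequence characters with awk, ignoring spaces, tabs and carriage returns.

```sh
# Hypothetical pre-flight check (not part of Cas13design): confirm that a
# target file holds exactly one FASTA entry of at least 30 nucleotides.
# Usage: check_target file.fasta && Rscript ./scripts/RfxCas13d_GuideScoring.R file.fasta ...
check_target() {
  awk '
    /^>/ { entries++; next }                       # count header lines
    { gsub(/[ \t\r]/, ""); len += length($0) }     # sum sequence characters
    END {
      if (entries != 1) { printf "expected 1 FASTA entry, found %d\n", entries > "/dev/stderr"; exit 1 }
      if (len < 30)     { printf "sequence is %d nt, minimum is 30\n", len > "/dev/stderr"; exit 1 }
      printf "OK: %d nt\n", len
    }
  ' "$1"
}
```

Chained with `&&` as in the usage comment, the scoring script only starts when the check passes.
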
<br>
<br>
<br>
...
### Output
The script writes up to three output files, each named after the FASTA header:
1. A FASTA file with the guide sequences (reverse complement of the target sequence). Each header contains the following fields, separated by underscores (`_`):
   - the crRNA number (counted 5' to 3') and match position (e.g. `crRNA1156:1399-1425`)
   - the standardized guide score
   - the rank
   - the quartile, assigned according to the input screens
2. A CSV file containing all predicted guides.
3. If plotting was set to TRUE, a PDF file depicting the score distribution along the target transcript.
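
The header fields listed above can be pulled apart with standard tools. The sketch below assumes the fields appear in the listed order (crRNA number and match position, standardized score, rank, quartile) joined by underscores; `parse_guide_header` and `revcomp` are hypothetical helper names, not part of Cas13design. `revcomp` illustrates the guide/target relationship: applied to a target site, it would give the guide written in DNA letters.

```sh
# Hypothetical helpers, not part of Cas13design.

# Split a guide FASTA header into its underscore-separated fields.
# Assumed layout: >crRNA<N>:<start>-<end>_<standardized score>_<rank>_<quartile>
parse_guide_header() {
  printf '%s\n' "$1" | awk -F'_' '{
    sub(/^>/, "", $1)                   # drop the leading ">"
    split($1, id, ":")                  # id[1] = crRNA number, id[2] = match position
    printf "guide=%s position=%s score=%s rank=%s quartile=%s\n", id[1], id[2], $2, $3, $4
  }'
}

# Reverse complement a nucleotide sequence; U is complemented like T and
# the output uses DNA letters.
revcomp() {
  printf '%s\n' "$1" | tr 'ACGTUacgtu' 'TGCAAtgcaa' |
    awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }'
}
```

For example, `parse_guide_header '>crRNA1156:1399-1425_0.87_12_4'` (a made-up header) prints `guide=crRNA1156 position=1399-1425 score=0.87 rank=12 quartile=4`, and `revcomp ACGTTG` prints `CAACGT`.
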
Guide scores range from 0 to 1, with **higher scores** indicating **higher predicted knockdown** efficacy.
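
Since higher scores mean higher predicted knockdown, a common next step is keeping only guides above a score cutoff. The following is a minimal sketch, not part of Cas13design: `filter_guides` is a hypothetical name, and it assumes the CSV has a header row with a column named exactly `Score` (as in the table below; the real column names may differ) and no commas inside quoted fields. Usage: `filter_guides guides.csv 0.8` prints the header plus every guide scoring at least 0.8.

```sh
# Hypothetical helper, not part of Cas13design: print the CSV header plus every
# guide whose "Score" value is at least the given threshold.
# Usage: filter_guides guides.csv 0.8
filter_guides() {
  awk -F',' -v min="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        h = $i
        gsub(/"/, "", h)                # tolerate quoted header names
        if (h == "Score") col = i
      }
      if (!col) { print "no Score column found" > "/dev/stderr"; exit 1 }
      print                             # keep the header row
      next
    }
    {
      v = $col
      gsub(/"/, "", v)
      if (v + 0 >= min + 0) print       # numeric comparison against the threshold
    }
  ' "$1"
}
```
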
| GuideName | Sequence | Position | Score | Rank | Standardized score | Quartile |
|---|---|---|---|---|---|---|
| ... | ... | ... | ... | ... | ... | ... |
## Bulk predictions
To predict guide RNA scores for all SARS-CoV-2 genes, you can use a simple wrapper that sends one job per FASTA file to a computing cluster.
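
If no computing cluster is available, the same per-file jobs can also be run one after another on a single machine. The following is a hypothetical sketch (`score_all` and `DRY_RUN` are not part of Cas13design); it assumes the three arguments are passed in the order listed under *Predict guide RNA scores*, and it leaves the model input path as a parameter because that file name is not given in this section.

```sh
# Hypothetical local alternative to the cluster wrapper (not part of Cas13design):
# score every FASTA file in a directory sequentially.
# Arguments: data directory, model input data, plot flag (true/false).
# Set DRY_RUN=1 to print the commands instead of running them.
# Usage: score_all ./data /path/to/model_input true
score_all() {
  data_dir=$1
  model_input=$2
  plot=$3
  for fasta in "$data_dir"/*fasta; do
    [ -e "$fasta" ] || continue         # skip if the glob matched nothing
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo Rscript ./scripts/RfxCas13d_GuideScoring.R "$fasta" "$model_input" "$plot"
    else
      Rscript ./scripts/RfxCas13d_GuideScoring.R "$fasta" "$model_input" "$plot" ||
        echo "failed: $fasta" >&2       # report the failure and continue with the next file
    fi
  done
}
```

With `DRY_RUN=1 score_all ./data /path/to/model_input true`, the loop only prints each `Rscript` command, which is a convenient way to review the job list before committing to a long sequential run.
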
```sh
# This will submit one job per fasta file in the directory ./data/
...
```