Commit 0c4fede6 authored by harm
add readme

parent 5481a36c
# Cas13design v0.2
This R software scores guide RNAs for the RNA-targeting CRISPR protein RfxCas13d to maximize target knockdown efficacy.
## Background
The guide RNA efficacies are modeled using data from GFP, CD46, CD55, and CD71 mRNA tiling screens.
For more information, please refer to our recent manuscript:
Hans-Hermann Wessels\*, Alejandro Méndez-Mancilla\*, Xinyi Guo, Mateusz Legut, Zharko Daniloski, Neville E. Sanjana.
[***Massively parallel Cas13 screens reveal principles for guide RNA design***.](https://doi.org/10.1038/s41587-020-0456-9) Nature Biotechnology (2020)
## Install
To download and install the Cas13 guide RNA scoring software, please follow the instructions in Install.txt.
```
wget https://gitlab.com/sanjanalab/cas13/raw/master/Cas13design.tar.gz
tar -xzf Cas13design.tar.gz
cd Cas13design
cat Install.txt
```
## Example
In the following section, I demonstrate how to predict guide RNA scores for custom target RNAs.
As an example, I score guide RNAs targeting the [coronavirus strain USA/NY1-PV08001/2020](https://nextstrain.org/ncov?c=location&f_division=New%20York&r=country).
This strain is a close relative of the strain responsible for the recent Wuhan coronavirus outbreak, carrying 3 nucleotide substitutions (G3243A, C25214T, G29027T) and two amino acid mutations (N: A252S, ORF1a: G993S).
The positive-sense RNA virus contains 10 genes (ORF1ab, S, ORF3a, E, M, ORF6, ORF7a, ORF8, N, ORF10).
To predict guide RNA scores for these coronavirus genes, first change directory into the Cas13design folder.
The data directory contains the coronavirus sequences separated into single-entry FASTA files.
```sh
# navigate to your Cas13design installation
cd ~/path/to/Cas13design
# find USA/NY1-PV08001/2020 coronavirus sequences
ls -ltr ./data/*fasta
```
### Predict guide RNA scores
Next, run RfxCas13d_GuideScoring.R with 3 required input arguments:
1. the target sequence as a single-entry FASTA file
2. the model input data
3. `true` or `false`, indicating whether the predictions should be plotted along the input sequence.
```sh
# Predict guide RNA scores for the USA/NY1-PV08001/2020 S gene
Rscript ./scripts/RfxCas13d_GuideScoring.R ./data/MN908947_NY1-PV08001.S.fasta ./data/Cas13designGuidePredictorInput.csv true
```
The first time you run the script, R package installation may take several minutes.
Once Cas13design is fully installed, the run time scales with the length of the FASTA input.
Our software extracts all information needed from the target input sequence, including base probabilities, RNA-RNA hybridization energies, RNA target site accessibility, and guide RNA folding.
The target sequence must be at least 30 nucleotides long. The total run time for the provided ~1000 nt test.fa example is about 2 min.
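The two input constraints mentioned above (a single-entry FASTA file with at least 30 nucleotides) can be checked before starting a run. The following is a minimal sketch; `check_target_fasta` is a hypothetical helper and not part of Cas13design.

```sh
# check_target_fasta FILE
# Hypothetical helper (not shipped with Cas13design): succeeds only if FILE
# holds exactly one FASTA record whose sequence is at least 30 nt long.
check_target_fasta() {
    awk '
        /^>/ { n_records++; next }                      # count header lines
             { gsub(/[ \t\r]/, ""); len += length($0) } # sum sequence length
        END {
            if (n_records != 1) {
                print "expected 1 FASTA record, found " (n_records + 0)
                exit 1
            }
            if (len < 30) {
                print "sequence too short: " (len + 0) " nt"
                exit 1
            }
            print "ok: " len " nt"
        }
    ' "$1"
}

# Example (run from the Cas13design folder):
#   check_target_fasta ./data/MN908947_NY1-PV08001.S.fasta
```

Because the function returns a non-zero status on failure, it can be chained in front of the `Rscript` call with `&&`.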
### Output
The script produces up to three output files, named after the FASTA header:
1. A FASTA file with guide sequences (reverse complements of the target sequence). The header includes the following information, separated by underscores (`_`):
   - the crRNA number (5' to 3') and match position (e.g. crRNA1156:1399-1425)
   - the standardized guide score
   - the rank
   - the quartile according to the input screens
2. A CSV file containing all predicted guides.
3. If plotting was set to true, a PDF file depicting the score distribution along the target transcript.
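Two properties of the guide FASTA output described above can be illustrated with small shell helpers: each guide is the reverse complement of its target site, and the header fields are separated by underscores. This is a sketch only. `revcomp` and `split_header` are hypothetical names, and the sample header is assembled from the field order listed above, so the exact header text written by the script may differ.

```sh
# revcomp SEQ: print the reverse complement of a DNA sequence.
# Hypothetical helper, not part of Cas13design.
revcomp() {
    printf '%s\n' "$1" | tr 'ACGTacgt' 'TGCAtgca' | awk '{
        out = ""
        for (i = length($0); i > 0; i--) out = out substr($0, i, 1)
        print out
    }'
}

# split_header HEADER: print the underscore-separated header fields, one per line.
# A leading ">" is removed first. Hypothetical helper.
split_header() {
    printf '%s\n' "${1#\>}" | tr '_' '\n'
}

# Recover the target site from a guide sequence:
#   revcomp CCACATAATAAGCTGCAGCACCA    # prints TGGTGCTGCAGCTTATTATGTGG
# Split an assumed header (format inferred from the field list, not verified):
#   split_header '>crRNA0742:780-802_1_0.9997_4'
```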
Standardized guide scores range from 0 to 1, with higher scores indicating higher predicted knockdown efficacy.
| GuideName         | Sequence                | Position | Score | Rank   | Standardized score | Quartile |
| ----------------- | ----------------------- | -------- | ----- | ------ | :----------------: | -------- |
| crRNA0742:780-802 | CCACATAATAAGCTGCAGCACCA | 802      | 1.659 | 0.9997 | 1                  | 4        |
| crRNA0741:779-801 | CACATAATAAGCTGCAGCACCAG | 801      | 1.578 | 0.9994 | 1                  | 4        |
| ...               | ...                     | ...      | ...   | ...    | ...                | ...      |
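To shortlist guides from the CSV output, one option is to keep only the top quartile and sort by score. The sketch below assumes the CSV is comma-separated and uses the column names shown in the table (`GuideName`, `Sequence`, `Score`, `Quartile`). The actual file may name or quote its columns differently, so adjust the names as needed. `top_guides` is a hypothetical helper, not part of Cas13design.

```sh
# top_guides CSV [N]
# Hypothetical helper: print "Score,GuideName,Sequence" for the N (default 10)
# highest-scoring guides in quartile 4. Columns are located by header name,
# which is assumed to match the table in this README.
top_guides() {
    awk -F',' '
        NR == 1 {
            # map each column name (quotes stripped) to its position
            for (i = 1; i <= NF; i++) { gsub(/"/, "", $i); col[$i] = i }
            next
        }
        {
            q = $col["Quartile"]
            gsub(/"/, "", q)
        }
        q + 0 == 4 { print $col["Score"] "," $col["GuideName"] "," $col["Sequence"] }
    ' "$1" | tr -d '"' | sort -t, -k1,1nr | head -n "${2:-10}"
}

# Example: the five best guides from a prediction file (file name is a placeholder)
#   top_guides predictions.csv 5
```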
Guide RNA prediction visualization:
![Predicted guide RNA scores along the S gene](./data/ExampleS.png "USA/NY1-PV08001/2020 S gene")
### Bulk predictions
To predict guide RNA scores for all coronavirus genes, one can use a simple wrapper script. It submits each FASTA file as an individual job to a compute cluster.
```sh
# This will submit one job per fasta file in the directory ./data/
bash qsub_MakePredictions.sh ./data/
```
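On a machine without a cluster scheduler, the same per-file call can be run sequentially instead. The sketch below is not part of Cas13design: `predict_all` and the `DRY_RUN` switch are hypothetical, and it assumes the input files end in `.fasta` and that it is run from the Cas13design folder.

```sh
# predict_all DIR
# Hypothetical helper: score every *.fasta file in DIR, one after another.
# Set DRY_RUN=1 to print the commands instead of running them.
predict_all() {
    dir=${1%/}                          # drop a trailing slash, if any
    for fasta in "$dir"/*.fasta; do
        [ -e "$fasta" ] || continue     # skip when the glob matched nothing
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "Rscript ./scripts/RfxCas13d_GuideScoring.R $fasta ./data/Cas13designGuidePredictorInput.csv true"
        else
            Rscript ./scripts/RfxCas13d_GuideScoring.R "$fasta" \
                ./data/Cas13designGuidePredictorInput.csv true
        fi
    done
}

# Example:
#   DRY_RUN=1 predict_all ./data/   # preview the commands
#   predict_all ./data/             # run the predictions
```

Since the run time scales with sequence length, a sequential loop over all genes can take considerably longer than the cluster submission.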