Update Scripts Descriptions, authored by Alex Trouern-Trend
Plant Computational Genomics Lab @ UConn
### Processing scripts
The scripts used to process the supporting genomic and transcriptomic data are available in the appropriate subdirectories of the [processing repository](https://gitlab.com/PlantGenomicsLab/annotationtool/tree/master/process), grouped by processing step. [This page](https://gitlab.com/PlantGenomicsLab/annotationtool/wikis/Scripts-Descriptions) is an atlas of script function, with each script's processing step noted.
Name | Step | Purpose | Input | Expected Output | Notes
---- | ---- | ------- | ----- | --------------- | -----
fetchSRA.sh | gather | Download reads for multiple SRR runs from NCBI | Text file listing the SRR run accessions to download | Raw reads in FASTQ format | The `--split-files` option on line 22 must be removed if the data are not paired-end
validateSRA.sh | gather | Validate that the correct files were downloaded from NCBI | The same input file used for fetchSRA.sh | Output and error files confirming that all runs downloaded correctly |
filtersubmit.sh | filter | Run filterLen.py with the correct module and input settings | Genome of interest | Genome with scaffolds filtered at a length cutoff of X bp |
repeatModeler.sh | mask | Create a de novo repeat library for the genome | Filtered genome in FASTA format | Repeat library named consensi.fa.classified |
splitfasta.sh | mask | Reduce RepeatMasker run time by splitting the genome into pieces | Filtered genome in FASTA format | Multiple pieces of the filtered genome |
repeatmasker.sh | mask | Softmask the regions of the genome recognized as repetitive | One piece of the filtered genome | Softmasked piece of the genome |
concat.sh | mask | Recombine all softmasked pieces of the genome | All softmasked pieces of the genome | One recombined softmasked genome |
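The gather step can also be double-checked locally. The sketch below is a hypothetical Python helper, not part of validateSRA.sh. It assumes the reads were dumped with fastq-dump, which names paired files `<SRR>_1.fastq` and `<SRR>_2.fastq` when `--split-files` is used and `<SRR>.fastq` otherwise, and it reports every expected file that is missing or empty. The function names and file layout are illustrative assumptions.

```python
from pathlib import Path


def read_accessions(list_file):
    """Read one SRR accession per line from the fetchSRA input file, skipping blanks."""
    lines = Path(list_file).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def expected_fastqs(accession, paired=True):
    """FASTQ names fastq-dump writes with --split-files (paired) or without it (single-end)."""
    if paired:
        return [f"{accession}_1.fastq", f"{accession}_2.fastq"]
    return [f"{accession}.fastq"]


def find_missing(accessions, present, paired=True):
    """Return the expected FASTQ names that are not in the set of present file names."""
    return [
        name
        for acc in accessions
        for name in expected_fastqs(acc, paired)
        if name not in present
    ]


def check_download(list_file, fastq_dir, paired=True):
    """Compare the accession list with the non-empty files actually in fastq_dir."""
    present = {
        p.name
        for p in Path(fastq_dir).iterdir()
        if p.is_file() and p.stat().st_size > 0
    }
    return find_missing(read_accessions(list_file), present, paired)
```

For example, `check_download("SRR_list.txt", "raw_reads")` (both names hypothetical) returns an empty list when every run is present. Keeping `find_missing` free of file I/O makes the comparison easy to test on its own.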
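filterLen.py itself isn't reproduced on this page. As a rough illustration of the filter step, here is a minimal sketch that assumes the filter simply drops scaffolds shorter than a cutoff (the X bp in the table). The function names are hypothetical, and the real script may differ in how it parses input or chooses the cutoff.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len):
    """Keep only scaffolds whose sequence is at least min_len bp long."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def write_fasta(records, handle, width=60):
    """Write (header, sequence) records, wrapping sequence lines at `width` characters."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")
```

Usage would look like `write_fasta(filter_by_length(parse_fasta(open("genome.fa")), 500), open("genome.filtered.fa", "w"))`, where the file names and the 500 bp cutoff are placeholders.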
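For the mask step, splitfasta.sh and concat.sh are only described at the table level. The hypothetical sketch below shows one way to split and recombine in Python. Scaffolds, as `(header, sequence)` tuples, are spread over pieces by total length using a greedy longest-first strategy. This strategy is chosen here for illustration and is not necessarily what splitfasta.sh does. Each scaffold is tagged with its original index so concatenation restores the original order. Because softmasking writes repeats in lowercase, the lowercase fraction gives a quick check that masking survived recombination.

```python
import heapq


def split_records(records, n_pieces):
    """Distribute (header, seq) records over n_pieces, balancing total bp.

    Greedy: longest scaffolds first, each placed into the currently lightest piece.
    Each record keeps its original index so the genome can be reassembled in order.
    """
    heap = [(0, i) for i in range(n_pieces)]  # (total bp so far, piece index)
    pieces = [[] for _ in range(n_pieces)]
    order = sorted(range(len(records)), key=lambda i: len(records[i][1]), reverse=True)
    for idx in order:
        size, p = heapq.heappop(heap)
        pieces[p].append((idx, records[idx]))
        heapq.heappush(heap, (size + len(records[idx][1]), p))
    return pieces


def concat_pieces(pieces):
    """Recombine pieces into one record list in the original scaffold order."""
    indexed = [item for piece in pieces for item in piece]
    return [rec for _, rec in sorted(indexed, key=lambda item: item[0])]


def softmasked_fraction(records):
    """Fraction of bases in lowercase, i.e. softmasked as repetitive."""
    total = sum(len(s) for _, s in records)
    masked = sum(c.islower() for _, s in records for c in s)
    return masked / total if total else 0.0
```

Balancing by base pairs rather than scaffold count keeps the pieces closer in RepeatMasker run time when scaffold lengths vary widely.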